neurostuff / NiMARE

Coordinate- and image-based meta-analysis in Python
https://nimare.readthedocs.io
MIT License

Align file naming convention more with fitlins/BIDS #624

Open tsalo opened 2 years ago

tsalo commented 2 years ago

Summary

Our current naming conventions for (1) downloaded NeuroVault files, (2) maps generated with ImageTransformer, (3) meta-analysis outputs, and (4) maps generated with KernelTransformers are all pretty different, and none of them closely matches BIDS conventions.

I have a prospective BEP that would formalize the naming conventions from fitlins (https://docs.google.com/document/d/1OxKhhEctlQg6nmw1L3vw9Mm0K9-KkTpVKZuNqYnlYEM/edit?usp=sharing), and I'd love to adopt the conventions proposed there in NiMARE.

This is tangentially related to a point brought up in #623, but I primarily want to do it for different reasons. I also think it's relevant for Neurostore, since Neurostore (or maybe just NeuroVault?) will be ingesting files from NiMARE.

Files downloaded from NeuroVault

Currently, these files look like this:

collection-<collection_id>_id-<image_id>_<contrast_name>_<original_file_name>.nii.gz

I don't think collection and image IDs will necessarily match study and experiment IDs, but they are unique identifiers, so I'm split on whether to keep using them or not... Any thoughts?

My first pass is the following, in which the contrast name would be coerced to camelCase, the statistic type would be added to the filename, and the original filename would be discarded. However, I don't know if there's potential information in the original filenames that would be worth incorporating into the new filenames.

collection-<label>_image-<label>_contrast-<label>_stat-<label>_statmap.nii.gz
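A minimal sketch of how this first-pass pattern could be assembled, assuming the contrast name arrives as free text. The helper names `to_camel_case` and `neurovault_filename` are hypothetical and are not part of NiMARE, and the camelCase rule used here (split on non-alphanumeric characters, lowercase the first letter, capitalize the first letter of each later word) is only one possible reading of "coerced to camelCase".

```python
import re


def to_camel_case(text):
    """Coerce free text to camelCase, dropping non-alphanumeric characters."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not words:
        return ""
    first, rest = words[0], words[1:]
    # Lowercase only the first character so internal capitals are preserved.
    head = first[0].lower() + first[1:]
    tail = "".join(word[0].upper() + word[1:] for word in rest)
    return head + tail


def neurovault_filename(collection_id, image_id, contrast_name, stat):
    """Build a filename following the proposed pattern (hypothetical helper)."""
    contrast = to_camel_case(contrast_name)
    return (
        f"collection-{collection_id}_image-{image_id}"
        f"_contrast-{contrast}_stat-{stat}_statmap.nii.gz"
    )
```

Note that the original filename is discarded here, so any information it carried would be lost unless it is stored somewhere else.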

Files generated with ImageTransformer

Currently, our files look like this:

study-<study_id>-<experiment_id>_<resolution>_<stat>.nii.gz

I think we could replace the resolution string (e.g., 2.0x2.0x2.0) with the res entity, but I am concerned that res requires a sidecar file to link a unique label with the actual resolution (e.g., res-2 does not imply that the resolution is 2mm). I don't think we want to start managing sidecar files, but I could be wrong.

So I guess this might look something like this:

res-<label>_statmap.json  # Sidecar file defining the resolution associated with the res label
study-<label>_experiment-<label>_res-<label>_stat-<label>_statmap.nii.gz
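To make the sidecar concern concrete, here is a rough sketch of what pairing a res label with its actual resolution might involve. The function names are hypothetical, and the `Resolution` key with a label-to-description mapping is an assumption about how the sidecar would be laid out, not something settled in this issue.

```python
def resolution_sidecar(label, voxel_sizes_mm):
    """Return the sidecar filename and contents for one res label.

    The "Resolution" key and its layout are assumptions for illustration.
    """
    description = "x".join(f"{size:g}" for size in voxel_sizes_mm) + " mm"
    sidecar_name = f"res-{label}_statmap.json"
    return sidecar_name, {"Resolution": {label: description}}


def transformed_filename(study, experiment, res_label, stat):
    """Build an image filename following the proposed pattern."""
    return (
        f"study-{study}_experiment-{experiment}"
        f"_res-{res_label}_stat-{stat}_statmap.nii.gz"
    )
```

The label itself stays opaque: `res-2` only means 2 mm isotropic because the sidecar says so, which is the bookkeeping cost mentioned above.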

Files generated with meta-analytic Estimators and Correctors

Currently, our files look like this:

<prefix>[_desc-<label>][_level-<cluster|voxel>][_corr-<FWE|FDR>][_method-<label>].nii.gz

The solution for these is pretty straightforward, I think.

<prefix>[_desc-<label>][_level-<cluster|voxel>][_corr-<FWE|FDR>][_method-<label>]_stat-<value>_statmap.nii.gz

We don't have any real identifiers, like study ID or collection ID, so we just need to rely on user-provided prefixes.
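A small sketch of how the proposed pattern could be built from a user-provided prefix and the optional entities. The function name `meta_filename` and its validation rules are hypothetical; only the entity order and the allowed values for level and corr come from the pattern above.

```python
def meta_filename(prefix, stat, desc=None, level=None, corr=None, method=None):
    """Build a meta-analysis output filename (hypothetical helper)."""
    if level is not None and level not in ("cluster", "voxel"):
        raise ValueError(f"Unsupported level: {level}")
    if corr is not None and corr not in ("FWE", "FDR"):
        raise ValueError(f"Unsupported corr: {corr}")

    # Entity order follows the proposed pattern; unset entities are skipped.
    entities = [
        ("desc", desc),
        ("level", level),
        ("corr", corr),
        ("method", method),
        ("stat", stat),
    ]
    parts = [prefix]
    parts += [f"{key}-{value}" for key, value in entities if value is not None]
    return "_".join(parts) + "_statmap.nii.gz"
```

For an uncorrected map, only the prefix, the stat entity, and the suffix remain, e.g. `meta_filename("ale", "z")` gives `ale_stat-z_statmap.nii.gz`.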

Files generated with KernelTransformers when return_type="dataset"

Currently, these files look like this:

study-<study_id>-<experiment_id>[_<param1-value1>][_<param2-value2>][..._<paramN-valueN>]_<class_name>.nii.gz

Part of the problem here is that parameters in the KernelTransformer (e.g., the kernel radius) can affect the maps that are generated, so it makes sense to include the parameters in the filenames. On the other hand, including things like a hash of the affine makes the filename difficult to interpret.

At minimum, we should probably split the study ID and the experiment ID in the filename, as well as give the files proper suffixes. I'm not sure what else should be done here.
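As a sketch of the minimum change, the combined ID could be split into separate study and experiment entities. This assumes the experiment ID never contains a hyphen, so the split happens at the last hyphen, and it strips non-alphanumeric characters so the labels cannot collide with the `_` and `-` separators. The helper name `split_study_experiment` is hypothetical, and the suffix is left out because it has not been decided here.

```python
import re


def split_study_experiment(full_id):
    """Turn '<study_id>-<experiment_id>' into separate filename entities.

    Assumes the experiment ID contains no hyphen (split at the last one).
    """
    study_id, sep, experiment_id = full_id.rpartition("-")
    if not sep or not study_id or not experiment_id:
        raise ValueError(f"Cannot split ID: {full_id!r}")

    def clean(label):
        # Keep only alphanumeric characters in the entity label.
        return re.sub(r"[^A-Za-z0-9]", "", label)

    return f"study-{clean(study_id)}_experiment-{clean(experiment_id)}"
```

Stripping characters can make two different IDs collapse to the same label, so a real implementation would need to check for collisions.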

tsalo commented 2 years ago

One thing I don't love about the current convention is that desc is overloaded. For example, in the MKDAChi2 meta-analysis, we have desc-consistency and desc-specificity, but after FWE correction we have different metrics (size, mass, stat, and tfce) that are currently just included in the desc field (e.g., desc-consistencySize).