spinalcordtoolbox / PAM50

https://github.com/neuropoly/spinalcordtoolbox

Discussion about including the histology atlas in PAM50 upon installation #7

Closed jcohenadad closed 1 year ago

jcohenadad commented 1 year ago

Spin-off of this comment:

(from internal discussion): Might not need to be included in the mainline PAM50 template – could be a separate sct_download_data download.

It will be necessary not only to update the URL but also to add a new option to download.py. Something like PAM50_histology?

I think it would be nice to include the histology atlas upon installation, if it is not "too" large. And I think we should be able to reduce it to 50MB (currently 92MB).

jcohenadad commented 1 year ago

From @valosekj

What's the size of all these files? (related to #7)

After resampling all files to 0.083333, 0.083333, 0.5, the total size is 179M:

valosek@macbook-pro:~/code/PAM50/histology$ du -h      
179M    .

i.e., a dramatic increase from current 92M:

valosek@macbook-pro:~/code/PAM50/histology_backup$ du -h
 92M    .

I tried to change the gzip compression level from the default 6 to 9 ("slowest compression level, which provides the smallest file size"), but the size decrease is negligible:

valosek@macbook-pro:~/code/PAM50/histology$ du -h
177M    .

I also tried bzip2; the compression performance is similar.
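
For anyone who wants to reproduce this kind of comparison without the atlas files, here is a minimal sketch using only the Python standard library. It compresses a synthetic float32 buffer (random values, NOT the histology data; the buffer size and distribution are arbitrary assumptions) at gzip levels 6 and 9 and reports the compressed size relative to the raw size. The helper name `compressed_sizes` is made up for this illustration.

```python
import gzip
import random
from array import array


def compressed_sizes(raw: bytes, levels=(6, 9)) -> dict:
    """Return {gzip level: compressed size in bytes} for the given buffer."""
    return {level: len(gzip.compress(raw, compresslevel=level)) for level in levels}


# Synthetic stand-in for a float32 volume (assumption: NOT the real atlas data).
rng = random.Random(0)
raw = array("f", (rng.gauss(0.5, 0.1) for _ in range(200_000))).tobytes()

sizes = compressed_sizes(raw)
for level, size in sizes.items():
    print(f"gzip level {level}: {size / len(raw):.1%} of the raw size")
```

The numbers printed for synthetic noise will differ from those of the real volumes; the point is only that the compression level is easy to test in isolation before re-exporting every file.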

We can try to play with the data type, which is currently float32.

jcohenadad commented 1 year ago

I tried to change the gzip compression level from the default 6 to 9 ("slowest compression level, which provides the smallest file size"), but the size decrease is negligible:

I would not play with this.

FLOAT32 should be the one to use, I think.

jcohenadad commented 1 year ago

Alternatively, we change the resampling to 200 µm x 200 µm x 500 µm. I think it should be "good enough" for what our users will do with the data (i.e., compare with MRI).

valosekj commented 1 year ago

Alternatively, we change the resampling to 200 µm x 200 µm x 500 µm. I think it should be "good enough" for what our users will do with the data (i.e., compare with MRI).

This sounds good!

I used:

valosek@macbook-pro:~/code/PAM50/histology_200x200x500um$ for file in *.nii.gz; do sct_resample -i ${file} -mm 0.2x0.2x0.5 -x linear; done

Then, dim is [3, 108, 83, 205, 1, 1, 1, 1] and pixdim is [-1.0, 0.199074, 0.198795, 0.5, 0.0, 0.0, 0.0, 0.0]:

valosek@macbook-pro:~/code/PAM50/histology_200x200x500um$ for file in *.gz;do echo ${file};sct_image -i ${file} -header | grep dim | head -3;echo "";done
PAM50_200um_AVF.nii.gz
dim     [3, 108, 83, 205, 1, 1, 1, 1]
pixdim      [-1.0, 0.199074, 0.198795, 0.5, 0.0, 0.0, 0.0, 0.0]

PAM50_200um_Eccentricity.nii.gz
dim     [3, 108, 83, 205, 1, 1, 1, 1]
pixdim      [-1.0, 0.199074, 0.198795, 0.5, 0.0, 0.0, 0.0, 0.0]

PAM50_200um_EquivDiameter.nii.gz
dim     [3, 108, 83, 205, 1, 1, 1, 1]
pixdim      [-1.0, 0.199074, 0.198795, 0.5, 0.0, 0.0, 0.0, 0.0]

PAM50_200um_EquivDiameter14.nii.gz
dim     [3, 108, 83, 205, 1, 1, 1, 1]
pixdim      [-1.0, 0.199074, 0.198795, 0.5, 0.0, 0.0, 0.0, 0.0]

PAM50_200um_EquivDiameter48.nii.gz
dim     [3, 108, 83, 205, 1, 1, 1, 1]
pixdim      [-1.0, 0.199074, 0.198795, 0.5, 0.0, 0.0, 0.0, 0.0]

PAM50_200um_MVF.nii.gz
dim     [3, 108, 83, 205, 1, 1, 1, 1]
pixdim      [-1.0, 0.199074, 0.198795, 0.5, 0.0, 0.0, 0.0, 0.0]

PAM50_200um_Naxons.nii.gz
dim     [3, 108, 83, 205, 1, 1, 1, 1]
pixdim      [-1.0, 0.199074, 0.198795, 0.5, 0.0, 0.0, 0.0, 0.0]

The size is now 31M:

valosek@macbook-pro:~/code/PAM50/histology_200x200x500um$ du -h                                           
 31M    .
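
As a rough sanity check on these sizes, the sketch below compares the voxel counts of the two grids. It assumes every file is float32 (4 bytes per voxel) and that the 0.083333 mm files all share the 258 x 198 x 205 grid reported by the `sct_image -header` output elsewhere in this thread; the helper `raw_bytes` is a made-up name for this illustration.

```python
from math import prod

BYTES_PER_VOXEL = 4  # float32


def raw_bytes(dim):
    """Uncompressed size in bytes of one float32 volume with the given dimensions."""
    return prod(dim) * BYTES_PER_VOXEL


grid_083 = (258, 198, 205)  # 0.083333 x 0.083333 x 0.5 mm grid (assumed for all files)
grid_200 = (108, 83, 205)   # grid after resampling with -mm 0.2x0.2x0.5

ratio = prod(grid_083) / prod(grid_200)
print(f"0.083 mm grid: {raw_bytes(grid_083) / 1e6:.1f} MB per file (uncompressed)")
print(f"0.2 mm grid:   {raw_bytes(grid_200) / 1e6:.1f} MB per file (uncompressed)")
print(f"voxel-count ratio: {ratio:.2f}")
```

The voxel-count ratio is about 5.7, which is in line with the observed drop in compressed size from 179M to 31M (a factor of about 5.8).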

[Screenshot: PAM50_200um_Naxons.nii.gz, original]
[Screenshot: PAM50_200um_Naxons.nii.gz, resampled to 200 µm x 200 µm x 500 µm]

jcohenadad commented 1 year ago

Since you are doing a resampling anyway, how about using a 'round' 0.2 resolution (vs. 0.1991...)?

valosekj commented 1 year ago

Since you are doing a resampling anyway, how about using a 'round' 0.2 resolution (vs. 0.1991...)?

I indeed intended this using sct_resample -i ${file} -mm 0.2x0.2x0.5 -x linear (see the first command in https://github.com/spinalcordtoolbox/PAM50/issues/7#issuecomment-1483096525). But still, the resulting resolution is 0.199074x0.198795x0.5.

jcohenadad commented 1 year ago

I indeed intended this using sct_resample -i ${file} -mm 0.2x0.2x0.5 -x linear (see the first command in https://github.com/spinalcordtoolbox/PAM50/issues/7#issuecomment-1483096525). But still, the resulting resolution is 0.199074x0.198795x0.5.

Interesting. This is something we should raise as an SCT issue; it may be related to precision/rounding issues in the library used for the resampling. We should clarify the cause of this discrepancy.

But regarding this project, if we want to move forward quickly, I'd say let's go with 0.199...

valosekj commented 1 year ago

I indeed intended this using sct_resample -i ${file} -mm 0.2x0.2x0.5 -x linear (see the first command in #7 (comment)). But still, the resulting resolution is 0.199074x0.198795x0.5.

Interesting. This is something we should raise as an SCT issue; it may be related to precision/rounding issues in the library used for the resampling. We should clarify the cause of this discrepancy.

But regarding this project, if we want to move forward quickly, I'd say let's go with 0.199...

Documented in https://github.com/spinalcordtoolbox/spinalcordtoolbox/issues/4077

I think that sct_resample -mm 0.2 does not work as expected in this particular case due to the specific combination of dim=258 and pixdim=0.083333:

julien-macbook:~/code/PAM50/histology $ sct_image -i PAM50_200um_Naxons.nii.gz -header | grep dim
dim        [3, 258, 198, 205, 1, 1, 1, 1]
pixdim     [-1.0, 0.083333, 0.083333, 0.5, 0.0, 0.0, 0.0, 0.0]
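
One explanation consistent with these numbers (a sketch under stated assumptions, not a description of what sct_resample actually does internally): if the field of view is kept fixed and the voxel count has to be an integer, an exact 0.2 mm spacing is impossible here, because 258 x 0.083333 mm = 21.5 mm and 21.5 / 0.2 = 107.5 voxels. The sketch assumes the header value 0.083333 stands for 1/12 mm and that the voxel count is rounded up; the function `resampled_grid` is hypothetical.

```python
import math


def resampled_grid(n_voxels, spacing_mm, target_mm):
    """Return (new voxel count, effective spacing) for a fixed field of view.

    Assumption: the voxel count is rounded up to the next integer, and the
    spacing is then whatever makes the new grid span the same field of view.
    """
    fov_mm = n_voxels * spacing_mm
    new_n = math.ceil(fov_mm / target_mm)
    return new_n, round(fov_mm / new_n, 6)


SPACING_XY = 1 / 12  # assumption: header shows this as 0.083333

for axis, n, spacing, target in [
    ("x", 258, SPACING_XY, 0.2),
    ("y", 198, SPACING_XY, 0.2),
    ("z", 205, 0.5, 0.5),
]:
    new_n, new_spacing = resampled_grid(n, spacing, target)
    print(f"{axis}: {n} -> {new_n} voxels, spacing {new_spacing} mm")
```

Under these assumptions the sketch gives 108, 83, and 205 voxels with spacings 0.199074, 0.198795, and 0.5 mm, matching the dim and pixdim values reported earlier in this thread. Getting an exact 0.2 mm spacing would require changing the field of view (for example by cropping or padding) so that it is a multiple of 0.2 mm.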