nipreps / mriqc

Automated Quality Control and visual reports for Quality Assessment of structural (T1w, T2w) and functional MRI of the brain
http://mriqc.readthedocs.io
Apache License 2.0

Questions on reliability, efficiency, et al #366

Closed parekhpravesh closed 7 years ago

parekhpravesh commented 7 years ago

Hello,

I have a few questions about mriqc, which I have outlined below. Please let me know if you would like me to open a separate thread for any/all of them. Also, it's quite likely that some of these points might be better reserved for a later stage in the development of mriqc.

  1. Regarding the reliability of some of the measures: I have noticed that some metrics (most importantly the foreground-to-background ratio) vary between runs. Is this expected (perhaps because of local minima in the segmentation process, which would mean that the estimation of the background is slightly off in each run), and how much variability should be expected? I have attached the consolidated sheet of QC measures from five runs on the first five subjects of the ds001 dataset. Clearly sub-04 is off by a large margin (175-802); it turns out this might be because ds001/sub-04 is already skull-stripped. However, you can see the variability in the other subjects too.

  2. Regarding the computation time: when running with n_proc = 30 and ants_thread = 20, we see an approximate run time of 1 hour 10 minutes for 5 subjects (sub-01 to sub-05 from ds001; anatomical scans only). Out of this time, almost 50 minutes is spent on spatial normalization. For example,

170202-00:37:37,355 workflow INFO:
[Job finished] jobname: PlotSegmentation.a1 jobid: 68
170202-01:22:44,287 niworkflows INFO:
Successful spatial normalization (retry #0).

Is this expected? I am assuming that including the functional scans (or cases where multiple scans are present) would only drive the computation time up. Any suggestions on keeping it down?

  3. How best to balance the values of n_proc and ants_thread? Is setting both to the maximum the best way to go about it, or would you have other suggestions?

  4. The measures from mriqc and qap differ from each other (we found this in an initial test run on a single subject; I am yet to run it on a larger sample). Is this simply because the metrics are calculated differently (different equations, for example) and therefore expected, or is it a cause for concern?

  5. For structural data, we have noticed slight differences when manually performing ACPC correction versus giving the data as obtained from the scanner. The performance of the segmentation/normalization steps is linked to the 0,0,0 voxel, which should be as close to the AC as possible (though minor differences should be okay). However, sometimes the roll and yaw can be quite off even when the origin is set close to the AC. As above, how much variability should be expected/tolerated, and how can we trust the computed metrics in cases where the manual correction has not been performed? (I will try to update with some more data on exactly how far off these numbers are before and after manual ACPC correction.)

Many thanks for your help and your time.

Regards Pravesh

consolidated_run1_run5_server.xlsx

oesteban commented 7 years ago

Regarding the reliability of some of the measures: I have noticed that some metrics (most importantly the foreground-to-background ratio) vary between runs. Is this expected (perhaps because of local minima in the segmentation process, which would mean that the estimation of the background is slightly off in each run), and how much variability should be expected? I have attached the consolidated sheet of QC measures from five runs on the first five subjects of the ds001 dataset. Clearly sub-04 is off by a large margin (175-802); it turns out this might be because ds001/sub-04 is already skull-stripped. However, you can see the variability in the other subjects too.

Thanks for reporting this. No, this is not expected at all. I have been checking on those metrics, and they all have one thing in common: they rely on the air mask.

For example, take these two https://623-54756129-gh.circle-artifacts.com/0/home/ubuntu/scratch/out/reports/sub-ds205s03_T1w.html and https://620-54756129-gh.circle-artifacts.com/0/home/ubuntu/scratch/out/reports/sub-ds205s03_T1w.html. The air masks are pretty similar but they are not exactly the same.

I have opened #367 to investigate this.

Almost since this project was born, @chrisfilo has been suggesting regression tests on the measures; I don't think I can delay those tests any longer after MRIQC 1.0 (hence #368).
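Just to sketch the idea (this is not the actual MRIQC test suite; the file names, the subject_id column, and the 1% tolerance below are placeholders), such a regression test could boil down to comparing a freshly generated metrics CSV against a frozen reference:

```python
# Hypothetical regression check on the image quality metrics (IQMs): compare a
# freshly generated CSV against a frozen reference within a relative tolerance.
# File names, the "subject_id" column, and the tolerance are assumptions.
import numpy as np
import pandas as pd


def check_iqms_match_reference(new_csv, ref_csv, rtol=0.01):
    new = pd.read_csv(new_csv).set_index("subject_id").sort_index()
    ref = pd.read_csv(ref_csv).set_index("subject_id").sort_index()

    # Compare only the numeric metrics present in both tables.
    cols = [c for c in ref.columns
            if c in new.columns and np.issubdtype(ref[c].dtype, np.number)]
    drifted = [c for c in cols
               if not np.allclose(new[c].to_numpy(), ref[c].to_numpy(), rtol=rtol)]
    assert not drifted, f"Metrics drifted beyond {rtol:.0%}: {drifted}"


check_iqms_match_reference("anatMRIQC_run2.csv", "anatMRIQC_run1.csv")
```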

Regarding the computation time: when running with n_proc = 30 and ants_thread = 20, we see an approximate run time of 1 hour 10 minutes for 5 subjects (sub-01 to sub-05 from ds001; anatomical scans only).

With only 5 subjects, I wouldn't expect n_proc to speed things up much beyond n_procs=5 (by the way, the option in MRIQC is --n_procs, just in case). On the other hand, if you have 100 subjects and go with 30 parallel MRIQC processes (n_procs=30), you are likely to hit the memory limit of the node/computer. In our experience, FSL FAST may take up to 3GB per subject, and ANTs is also rather expensive. Therefore, for 16GB of RAM I wouldn't recommend n_procs > 5; for 24GB of RAM, use n_procs <= 8. Also note that setting n_procs well above the number of cores (virtual and real) will likely degrade performance.

Regarding --ants-nthreads, the more the better. However, for values > 6 you will probably see little improvement (https://github.com/stnava/ANTs/issues/268).
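To make that rule of thumb concrete, here is a rough sketch (not an official recommendation): it derives --n_procs from the available RAM using the ~3GB-per-subject figure above, and assumes the usual BIDS-Apps positional arguments (bids_dir, output_dir, participant); the paths are placeholders.

```python
# Rough heuristic for picking --n_procs: stay within the number of cores and
# budget roughly 3GB of RAM per concurrent subject (the figure quoted above).
# The paths and the exact per-subject memory figure are assumptions.
import os


def suggest_n_procs(ram_gb, gb_per_subject=3):
    return max(1, min(os.cpu_count() or 1, ram_gb // gb_per_subject))


n_procs = suggest_n_procs(ram_gb=16)  # -> 5 on a 16GB machine with enough cores
cmd = [
    "mriqc", "/data/ds001", "/data/mriqc_out", "participant",
    "--n_procs", str(n_procs),
    "--ants-nthreads", "6",  # little gain beyond ~6 threads (see the ANTs issue above)
]
print(" ".join(cmd))
```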

Out of this time, almost 50 minutes is spent on spatial normalization. For example,

170202-00:37:37,355 workflow INFO:
[Job finished] jobname: PlotSegmentation.a1 jobid: 68
170202-01:22:44,287 niworkflows INFO:
Successful spatial normalization (retry #0).

Is this expected? I am assuming that including the functional scans (or cases where multiple scans are present) would only drive the computation time up. Any suggestions on keeping it down?

There is no way to work around this. If you just want to check that MRIQC runs (but the results will be rather inaccurate), you can use the --testing flag. That reduces the spatial normalization process to a single affine transform, so it will be a lot faster.

How best to balance the values of n_proc and ants_thread? Is setting both to the maximum the best way to go about it, or would you have other suggestions?

As said before, you'll need to find a trade-off between the memory available and the number of processors in your setup.

The measures from mriqc and qap differ from each other (we found this in an initial test run on a single subject; I am yet to run it on a larger sample). Is this simply because the metrics are calculated differently (different equations, for example) and therefore expected, or is it a cause for concern?

MRIQC was forked from QAP one year ago, and we have since revised the implementation of nearly all the measures and added some others, so I think this is somewhat expected. Unfortunately (and this is exactly what we are working on), image quality metrics are a rather unexplored field, and we are laying the groundwork for it here :)

For structural data, we have noticed slight differences when manually performing ACPC correction versus giving the data as obtained from the scanner. The performance of the segmentation/normalization steps is linked to the 0,0,0 voxel, which should be as close to the AC as possible (though minor differences should be okay). However, sometimes the roll and yaw can be quite off even when the origin is set close to the AC. As above, how much variability should be expected/tolerated, and how can we trust the computed metrics in cases where the manual correction has not been performed? (I will try to update with some more data on exactly how far off these numbers are before and after manual ACPC correction.)

Sorry, I don't quite get this. Did you experience better performance of the spatial normalization when manually correcting the ACPC alignment? Otherwise, I would expect degraded performance with the ACPC alignment, since you are introducing two factors: a resampling (and the unavoidable smoothing that comes with it), and a "black" frame around the volume after the rotation, since the transformation falls outside the FoV in areas typically close to the points furthest from the center (i.e., the corners of the image).

parekhpravesh commented 7 years ago

Hello,

Thank you for taking the time to reply. I can imagine that you are all really busy sorting out the various things and improving the overall implementation!

Thanks for the suggestion. We are not really limited by RAM for the moment (512 GB :grin:), but I will keep your suggestion in mind when running tests and see if I can find an optimal value. Perhaps users could contribute run times and settings in a single thread so that it becomes easier to optimize?

Yes, I agree that in image processing we have focused far more on writing papers than on actually working out robust methods. I am glad things are changing!

Regarding the ACPC alignment step, what we are doing is changing the header (using SPM) so that the 0,0,0 voxel corresponds to the AC. However, we are not resampling/smoothing the data in any manner. Similar to the procedure described here (http://sabre.brainlab.ca/docs/processing/stage3.html), we are correcting pitch, roll, and yaw so that the longitudinal fissure is parallel to the crosshair, the eyeballs are of equal size, and the 0,0,0 voxel corresponds exactly to the AC. As far as I know, having a good estimate of the starting point should lead to better segmentation/normalization. Do you think this would impact the way the QC measures are calculated? Perhaps I can run some simulations tomorrow morning and post some results?

oesteban commented 7 years ago

I see, you are just setting the sform matrix, not resampling. My first guess is that ANTs is not using the sform, and thus the change is completely transparent to MRIQC. If you could check on that, I would really appreciate it :)
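In case it helps with that check, here is a quick nibabel sketch for inspecting what the reorientation actually changed in the header (the file names are placeholders):

```python
# Inspect what the reorientation changed: print the shape, sform, and qform
# (with their codes) for the original and reoriented files. Paths are placeholders.
import nibabel as nib

for path in ("sub-01_T1w.nii.gz", "sub-01_T1w_acpc.nii.gz"):
    img = nib.load(path)
    sform, scode = img.get_sform(coded=True)
    qform, qcode = img.get_qform(coded=True)
    print(path)
    print("  shape:", img.shape)              # identical shapes hint at no reslicing
    print("  sform code:", scode, "\n", sform)
    print("  qform code:", qcode, "\n", qform)
```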

Thanks very much

parekhpravesh commented 7 years ago

Hello @oesteban ,

Again, apologies for the delay in updating this. I ran the following three tests:

  1. using mriqc as usual (on the anatomical scans)
  2. manually set the origin to the AC using SPM12
  3. used the acpcdetect module of ART (https://www.nitrc.org/projects/art/) to detect the AC and modified the sform matrix

As you had rightly predicted, I don't see any differences between the usual run and the SPM-modified run (apart from the variations in the measures derived from the air mask). [Please note that these runs were done before the fix.] Strangely though, the orientation of the images in the report for the SPM-modified images is quite bad (see attachment).

Further, the estimates after having changed the sform matrix using acpcdetect are different from the usual run.

I am sharing a zip file through Google Drive which has the following:

  1. Results (for a single subject) from running mriqc as usual (sub-01_T1w_usual)
  2. Results (for a single subject) from running mriqc after SPM alignment (sub-01_T1w_spm)
  3. Results (for a single subject) from running mriqc after acpcdetect (sub-01_T1w_acpcdetect)
  4. anatMRIQC_usual.csv, anatMRIQC_spm.csv, and anatMRIQC_acpcdetect.csv files

Please do share your thoughts on these!

chrisgorgo commented 7 years ago

Interesting - it seems that the volumes created by SPM were resliced. Could you confirm this or share the image data?

parekhpravesh commented 7 years ago

Hello! The volumes created by SPM were not resliced. I have uploaded the files (Google Drive). We used the "Display" function in SPM and adjusted the pitch/roll/yaw until the AC was at its brightest, the eyes were of equal size, and the longitudinal fissure was parallel to the crosshair. Then we reoriented the images so that the origin was at the AC. When running acpcdetect, we explicitly passed the "--sform" parameter to the script.
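For reference, the "not resliced" point can also be verified with a small nibabel check (file names are placeholders): the voxel arrays should be identical and only the affines should differ.

```python
# Confirm the SPM reorientation only touched the header: the voxel arrays must
# be identical, while the affines differ. File names are placeholders.
import nibabel as nib
import numpy as np

orig = nib.load("sub-01_T1w.nii.gz")
reor = nib.load("sub-01_T1w_spm.nii.gz")

same_data = np.array_equal(np.asanyarray(orig.dataobj), np.asanyarray(reor.dataobj))
same_affine = np.allclose(orig.affine, reor.affine)

print("voxel data identical:", same_data)   # True  -> header-only change, no reslicing
print("affines identical:  ", same_affine)  # False -> the orientation was rewritten
```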

oesteban commented 7 years ago

Hi @parekhpravesh,

We have been addressing several reproducibility issues in MRIQC. In particular, we have made a great effort to promote containers, and we now have a tight versioning system with Docker Hub for those containers. We have added tests that check when metrics change and when intermediate files change, and discovered some sources of variability in the results.

The only open issue here is the orientation of the images. For this, we would recommend running MRIQC on images that haven't gone through any kind of processing, except perhaps defacing, since that is generally a requirement prior to moving data around. If you want to keep discussing this, please feel free to open a new issue about it.

For now, I will close this reproducibility issue since I think it has been completely addressed.