sina-mansour / UKB-connectomics

This repository will host scripts used to map structural and functional brain connectivity matrices for the UK Biobank dataset.
https://www.biorxiv.org/content/10.1101/2023.03.10.532036v1

Connectome mapping execution duration #21

Closed sina-mansour closed 2 years ago

sina-mansour commented 2 years ago

I have started testing the runtime for a more practical number of total streamlines/seeds. I used 10 million seeds (-seeds 10M) which in turn resulted in ~2.2M streamlines remaining after ACT.
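For reference, a minimal sketch (not the repository's actual script) of the tractography and SIFT2 steps at this scale; file names and secondary options below are placeholders:

```python
# Hypothetical sketch of the tractography + SIFT2 steps with 10M seeds.
import subprocess

def run(cmd):
    # Run an MRtrix command and fail loudly if it errors
    subprocess.run(cmd, check=True)

# Anatomically-constrained tractography: 10M seeds, ~2.2M streamlines survive ACT
run([
    "tckgen", "wm_fod.mif", "tracks.tck",
    "-seeds", "10M",          # number of seeds, not output streamlines
    "-select", "0",           # no target streamline count; run until seeds are exhausted
    "-act", "5tt.mif",        # ACT rejects anatomically implausible streamlines
])

# Per-streamline SIFT2 weights, used later for the FBC connectomes
run([
    "tcksift2", "tracks.tck", "wm_fod.mif", "sift2_weights.csv",
    "-act", "5tt.mif",
])
```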

Here is a short summary of the computation times:

- Preprocessing steps, FOD estimation, MSMT CSD: ~30 mins
- Tractography + SIFT weight calculations: ~2 h 30 mins
- Mapping connectivity matrices: ~2 h 30 mins

I personally think the last step is where we could act smarter to save a great deal of time. Let me first explain why it takes so long. A single tck2connectome run (one atlas, one metric) takes about 10 seconds. However, the current code maps connectivity on 92 alternative atlases (23 cortical × 4 subcortical, see #20), and connectomes are mapped for 10 alternative metrics (streamline count, FBC, length, MD, etc., see #10). Hence the total is 92 × 10 × 10 seconds ≈ 150 minutes.
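A rough sketch (hypothetical names) of what this loop amounts to, and why the cost is multiplicative: every (atlas, metric) pair is an independent tck2connectome call that re-reads the full ~2.2M-streamline tractogram.

```python
# Illustrative only: 92 atlases x 10 metrics = 920 tck2connectome calls.
import subprocess

atlases = [f"atlas_{i:02d}.nii.gz" for i in range(92)]  # 23 cortical x 4 subcortical
metrics = ["count", "FBC", "length", "MD"]              # ... 10 metrics in total

for i, atlas in enumerate(atlases):
    for metric in metrics:
        out = f"connectome_atlas{i:02d}_{metric}.csv"
        # metric-specific options (-tck_weights_in, -scale_file, ...) omitted here
        subprocess.run(["tck2connectome", "tracks.tck", atlas, out], check=True)
        # ~10 s per call, dominated by re-reading the tractogram: 92 x 10 x 10 s ≈ 150 min
```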

Now I think there are multiple ways we could save time:

  1. For instance, we could select a subsample of all available atlases. If we only use 2 resolutions of the subcortical atlas (scale 1 and 4) and only one half of the Schaefer atlases (the ones defined for the 7 functional networks), that would reduce execution time by roughly 75%.

  2. Currently, tck2connectome is called 920 times, each time with a different atlas and metric. However, there is a great deal of redundancy in re-reading ~2.2M streamlines every single time. If tck2connectome could read the streamlines once and then map multiple connectomes for different combinations of atlases and metrics, that would be a great addition to MRtrix that could benefit both this project and future users. Alternatively, I could write some Python code that does the same without tck2connectome (a rough sketch of this single-pass idea follows this list).

  3. There's also a third option that I personally favor over the first two: we could instead provide (i) metrics (length, SIFT weights, mean FA, etc.) sampled for every streamline, (ii) a resampled tractogram that contains only the endpoints of every streamline, and (iii) the atlases used. With these files, the user can compute any of the connectivity matrices using tck2connectome in ~10 seconds (see the command sketch further below). We could optionally provide a single connectivity matrix for one atlas/metric combination that we deem best. This way, those less familiar with connectivity analyses still get a ready-made connectivity matrix, while experts get to pick the metric and atlas for their desired connectivity matrices.
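Regarding option 2, here is a minimal Python sketch of the single-pass idea (assuming nibabel is acceptable; file names are placeholders): read the tractogram once, keep only the endpoints, then build connectomes for any number of atlases and metrics without re-reading the streamlines. Note this uses simple nearest-voxel assignment rather than tck2connectome's radial-search assignment.

```python
import numpy as np
import nibabel as nib

tck = nib.streamlines.load("tracks.tck")                    # read ~2.2M streamlines once
ends = np.array([[s[0], s[-1]] for s in tck.streamlines])   # (N, 2, 3) endpoint coordinates (mm)
weights = np.loadtxt("sift2_weights.csv")                   # one SIFT2 weight per streamline

for atlas_file in ["atlas_scale1.nii.gz", "atlas_scale4.nii.gz"]:  # loop over all 92 atlases
    atlas = nib.load(atlas_file)
    labels = np.asarray(atlas.dataobj).astype(int)          # parcel labels assumed 1..n_nodes

    # world (mm) -> voxel indices of this atlas, nearest voxel, clipped to the volume
    vox = nib.affines.apply_affine(np.linalg.inv(atlas.affine), ends.reshape(-1, 3))
    vox = np.rint(vox).astype(int)
    vox = np.clip(vox, 0, np.array(labels.shape) - 1).reshape(-1, 2, 3)
    node = labels[vox[..., 0], vox[..., 1], vox[..., 2]]     # (N, 2) parcel per endpoint

    keep = (node > 0).all(axis=1)                            # both endpoints inside a parcel
    i = np.minimum(node[keep, 0], node[keep, 1]) - 1
    j = np.maximum(node[keep, 0], node[keep, 1]) - 1

    n_nodes = labels.max()
    count = np.zeros((n_nodes, n_nodes))                     # streamline-count connectome
    fbc = np.zeros((n_nodes, n_nodes))                       # SIFT2-weighted connectome
    np.add.at(count, (i, j), 1)
    np.add.at(fbc, (i, j), weights[keep])
    # length / FA / MD connectomes would accumulate other per-streamline values here
```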

The third option would reduce the connectivity mapping step to a couple of minutes at most. Furthermore, it would also reduce the file storage required to save everything: currently, the 920 connectomes mapped for every subject take around 800 MB, which could be reduced to ~80 MB for all atlases + ~80 MB for the streamline endpoints + ~20 MB for connectomes on a single atlas.

(This alternative would also enable mapping high-resolution connectomes using endpoint information if we wanted to do that in a future project.)
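To make option 3 concrete, a hedged sketch of the deliverables and of how an end user would rebuild any connectome from them in ~10 seconds; file names are placeholders and the exact options would follow whatever the pipeline settles on.

```python
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

# (i) per-streamline metrics, e.g. mean FA sampled along each streamline
run(["tcksample", "tracks.tck", "fa.mif", "mean_FA_per_streamline.csv",
     "-stat_tck", "mean"])

# (ii) an endpoint-only tractogram (two points per streamline)
run(["tckresample", "tracks.tck", "endpoints.tck", "-endpoints"])

# Later, a user combines (i), (ii) and any released atlas (iii) themselves,
# e.g. a mean-FA connectome on an atlas of their choice:
run(["tck2connectome", "endpoints.tck", "chosen_atlas.nii.gz", "FA_connectome.csv",
     "-scale_file", "mean_FA_per_streamline.csv", "-stat_edge", "mean"])
```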

Please let me know which solution you think is optimal.