sina-mansour / UKB-connectomics

This repository will host scripts used to map structural and functional brain connectivity matrices for the UK biobank dataset.
https://www.biorxiv.org/content/10.1101/2023.03.10.532036v1
62 stars 7 forks source link

reporting streamline termination stats #17

Closed AndrewZalesky closed 2 years ago

AndrewZalesky commented 2 years ago

Following on from our discussion about reporting stats on streamline terminations, I find it difficult to determine the cause of termination in some cases. Take a look at the below figure where I have simulated data. No noise is included. Using iFOD2, I have set the tckgen options so that seeds = streamlines = selected. However, streamlines are clearly terminating in the middle of bundle (indicated by blue arrows), far from the bundle periphery. How can we determine the cause of the terminations in this case @Lestropie ? cap

AndrewZalesky commented 2 years ago

oh - just found that the -debug option provides these terminations stats.

Main reason for termination here is exiting the image, which is a single slab of voxels.

Lestropie commented 2 years ago

Yes, the third dimension can catch you out in such phantoms.

For reference, if you want even more information about why streamlines are terminating, uncomment this line and recompile. That gives you a density map per termination criterion.

AndrewZalesky commented 2 years ago

thanks for clarifying @Lestropie - I have padded out the 3rd dimension to avoid image exit terminations.

One other quick question: what are the possible causes of an "invalid seed point" as a streamline rejection reason (as reported by -debug)?

Using -select 0 and -seeds N, I would have thought that N - streamlines_generated = sum_of_streamline_rejection_counts, but it seems that the equality here does not necessarily hold?

Lestropie commented 2 years ago

what are the possible causes of an "invalid seed point" as a streamline rejection reason

"Invalid seed point" = whatever mechanism is used to generate seeds selected a position in 3D space, but the selected tracking algorithm decided upfront that it was unable to even commence propagation from that point. E.g. If you were using ACT, and provided a seed image that was partly in CSF, this number would go through the roof. Same goes for using a tensor-based algorithm and having a seed image that encapsulates a voxel with FA below threshold. There's a huge number of possibilities given the interactions between seed mechanisms and tracking algorithms, hence the vague description.

Using -select 0 and -seeds N, I would have thought that N - streamlines_generated = sum_of_streamline_rejection_counts, but it seems that the equality here does not necessarily hold?

What's the magnitude of the inequality? I was looking at that code when modifying it to implement the -output_stats option for this project (#16), and I think it's possible for there to be a slight inequality due to the multi-threading memory ordering model used in the relevant code. It's also possible that there may be some accumulated discrepancy due to worker threads continuing to contribute to those statistics after the sink thread has been flagged for completion; to mitigate that I'd need to tie the statistics to the class responsible for output track file writing rather than the shared class (which in reality would be the better choice in retrospect). Both of these effects should be small, and should disappear if using -nthreads 0. I don't think there should be a code path where one streamline contributes to multiple statistics.

AndrewZalesky commented 2 years ago

thanks for the clarification Rob. The slack is not huge. For N=1000, the slack in the equality varies between about 50-100 streamlines on my system.

For the purposes of the UKB project, perhaps we can simply report three counts: seeded, streamlines and selected, and not provide specific counts for the specific reasons for seed failure, etc. Does this sound reasonable? @sina-mansour @caioseguin

I think it would also be very useful to provide a set of connectome with -select 0.

sina-mansour commented 2 years ago

I think it would be a useful statistic to provide.

The number of seeds will be equal for all subjects (somewhere between 1M to 20M depending on execution time), and we would report the proportions of streamlines and selections as useful stats along with the provided connectomes.

I've now added the -select 0 option and use -seeds to indicate the number of initial seeds to be used.

here is a sample output generated:

tckgen: [INFO] opening image "/home/sina/Documents/Research/Datasets/UK_biobank/sample/1000094_2_0/dMRI/dMRI/5tt.freesurfer.mif"...
tckgen: [INFO] image "/home/sina/Documents/Research/Datasets/UK_biobank/sample/1000094_2_0/dMRI/dMRI/5tt.freesurfer.mif" opened with dimensions 256x256x256x5, voxel spacing 1x1x1xnan, datatype Float32LE
tckgen: [INFO] opening image "/home/sina/Documents/Research/Datasets/UK_biobank/sample/1000094_2_0/dMRI/dMRI/gmwm_seed.mif"...
tckgen: [INFO] image "/home/sina/Documents/Research/Datasets/UK_biobank/sample/1000094_2_0/dMRI/dMRI/gmwm_seed.mif" opened with dimensions 256x256x256, voxel spacing 1x1x1, datatype Float32LE
tckgen: [INFO] opening image "/home/sina/Documents/Research/Datasets/UK_biobank/sample/1000094_2_0/dMRI/dMRI/wmfod_norm.mif"...
tckgen: [INFO] image "/home/sina/Documents/Research/Datasets/UK_biobank/sample/1000094_2_0/dMRI/dMRI/wmfod_norm.mif" opened with dimensions 104x104x72x45, voxel spacing 2.0192299999999999x2.0192299999999999x2x1, datatype Float32LE
tckgen: [INFO] opening image "/home/sina/Documents/Research/Datasets/UK_biobank/sample/1000094_2_0/dMRI/dMRI/5tt.freesurfer.mif"...
tckgen: [INFO] image "/home/sina/Documents/Research/Datasets/UK_biobank/sample/1000094_2_0/dMRI/dMRI/5tt.freesurfer.mif" opened with dimensions 256x256x256x5, voxel spacing 1x1x1xnan, datatype Float32LE
tckgen: [INFO] step size = 1.00639975 mm
tckgen: [INFO] maximum deviation angle = 45 deg
tckgen: [INFO] minimum radius of curvature = 2.0127994454078202 mm
tckgen: [INFO] iFOD2 internal step size = 0.503199875 mm
tckgen: [INFO] rejection sampling will use 7 directions with a ratio of 3.34099722 (predicted number of samples per step = 15.2031403)
tckgen: [  0%]        1 seeds,        1 streamlines,        0 selected
tckgen: [  0%]      165 seeds,      139 streamlines,       29 selected
tckgen: [  0%]      325 seeds,      263 streamlines,       54 selected
tckgen: [  0%]      696 seeds,      567 streamlines,      107 selected
tckgen: [  1%]     1193 seeds,      949 streamlines,      177 selected
tckgen: [  2%]     2355 seeds,     1872 streamlines,      360 selected
tckgen: [  4%]     4687 seeds,     3765 streamlines,      751 selected
tckgen: [  9%]     9222 seeds,     7482 streamlines,     1472 selected
tckgen: [ 18%]    18399 seeds,    14880 streamlines,     2993 selected
tckgen: [ 36%]    36987 seeds,    30068 streamlines,     6035 selected
tckgen: [ 73%]    73364 seeds,    59567 streamlines,    12032 selected
tckgen: [100%]   100000 seeds,    81286 streamlines,    16452 selected
tckgen: [INFO] mean number of samples per step = 14.364652633666992
tckgen: [INFO] no rejection sampling truncations occurred
tckgen: [INFO] Total number of track terminations: 85147
tckgen: [INFO] Termination reason probabilities:
tckgen: [INFO]   Entered cortical grey matter: 40.9%
tckgen: [INFO]   Calibrator sub-threshold: 51.1%
tckgen: [INFO]   Exited image: 1.52%
tckgen: [INFO]   Entered CSF: 0.681%
tckgen: [INFO]   Bad diffusion signal: 0.0294%
tckgen: [INFO]   Excessive curvature: 0%
tckgen: [INFO]   Max length exceeded: 0.00235%
tckgen: [INFO]   Terminated in subcortex: 3.22%
tckgen: [INFO]   Exiting sub-cortical GM: 2.5%
tckgen: [INFO] Track rejection counts:
tckgen: [INFO]   Invalid seed point: 18714
tckgen: [INFO]   No propagation from seed: 90
tckgen: [INFO]   Shorter than minimum length: 20299
tckgen: [INFO]   Longer than maximum length: 2
tckgen: [INFO]   Poor structural termination: 43887
tckgen: [INFO]   Failed to traverse white matter: 556

Out of 100000 seeds, 81286 yielded streamlines, and only 16452 passed selection criteria. I'm not sure if it's worth recording other stats (such as Termination reasons and Track rejection counts).

Currently, I'm thinking of parsing the output files to save these values, but if https://github.com/MRtrix3/mrtrix3/issues/2413 gets implemented, I can change to using that instead.

Lestropie commented 2 years ago

if MRtrix3/mrtrix3#2413 gets implemented

Already is: see #16, MRtrix3/mrtrix3#2414. It's not yet a part of a tagged version, but the code is there.