sina-mansour / UKB-connectomics

This repository will host scripts used to map structural and functional brain connectivity matrices for the UK Biobank dataset.
https://www.biorxiv.org/content/10.1101/2023.03.10.532036v1

Multiple tractograms #18

Closed Lestropie closed 2 years ago

Lestropie commented 2 years ago

From @AndrewZalesky:

> I like that we will provide connectomes for SIFT2 and raw streamline counts. I do think that we may need to tailor the seeding strategy for these two pipelines. For SIFT2, we will need ACT and perhaps use the dynamic seeding strategy - perhaps Rob can guide us on the best seeding options for SIFT2? For the raw option, I would like to use "-select 0" and turn off ACT - would that be ok?

A few different things to pick apart:

  1. As soon as tractograms are generated with different configurations for different target connectivity metrics, the total streamline count per tractogram that is achievable within a fixed computational resource gets divided accordingly. So the more such configurations exist, the less dense each tractogram will be, and the greater the risk that each is less robust. That's not to say that we must avoid doing so, it's just worth stating up front. This also applies to e.g. wanting to use both deterministic & probabilistic algorithms.

  2. For SIFT2, dynamic seeding is the preference, but not compulsory. Conversely, for raw streamline count I would advise not using dynamic seeding. So in contrasting this against point 1, the debate is perhaps between:

    • Optimal dynamic seeding for FBC, some other seeding for raw streamline count;
    • One seeding strategy for both FBC and raw streamline count; not optimal for FBC, but twice as many streamlines for both cases.

    If the decision were purely between these two, I would probably (and I think I may have expressed so elsewhere) go with the latter choice; it also has Occam's Razor on its side.

  3. An alternative option to consider is to generate two tractograms: one with homogeneous WM seeding, one with GM-WM interface seeding. For raw streamline count, the metric could be reported for both seeding strategies. Then, for SIFT2 FBC, and potentially for other metrics also, the two tractograms could be concatenated in order to maximise sampling density. In the case of SIFT2, by combining these two seeding strategies the biases manifested by the two seeding approaches counteract one another to some extent.

  4. Even if data are not intended for use with SIFT2, it is unclear what benefit is obtained by not using ACT. The process of assigning streamlines to GM parcels becomes highly ill-posed in the absence of such a constraint, especially in the case of FreeSurfer-derived parcellations.

  5. Using -select 0 and instead constraining the number of streamline seeds is not incompatible with SIFT2. So even if multiple tractograms were to be generated, the tckgen termination criterion does not need to differ between executions.
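
To make point 5 concrete, here is a minimal sketch that assembles two `tckgen` calls sharing the same termination criterion (`-seeds N -select 0`) and differing only in seeding mechanism. The option names are `tckgen` options; the file names, the seed count and the helper `tckgen_command` are placeholders invented for illustration, not what the pipeline actually uses.

```python
import shlex

def tckgen_command(fod, out_tck, seed_option, seed_image, act_image, n_seeds):
    """Build a tckgen call that fixes the seed count, not the streamline count."""
    if seed_option not in ("-seed_image", "-seed_gmwmi"):
        raise ValueError(f"unsupported seeding option: {seed_option}")
    return [
        "tckgen", fod, out_tck,
        "-act", act_image,          # anatomical constraints from a 5TT image
        seed_option, seed_image,    # WM mask or GM-WM interface seeding
        "-seeds", str(n_seeds),     # stop after this many seeds...
        "-select", "0",             # ...with no target number of streamlines
    ]

# Hypothetical file names and seed count, for illustration only.
N_SEEDS = 5_000_000
wm_cmd = tckgen_command("wmfod.mif", "tracks_wm.tck",
                        "-seed_image", "wm_mask.mif", "5tt.mif", N_SEEDS)
gmwmi_cmd = tckgen_command("wmfod.mif", "tracks_gmwmi.tck",
                           "-seed_gmwmi", "gmwmi.mif", "5tt.mif", N_SEEDS)

for cmd in (wm_cmd, gmwmi_cmd):
    print(shlex.join(cmd))
```

Both commands end with the same four arguments, so a SIFT2 pipeline and a raw streamline count pipeline could consume either tractogram without any change to how tracking terminates.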

AndrewZalesky commented 2 years ago

thanks Rob. My only concern with using ACT for the 'raw option' is that prematurely terminating streamlines will be replaced with streamlines in bundles that are easier to propagate through. SIFT will fix this over/under-sampling of bundles, but for users wanting to analyze raw streamline counts or investigate alternative streamline filters, having ACT processed data will change the inherent sampling distribution of streamlines and reduce the variation in total streamline counts between individuals. Not sure why you say it is ill-posed - we can just use the same old rules for assigning streamlines to nodes. Prematurely terminating streamlines make no contribution to the connectivity matrix.

sina-mansour commented 2 years ago

Also a few comments on the same issue:

I think the problem could be divided into the following (potentially separate) discussions:

  1. Tractography procedure appropriate for different connectomes: Ideally, mapping streamline count and FBC would need different tractography procedures (e.g. dynamic seeding), but I do agree that, given the limits on computation time and the aim of generating dense tractograms, we may benefit from finding a middle ground and using a single tractography procedure to map all measures. For instance, we could leave out the dynamic seeding approach, and use -select 0 to limit only the number of seeds. This would not be optimal for FBC, but potentially having twice as many streamlines could largely improve the sensitivity and accuracy of connectivity estimates.

  2. Seeding mask: With regard to the two alternative seeding mask options (1. seeding from the GM-WM boundary; 2. seeding from a WM mask), it's an interesting idea to combine/add the streamlines from each approach together. However, I'm not sure whether it will resolve the biases of either approach or instead combine them. For instance, the boundary seeding may create more short-range connections and reduce the chance of detecting interhemispheric connections, whereas the WM seeding approach is biased to creating more long-range connections. Combining the two (in the case of streamline count mapping) is merely adding two different connectivity matrices together. The fact that the biases in each approach can counteract the biases in the other approach suggests that one could potentially reduce such biases by this combination. On the other hand, one may argue that this could further entangle the existing biases into a more complex bias. More importantly, our aim is to map connectivity matrices using best practices that are commonly used. Although this idea of combined seeding is promising, I'm not sure whether its impacts have been evaluated, or whether it is normally used by researchers. My understanding is that people may fall into two groups, one that prefers to seed from the boundary and one that prefers to use a WM seed. So it's also important to assess whether using a combination is creating a middle ground, or providing an alternative approach that is not commonly used.

  3. Performing ACT: I think both streamline count and FBC connectomes will benefit from ACT, as long as the -select 0 option is used. If -select 0 were not used, ACT would discard streamlines and, as a result, more seeds would be used to reach the target count. This can be resolved by fixing the number of starting seeds. Additionally, ACT would arguably improve connectome mapping accuracy by trimming tracts that have reached two GM regions and filtering connections that have ended in WM. I personally think adding the option of potentially relaying streamlines through subcortex would improve ACT (discussed in #8), but I think using the current implementation is better than not performing ACT at all.

  4. Decision branches: Each of the previous issues could be viewed as a potential branching point in the decision-making. We could either implement one decision or provide alternatives. Providing multiple alternatives gives the future user a degree of freedom and choice; however, it may also (i) reduce the quality of the final connectomes mapped for each alternative as a result of dividing resources between approaches, and (ii) potentially confuse future users as to which connectome they should use (e.g. with ACT/without ACT, dynamically seeded or not, seeded from boundary or WM). I think it's hence important to minimize the alternative branches for which connectomes are mapped. Given that we already have other branching points for mapped connectomes (alternative atlases and alternative metrics), I think it would be ideal to make a decision and provide a single map for every combination of atlas and metric and not further complicate the data provided by another branching decision.
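
The cost of each extra branch can be sketched with simple counting: every added tractography configuration multiplies the number of released connectomes and divides the per-tractogram budget. The atlas labels, metric labels and the budget below are placeholders, not the project's actual choices.

```python
from itertools import product

def streamlines_per_tractogram(total_budget, n_tractography_configs):
    """Split a fixed streamline (or seed) budget evenly across tractograms."""
    return total_budget // n_tractography_configs

def connectome_variants(atlases, metrics, tractography_configs):
    """Enumerate every (atlas, metric, tractography) combination to be released."""
    return list(product(atlases, metrics, tractography_configs))

# Placeholder labels and budget, for illustration only.
atlases = ["atlas_A", "atlas_B", "atlas_C"]
metrics = ["streamline_count", "FBC"]
single = ["one_tractogram"]
branched = ["WM_seeding", "GMWMI_seeding"]

for label, configs in (("single", single), ("branched", branched)):
    n_maps = len(connectome_variants(atlases, metrics, configs))
    per_tractogram = streamlines_per_tractogram(10_000_000, len(configs))
    print(f"{label}: {n_maps} connectomes, {per_tractogram} streamlines per tractogram")
```

In this toy setting a single extra seeding branch doubles the number of matrices a user must choose between while halving the density of each tractogram.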

sina-mansour commented 2 years ago

> thanks Rob. My only concern with using ACT for the 'raw option' is that prematurely terminating streamlines will be replaced with streamlines in bundles that are easier to propagate through. SIFT will fix this over/under-sampling of bundles, but for users wanting to analyze raw streamline counts or investigate alternative streamline filters, having ACT processed data will change the inherent sampling distribution of streamlines and reduce the variation in total streamline counts between individuals. Not sure why you say it is ill-posed - we can just use the same old rules for assigning streamlines to nodes. Prematurely terminating streamlines make no contribution to the connectivity matrix.

I may have misunderstood Andrew, but wouldn't using -select 0 avoid creating that change in sampling distribution? I thought that if we use -select 0 then ACT would discard prematurely terminating streamlines, but that would not necessitate generation of more streamlines, as the number of seeds is fixed rather than the number of final streamlines.

AndrewZalesky commented 2 years ago

That's a good point Sina, but I am not sure about the intricacies here. For example, if a streamline terminates prematurely and then we backtrack in hope of recovering the streamline, does the backtracking attempt decrease the seed budget by one? If backtracking does not decrease the seed budget, then ACT could still lead to an altered sampling distribution. Actually, even if the budget is decreased by one, won't this still alter the sampling distribution because we are effectively giving the same seed point multiple chances to form a connection?

sina-mansour commented 2 years ago

I think backtracking is not the default option for ACT; there's a separate flag (-backtrack) to combine with -act in case we also want to do backtracking. But the current code doesn't use this flag. So I guess, as long as we do not explicitly ask for backtracking, it's disabled by default.

@Lestropie is this assumption correct?

Lestropie commented 2 years ago

> My only concern with using ACT for the 'raw option' is that prematurely terminating streamlines will be replaced with streamlines in bundles that are easier to propagate through.

One could say that, if generating a fixed number of output streamlines, and one bundle results in streamlines more frequently terminating prematurely and being rejected, then all other bundles will get a slight boost in streamline count as a result. If however you are generating a fixed number of seeds, as seems to be the current decision, then this does not occur.
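
A toy calculation of this distinction, assuming each seed lands in a bundle with some fixed share and is accepted with a bundle-specific probability. The two bundles, their shares and the acceptance probabilities are invented numbers, and the model ignores everything else about tracking.

```python
def counts_fixed_seeds(n_seeds, seed_share, accept_prob):
    """Expected streamlines per bundle when the number of seeds is fixed."""
    return {b: n_seeds * seed_share[b] * accept_prob[b] for b in seed_share}

def counts_fixed_output(n_streamlines, seed_share, accept_prob):
    """Expected streamlines per bundle when seeding continues until a target count."""
    rate = {b: seed_share[b] * accept_prob[b] for b in seed_share}
    total = sum(rate.values())
    return {b: n_streamlines * rate[b] / total for b in rate}

share = {"A": 0.5, "B": 0.5}
b_easy = {"A": 0.8, "B": 0.8}
b_hard = {"A": 0.8, "B": 0.4}   # bundle B now terminates prematurely more often

for name, fn in (("fixed seeds", counts_fixed_seeds),
                 ("fixed output", counts_fixed_output)):
    print(name, "bundle A:", fn(1000, share, b_easy)["A"],
          "->", fn(1000, share, b_hard)["A"])
```

With a fixed seed count, bundle A's expected count is unaffected by how often bundle B is rejected; with a fixed output count, A absorbs the streamlines that B loses.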

> Not sure why you say it is ill-posed - we can just use the same old rules for assigning streamlines to nodes.

https://www.sciencedirect.com/science/article/pii/S105381191930388X

I'd hoped to investigate other things as part of that work, but it's nevertheless a reasonable demonstration of the consequences generally. One big differential is the nature of the parcellation: with something like AAL you have huge parcels that dilate deep into the WM, so streamlines terminating prematurely in the WM may still be assigned to such a parcel, whereas with FreeSurfer the streamline really needs to reach the cortex; and the error margins associated with this depend on the algorithm that you use to assign streamlines to nodes. For instance in some circumstances people dilate the labels iteratively until all voxels are labeled, at which point no streamline endpoint is ever not assigned to a parcel.

In the linked article we also show the issue of individual streamlines crossing more than two parcels. Only an issue in the data if you choose to assign each streamline to all intersected parcels, but it's nevertheless worth considering in the presence of such data in the reconstruction what should occur, or whether such an observation is indicative of a problem further upstream.

I would also seek clarification on the intention for a scenario where ACT would not be used. It would be reasonable to equate "not using ACT" with "not using any anatomical information"; whereas it might be the case that you're looking at using GM include regions and CSF exclude regions, which achieves a decent fraction (but not all) of the benefits of ACT.

> For instance, we could leave out the dynamic seeding approach, and use -select 0 to limit only the number of seeds. This would not be optimal for FBC, but potentially having twice as many streamlines could largely improve the sensitivity and accuracy of connectivity estimates.

Yep, I'd be content with that. Dynamic seeding essentially narrows the streamline weight distribution by reducing the magnitude of the biases to correct, but it's not essential.

> Combining the two (in the case of streamline count mapping) is merely adding two different connectivity matrices together.

Correct. But I wasn't proposing such for raw streamline count quantification. There you would most definitely provide individual matrices; that was the point of doing both. But for FBC in particular, you could concatenate the two tractograms and feed them to SIFT2, and it would somewhat mitigate the biases of each individually, and so you'd get an FBC-weighted matrix with limited mitigation of low streamline counts in any bundle that is difficult to reconstruct using one of the two methods. I don't recall how well this actually works, but it's nevertheless on the table. When it comes to metrics like mean FA, using the concatenation of the two tractograms may just give slightly less bias in which areas of the bundle are sampled more or less than others. As you say, if it's slightly exotic, it may be less appealing to those looking to use the data; but in that case you could still just compute each matrix twice, once for each seeding strategy.
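
For the raw streamline count case, the "merely adding two matrices" statement can be checked with a small sketch. The endpoint assignments are made up, and `count_matrix` is a hypothetical helper rather than anything from the pipeline.

```python
def count_matrix(node_pairs, n_nodes):
    """Symmetric streamline-count matrix from (node, node) endpoint assignments."""
    m = [[0] * n_nodes for _ in range(n_nodes)]
    for a, b in node_pairs:
        m[a][b] += 1
        if a != b:
            m[b][a] += 1
    return m

def add_matrices(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]

# Made-up endpoint assignments for two small tractograms over 3 nodes.
wm_seeded = [(0, 1), (0, 2), (0, 2), (1, 2)]
gmwmi_seeded = [(0, 1), (0, 1), (1, 2)]

separate_sum = add_matrices(count_matrix(wm_seeded, 3), count_matrix(gmwmi_seeded, 3))
concatenated = count_matrix(wm_seeded + gmwmi_seeded, 3)
print(concatenated == separate_sum)
```

This identity holds only for unweighted counts. With SIFT2 the weights would be fitted to the concatenated tractogram as a whole, so the FBC matrix is not expected to be a simple sum of two separately weighted matrices.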

> Decision branches:

Agree generally with the sentiment here. There's unfortunately not a clear best choice. Concern is that the streamline seeding question could be a sticking point for some potential users. Indeed one of the consequences of SIFT(2) is that the connectivity results become (ideally) invariant to seeding mechanism selection; and for metrics like mean FA, as long as either approach gets a reasonable sampling of the tract volume the results shouldn't change too drastically. It's specifically the raw streamline count and the way in which it changes between seeding mechanisms that is the sticking point IMO. I don't feel particularly strongly between choosing one or doing both, so am content either way; but I would suggest in the latter case concatenating the data prior to SIFT2.

> For example, if a streamline terminates prematurely and then we backtrack in hope of recovering the streamline, does the backtracking attempt decrease the seed budget by one?

No; it's the same streamline that was generated from the same seed, it merely gets truncated and new probabilistic samples are drawn.
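
A toy model of that accounting, assuming each tracking attempt succeeds independently with a fixed probability. This is not how MRtrix3 implements back-tracking (there the streamline is truncated and re-tracked from where it was cut); the function, the probabilities and the retry limit are invented purely to show where the seed budget is and is not spent.

```python
import random

def run_seed_budget(n_seeds, p_success, max_backtracks, rng):
    """Toy tracker: each seed is charged once; back-tracking only redraws samples."""
    seeds_used = accepted = redraws = 0
    for _ in range(n_seeds):
        seeds_used += 1                       # the only place the budget is spent
        ok = rng.random() < p_success
        attempts = 0
        while not ok and attempts < max_backtracks:
            attempts += 1                     # truncate and re-track: same seed
            redraws += 1
            ok = rng.random() < p_success
        accepted += ok
    return seeds_used, accepted, redraws

plain = run_seed_budget(10_000, 0.5, 0, random.Random(0))
backtracked = run_seed_budget(10_000, 0.5, 5, random.Random(0))
print("seeds used:", plain[0], backtracked[0])
print("accepted:  ", plain[1], backtracked[1])
```

The seed count is identical in both runs, but more seeds end up yielding an accepted streamline when retries are allowed, which is the change in sampling that the follow-up question is concerned with.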

> Actually, even if the budget is decreased by one, won't this still alter the sampling distribution because we are effectively giving the same seed point multiple chances to form a connection?

These kinds of questions arise from differences in interpretation of probabilistic streamlines tractography experiments. I opined about it in this chapter, but it's not accessible through UoM.

Basically, there are two different types of probabilistic streamlines tractography experiments---and corresponding interpretations---which have been running alongside one another for years but nobody to my knowledge has previously disambiguated them:

  1. Generate many streamlines from a single seed point; the density of streamlines at any location in the image is a probability of connectivity between the seed point and that location.

    The issue here IMO is that the interpretation is constrained to the case where all streamlines are seeded at precisely the same location. As soon as you have seeds distributed in space, e.g. within a region of interest (ignoring the whole-brain seeding for now), and simply interpret the concatenation of those data, you don't actually have a single connectivity distribution: what you have is an averaging of many grossly under-sampled distributions. You can no longer interpret the streamline count in any location as a singular "probability of connectivity" to the seed region, because that location could be connected with high probability to a subset of that ROI and not connected to the rest of it.

  2. Generate streamlines from any seed locations, sampling from some underlying distribution; each streamline in isolation is a plausible fibre trajectory given the image data.

    Because plausibility of trajectories is only established for each streamline in isolation, we are not quite so concerned about any biases in the way in which we sample in order to produce those streamlines (as long as it's not excessive), because our interpretation is not dependent on either the strict mitigation or the direct characterisation and correction of those biases.

    This interpretation however changes once you incorporate SIFT(2) or similar into the mix. Imagine establishing correspondence between whole-brain tractogram streamlines density and underlying fibre density, and then selecting just the subset of streamlines emanating from the same exemplar region of interest from point 1. We can then look at the streamline count in any location in the brain and interpret it as the "density of connectivity" between that location and the seed ROI. If all works well, this density estimate is invariant to the seeding strategy used to generate the whole-brain tractogram, and is also invariant to the reconstruction density of such.
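
    The averaging problem raised under the first interpretation can be shown with a toy ROI. The four seed points, their connection probabilities and the streamline totals are hypothetical numbers chosen only to show that two very different situations pool to the same value.

```python
def pooled_fraction(per_seed_probability):
    """Fraction of pooled streamlines reaching a target, equal streamlines per seed point."""
    return sum(per_seed_probability) / len(per_seed_probability)

# Hypothetical ROI with four seed points and one target location.
half_connected = [1.0, 1.0, 0.0, 0.0]   # target strongly connected to half of the ROI only
uniformly_weak = [0.5, 0.5, 0.5, 0.5]   # every seed point reaches the target half the time

print(pooled_fraction(half_connected), pooled_fraction(uniformly_weak))

# Under-sampling: a fixed total is shared across all seed points.
total_streamlines, n_seed_points = 5000, 1000
print(total_streamlines / n_seed_points, "streamlines per seed point")
```

    The pooled value alone cannot distinguish the two ROIs, and each individual seed point contributes only a handful of streamlines to its own distribution.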

Your question hints at an interpretation of streamline counts based on point 1. If you want to preserve the possibility of such, despite my criticisms of such when applied to anything that is not a singular seed location, then you would most likely want to not use back-tracking, for the reasons you state (regardless of the clarification regarding "seeding budget").

> is this assumption correct?

Correct: back-tracking is only used if you ask for it.

AndrewZalesky commented 2 years ago

So it seems we can eliminate another fork and use ACT and -select 0 for all output streams.

Thanks for the detailed explanation and clarification, Rob. Much appreciated. I have some quite different opinions on the two interpretations, but I understand your logic here.