jefferis opened 3 years ago
One idea is to weight the scores between the closest dotprops, as is already done with alpha. The weight could be based on each point's distance from the soma, using a pairwise score:
`1 - abs(W_1 - W_2) / maxlen`

where `maxlen = max(diameter(neuron_1), diameter(neuron_2))`.
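A minimal Python sketch of that pairwise weight (nat.nblast itself is R). It assumes `W` is the straight-line distance of a point from its own neuron's soma and "diameter" is the largest distance between any two points of a neuron; path length along the arbour would be an equally reasonable reading. The helper names are hypothetical, and the matches are assumed to come from the usual nearest-neighbour step.

```python
import math

def soma_distances(points, soma):
    # Assumption: W = Euclidean distance from the soma (not path length)
    return [math.dist(p, soma) for p in points]

def diameter(points):
    # Assumption: "diameter" = largest pairwise distance between points (O(n^2))
    return max((math.dist(p, q) for p in points for q in points), default=0.0)

def soma_weights(w_query, w_target, matches, maxlen):
    """Weight 1 - |W_1 - W_2| / maxlen for each nearest-neighbour match (i, j)."""
    return [1.0 - abs(w_query[i] - w_target[j]) / maxlen for i, j in matches]

# Two parallel straight neurons, 1 unit apart
query = [(0, 0, 0), (5, 0, 0), (10, 0, 0)]
target = [(0, 1, 0), (5, 1, 0), (10, 1, 0)]
matches = [(0, 0), (1, 1), (2, 2)]
maxlen = max(diameter(query), diameter(target))  # 10.0

w_q = soma_distances(query, query[0])
# Somas at the same end: [1.0, 1.0, 1.0]
same = soma_weights(w_q, soma_distances(target, target[0]), matches, maxlen)
# Somas at opposite ends (aligned but reversed): [0.0, 1.0, 0.0]
flipped = soma_weights(w_q, soma_distances(target, target[-1]), matches, maxlen)
```

Multiplying each dotprop score by its weight would down-weight matches whose points sit at very different distances from their somas.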
Another idea is to topologically sort each neuron and then score the neurons in that order (so we wouldn't use k-NN). This may cause problems like the one shown in the figure below, but the question is how common such cases are.
On the other hand, for neurons that are aligned but reversed, the somas will be far apart, so they'll get a low score. This could help increase precision.
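A rough sketch of the topological-sort idea, assuming each neuron is stored as an SWC-like parent array (`parents[i]` is the parent of node `i`, `-1` for the soma/root). Nodes are put in topological order with Python's `graphlib` and paired by position in that order, with no nearest-neighbour search. The function names are hypothetical, and "pair the k-th node of each order" is only one possible reading of "scoring in that order".

```python
import math
from graphlib import TopologicalSorter

def topo_order(parents):
    """Node indices ordered so that every parent precedes its children."""
    graph = {i: ({p} if p >= 0 else set()) for i, p in enumerate(parents)}
    return list(TopologicalSorter(graph).static_order())

def order_paired_distances(q_points, q_parents, t_points, t_parents):
    """Hypothetical alignment: pair the k-th node of each topological order.

    zip() silently drops the surplus nodes of the longer neuron.
    """
    q_order = topo_order(q_parents)
    t_order = topo_order(t_parents)
    return [math.dist(q_points[i], t_points[j]) for i, j in zip(q_order, t_order)]

# Same straight shape, but the target is sampled more densely
q_pts = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
q_par = [-1, 0, 1, 2]
t_pts = [(0, 0, 0), (0.75, 0, 0), (1.5, 0, 0), (2.25, 0, 0), (3, 0, 0)]
t_par = [-1, 0, 1, 2, 3]

dists = order_paired_distances(q_pts, q_par, t_pts, t_par)
```

Here `dists` is `[0.0, 0.25, 0.5, 0.75]`: the error grows along the arbour and the last target node is never used, even though both neurons have identical shapes. This is the kind of length-mismatch failure raised in the figure.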
These matching problems normally consist of linked alignment and scoring problems. Sometimes the two are solved jointly, with the alignment calculated iteratively to optimise some score, but that is very hard to make fast (think dynamic programming or some kind of optimiser). My approach for NBLAST was to separate the alignment phase (by nearest-neighbour distance) from the scoring phase so that both could be done very fast. If we were to use topological sort for alignment as you propose, then either we go back to a combined alignment and scoring (likely slow) or we base the alignment solely on topology (but this will cause many errors, e.g. because the neurons have slightly different lengths, as in your diagram; this is very common). In either case it is very hard to handle branches. One approach of this general sort is by Cardona and Saalfeld. However, it can only really handle unbranched sequences, ignores absolute 3D position (which we know is highly informative) and is ~2 orders of magnitude slower than NBLAST.
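To make the separation of the two phases concrete, here is a simplified Python sketch: phase 1 aligns each query point to its nearest target point, and phase 2 scores the pair by a plain table lookup on distance and |dot product| of the tangent vectors, with no re-alignment. The bin edges and scores below are made-up toy values, not `smat.fcwb`, and the brute-force search stands in for the spatial index a real implementation would use.

```python
import bisect
import math

def nearest(point, targets):
    """Phase 1 (alignment): index and distance of the closest target point."""
    return min(((j, math.dist(point, t)) for j, t in enumerate(targets)),
               key=lambda jd: jd[1])

def lookup(breaks, value):
    # Bin index = first upper edge >= value, clamped to the last bin
    return min(bisect.bisect_left(breaks, value), len(breaks) - 1)

def two_phase_score(q_pts, q_vecs, t_pts, t_vecs, dist_breaks, dot_breaks, smat):
    """Sum of table scores over query points; tangent vectors assumed unit length."""
    total = 0.0
    for p, u in zip(q_pts, q_vecs):
        j, d = nearest(p, t_pts)
        dot = abs(sum(a * b for a, b in zip(u, t_vecs[j])))
        # Phase 2 (scoring): a pure lookup, independent of how j was chosen
        total += smat[lookup(dist_breaks, d)][lookup(dot_breaks, dot)]
    return total

# Toy table (hypothetical numbers): rows = distance bins (<=1, <=5, >5),
# columns = |dot| bins (<=0.5, >0.5)
DIST_BREAKS = [1.0, 5.0, math.inf]
DOT_BREAKS = [0.5, 1.0]
SMAT = [[0.5, 1.0], [0.0, 0.3], [-1.0, -1.0]]
```

Because scoring never feeds back into alignment, each query point costs one neighbour search plus one lookup, which is what keeps this design fast.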
Sort of related to what Greg said, although maybe not immediately applicable to Dominik's approach: it may be worth solving all the topological and/or alignment problems on highly simplified versions of the neurons.
For example, we could produce a combined score from a normal NBLAST similarity and a second score from a topological sort (or tree edit distance) computed only on the branch points of the query and target neurons.
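A minimal sketch of the simplification step, again assuming an SWC-like parent array. Every node except the root and the branch points is dropped, and each kept node is re-attached to its closest kept ancestor. The linear mix in `combined_score` and its default weight are purely hypothetical placeholders for however the two scores would actually be combined, and both scores are assumed to be on comparable scales.

```python
from collections import Counter

def branch_point_skeleton(parents):
    """Collapse a tree to its root and branch points.

    Returns {node: parent} for kept nodes, where parent is the closest kept
    ancestor (-1 for the root).
    """
    n_children = Counter(p for p in parents if p >= 0)
    keep = {i for i, p in enumerate(parents) if p < 0 or n_children[i] > 1}
    simplified = {}
    for i in keep:
        p = parents[i]
        while p >= 0 and p not in keep:  # walk up until a kept node
            p = parents[p]
        simplified[i] = p
    return simplified

def combined_score(nblast_score, topo_score, weight=0.5):
    # Hypothetical linear mix of the NBLAST and topological scores
    return weight * nblast_score + (1 - weight) * topo_score
```

For `parents = [-1, 0, 1, 2, 2, 4, 4]`, nodes 2 and 4 are branch points, so the skeleton is `{0: -1, 2: 0, 4: 2}`. Any topological comparison then runs on 3 nodes instead of 7.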
Here's an example of a FlyWire pair that might be disambiguated by tnblast: extensive arbour overlap but very different soma positions, which should mean poor topological concordance. Here's the whole lineage group clustered by Yijie, which should include other neurons (in green) that are expected to be closely related to the orange neuron in the first scene.
Another thought: use the difference in Strahler order. That might help disambiguate cases where two neurons densely occupy the same area (e.g. local neurons) but their backbones are in different positions, or where one neuron simply has many more branches (and will hence have a higher Strahler order along the backbone).
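For reference, a sketch of the standard Horton-Strahler order computed on an SWC-like parent array (an assumed data layout). Leaves get order 1, and a node gets the maximum order of its children plus one when at least two children share that maximum; otherwise it inherits the maximum. The per-match difference at the end is only a hypothetical starting point for a weight.

```python
from graphlib import TopologicalSorter

def strahler_order(parents):
    """Horton-Strahler order of every node; parents[i] == -1 marks the root."""
    children = {i: [] for i in range(len(parents))}
    for i, p in enumerate(parents):
        if p >= 0:
            children[p].append(i)
    # Children act as "predecessors", so they are visited before their parent
    ts = TopologicalSorter({i: children[i] for i in children})
    order = [0] * len(parents)
    for node in ts.static_order():
        kids = [order[c] for c in children[node]]
        if not kids:
            order[node] = 1
        else:
            top = max(kids)
            order[node] = top + 1 if kids.count(top) >= 2 else top
    return order

def strahler_differences(order_q, order_t, matches):
    # Hypothetical per-match feature: |S_query - S_target| for each pair (i, j)
    return [abs(order_q[i] - order_t[j]) for i, j in matches]
```

For example, `strahler_order([-1, 0, 1, 1, 0])` gives `[2, 2, 1, 1, 1]`: node 1 joins two order-1 leaves, and the root inherits order 2 because only one child reaches it.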
@schlegelp yep, this has been done already ;) Still testing which metric works best for weighting, though...
Oh, is this the one Alex worked on?
Not sure, I tested that myself a few days ago: https://github.com/dokato/nat.nblast/tree/topo
Did you generate a scoring matrix for the Strahler distance?
Nope, so far in all cases I'm using the default `smat.fcwb`. Is it easy to generate a new scoring matrix? If so, I can quickly do that with the new metrics.
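As a rough illustration of what generating a scoring table for a new metric could involve, here is a Python sketch in the spirit of NBLAST's log-odds scoring matrices: for each bin, take log2 of the frequency among pairs known to match over the frequency among random pairs, with a small pseudocount. It is one-dimensional and uses the Strahler difference as the example metric. The function name, the pseudocount value and the input data are all hypothetical, and this is not the nat.nblast implementation.

```python
import math
from collections import Counter

def log_odds_table(match_values, random_values, bins, epsilon=1e-6):
    """1-D log2-odds score table.

    match_values  : metric values (e.g. Strahler differences) from matching pairs
    random_values : the same metric from random, unrelated pairs
    bins          : discrete values to score (here, integer differences)
    """
    m, r = Counter(match_values), Counter(random_values)
    nm, nr = len(match_values), len(random_values)
    return {b: math.log2((m[b] / nm + epsilon) / (r[b] / nr + epsilon))
            for b in bins}

# Toy data: matching pairs mostly have equal Strahler order at matched points
table = log_odds_table([0, 0, 0, 1], [0, 1, 2, 3], bins=[0, 1, 2, 3])
```

Bins that are over-represented among true matches get positive scores (here `table[0]` is about log2(3)), bins seen equally often get about 0, and bins never seen in matches get large negative scores.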
@dokato a placeholder