Add support for iterative pruning based on user-specified discard conditions

ns-rse commented 11 months ago

First stab at iterative pruning.

jni commented 11 months ago

I've made some changes to the code, still needs work but hopefully enough to get the gist.

Ultimately, I think we need to be able to go from Skeleton to Graph and back losslessly. Like sparse array formats, each representation has strengths and weaknesses and by "allowing" either format we enable different algorithms.

jni commented 11 months ago

@ns-rse looking at the description and implementation (not tests) of 0ad90df, I don't see how it could ever work, because a skeleton image is not what to_numpy_array is producing: it produces an adjacency matrix — and loses all our metadata (such as the path between junctions and tips) in the process.

The naive way to produce the image is to make an empty image, then iterate over all the edges and set the edge coordinates to 1 in the image, before passing that to Skeleton. I have vague hopes of finding something faster by going directly from the edges to the csr_matrix for paths, but that sounds complicated — and can be left to a later optimisation.

ns-rse commented 10 months ago

Yes, I'm realising I have deficiency in my udnerstanding of graph theory NetworkX and how Skan should be interacting with it. The tests are hopefully going to be useful and your code to go from Skeleton to NetworkX graph work but I need to rethink how to get back from NetworkX to Skeletons whilst preserving the metadata (and also be mindful that we might prune the graph under NetworkX). Hope to spend some more time on it later this week.

ns-rse commented 10 months ago

@jni Spent some time on this today and have a method of going back and forth from Numpy <--> NetworkX (thanks to my colleague @MaxGamill-Sheffield who shared some code that I refactored abit).

One of the problems I had cognitively was reconciling what the TopoStats work calls a skeleton with the skan / NetworkX representation. The solution in fbc09886a3c70f5c174f23d2de3c209a95938390 treats every element of a 2-D array as a node and makes going back and forth between the two straight-forward.

Let me know what you think.

jni commented 10 months ago

The solution in https://github.com/jni/skan/commit/fbc09886a3c70f5c174f23d2de3c209a95938390 treats every element of a 2-D array as a node and makes going back and forth between the two straight-forward.

Ah, sorry, I didn't have time to review this at the time.

I don't think that's going to fly unfortunately. Actually, a bit of lore: all the main implementations of skan are in a file called csr.py because the very earliest implementations were in networkx. But we found that was suuuuper slow so I went about methodically converting everything to scipy.sparse.csr_matrix and writing out skan as it is today. But while I was working on that it was siloed in this little csr.py file, until that file became the whole thing. 😊

So, I think that the 2D pixel graph is too low level to try to do a round trip — we will be swapping one performance issue for another. The correct level of abstraction for the Skeleton, imho, is the junction-junction graph, which is what you get when you make use the node id pairs (src, dst) from the summarize table as an edge list. Then it's a matter of putting the right metadata on the networkx graph. For example, each row of the summarize table has a row index, which corresponds to path_coordinates(idx) on the Skeleton object — so we can store the coordinates (or maybe just the pixel IDs) as an edge attribute. Then, when you prune, you need to merge two edges, and the path coordinates of the new edge is just np.concatenate([edge1['coords'], edge2['coords'][1:]], axis=0), for example. All of this gives you enough information to reconstruct a new Skeleton object, in the worst case by writing out the pixels and recomputing a csr skeleton graph — but maybe faster, I need to think about it.

ns-rse commented 9 months ago

I've been having a play around with this on ns-rse/networkx and had a go at passing the coordinates as edge attributes to NetworkX and reonconstructing paths between co-ordinates using Bresnham's Line algorithm.

It works in some instances but in writing tests I discovered there is insufficient data retained. For example skeleton1 in src/skan/_testdata.py has a loop in it. This results in two nodes with the same co-ordinates but there is no information about what the points are connecting them.

from skan import Skeleton, summarize
from skan._testdata import skeleton1

summarize(Skeleton(skeleton1))[['skeleton-id', 'coord-src-0', 'coord-src-1', 'coord-dst-0', 'coord-dst-1']]

   skeleton-id  coord-src-0  coord-src-1  coord-dst-0  coord-dst-1
0            0            2            1            3            3
1            0            2            1            3            3
2            0            2            1            4            0
3            0            3            3            4            6

...and so we've no idea how those two points differ in terms of the path taken between them. This is going to be true of any non-linear edge and so whilst we could recombine the path co-ordinates of edges after pruning in Networkx as you suggested we would have no idea the path taken between those points.

The only solution I can think of so far that doesn't involve using every point as a node is to pass the co-ordinates of the points that make up an edge as an attribute to NetworkX and concatenate those after pruning. Not tried this yet and have been asked to work on something else in the coming weeks but will think about it and try and find some time to try this out.

jni commented 9 months ago

@ns-rse sorry about the long wait here, as it has been very busy trying to get napari 0.4.19 out the door (still not done) and also family stuff. But I haven't forgotten this!

Quick thoughts:

reonconstructing paths between co-ordinates using Bresnham's Line algorithm.

I don't think this should be needed because I think we should keep the path coordinates and they should be pixel-perfect — there should be no need to infer any pixels. You need to keep the info from both the summary and the skeleton using path_coordinates(i) where i is the path ID of each row in the summary. iirc I got those matched up exactly at some point.

The only solution I can think of so far that doesn't involve using every point as a node is to pass the co-ordinates of the points that make up an edge as an attribute to NetworkX and concatenate those after pruning. Not tried this yet and have been asked to work on something else in the coming weeks but will think about it and try and find some time to try this out.

Yes like that. But that still vastly reduces the number of nodes in the graph.

jni commented 9 months ago

I will try to come back to this in the next couple of weeks but didn't want to leave you hanging any longer, again, sorry about the wait!

ns-rse commented 9 months ago

Don't worry at all @jni we all have lives. Hope you managed some time away from the computer.

That is however super-useful I hadn't clocked path_coordinates() method and think that would be the perfect tool to for adding the co-ordinates as an attribute.

I'll have a tinker next week and see how I get on.

Thank you and Happy New Year

jni commented 5 months ago

@ns-rse would you mind rebasing this on main? I can do it if you are too busy right now though.

ns-rse commented 5 months ago

@jni I merged main in and solved the conflicts, tests failed and so I tried to sort those out locally. Fixed the failing ones but there were more that failed and I haven't got them all sorted yet though as I don't understand how the discard option evaluates to boolean for the itteratively_prune_paths() function. Will try and take a look over the coming days and work that out.

jni commented 3 months ago

I'm almost there, but I think my current approach is too gung-ho... 😂

jni commented 3 months ago

Ah, the hardest problem in computer science — off by one errors. This is one of the intermediate steps:

So when removing some of the branches we are breaking up the inner loop, at which point we're already doomed. 😂 Otherwise it's looking pretty good though. Just need to figure out how to avoid cutting open the middle branches...

jni commented 3 months ago

btw @ns-rse it's a cool data sample — can you tell me what the xy pixel spacing/resolution is? We can incorporate that into the example.

ns-rse commented 3 months ago

Hi @jni

Ah, the hardest problem in computer science — off by one errors. This is one of the intermediate steps:

:laughing:

I encountered this with the linear molecules though, it just kept on pruning everything back. I think I solved it by looking for the longest path, but not sure how that transfers to loops/closed structures where a break is then introduced.

btw @ns-rse it's a cool data sample — can you tell me what the xy pixel spacing/resolution is? We can incorporate that into the example.

The pixel to nanometre scaling on these images varies depending on the resolution used in the scanning. I think this image is from [minicircle.spm]() where the scaling is 0.4940029296875

Thanks for continuing to hack away at this :+1:

Let me know if there is anything I can help with.

jni commented 3 months ago

0.4940029296875

... nm/pixel? ... That's cool/crazy! 😃

jni / skan

Add support for iterative pruning based on user-specified discard conditions #214