Open ns-rse opened 11 months ago
I've made some changes to the code, still needs work but hopefully enough to get the gist.
Ultimately, I think we need to be able to go from Skeleton to Graph and back losslessly. Like sparse array formats, each representation has strengths and weaknesses and by "allowing" either format we enable different algorithms.
@ns-rse looking at the description and implementation (not tests) of 0ad90df, I don't see how it could ever work, because a skeleton image is not what to_numpy_array
is producing: it produces an adjacency matrix — and loses all our metadata (such as the path between junctions and tips) in the process.
The naive way to produce the image is to make an empty image, then iterate over all the edges and set the edge coordinates to 1 in the image, before passing that to Skeleton
. I have vague hopes of finding something faster by going directly from the edges to the csr_matrix for paths, but that sounds complicated — and can be left to a later optimisation.
Yes, I'm realising I have deficiency in my udnerstanding of graph theory NetworkX and how Skan should be interacting with it. The tests are hopefully going to be useful and your code to go from Skeleton
to NetworkX
graph work but I need to rethink how to get back from NetworkX to Skeletons whilst preserving the metadata (and also be mindful that we might prune the graph under NetworkX). Hope to spend some more time on it later this week.
@jni Spent some time on this today and have a method of going back and forth from Numpy <--> NetworkX (thanks to my colleague @MaxGamill-Sheffield who shared some code that I refactored abit).
One of the problems I had cognitively was reconciling what the TopoStats work calls a skeleton with the skan
/ NetworkX
representation. The solution in fbc09886a3c70f5c174f23d2de3c209a95938390 treats every element of a 2-D array as a node and makes going back and forth between the two straight-forward.
Let me know what you think.
The solution in https://github.com/jni/skan/commit/fbc09886a3c70f5c174f23d2de3c209a95938390 treats every element of a 2-D array as a node and makes going back and forth between the two straight-forward.
Ah, sorry, I didn't have time to review this at the time.
I don't think that's going to fly unfortunately. Actually, a bit of lore: all the main implementations of skan are in a file called csr.py
because the very earliest implementations were in networkx. But we found that was suuuuper slow so I went about methodically converting everything to scipy.sparse.csr_matrix
and writing out skan as it is today. But while I was working on that it was siloed in this little csr.py
file, until that file became the whole thing. 😊
So, I think that the 2D pixel graph is too low level to try to do a round trip — we will be swapping one performance issue for another. The correct level of abstraction for the Skeleton, imho, is the junction-junction graph, which is what you get when you make use the node id pairs (src, dst) from the summarize
table as an edge list. Then it's a matter of putting the right metadata on the networkx graph. For example, each row of the summarize
table has a row index, which corresponds to path_coordinates(idx)
on the Skeleton
object — so we can store the coordinates (or maybe just the pixel IDs) as an edge attribute. Then, when you prune, you need to merge two edges, and the path coordinates of the new edge is just np.concatenate([edge1['coords'], edge2['coords'][1:]], axis=0)
, for example. All of this gives you enough information to reconstruct a new Skeleton object, in the worst case by writing out the pixels and recomputing a csr skeleton graph — but maybe faster, I need to think about it.
I've been having a play around with this on ns-rse/networkx and had a go at passing the coordinates as edge attributes to NetworkX and reonconstructing paths between co-ordinates using Bresnham's Line algorithm.
It works in some instances but in writing tests I discovered there is insufficient data retained. For example skeleton1
in src/skan/_testdata.py
has a loop in it. This results in two nodes with the same co-ordinates but there is no information about what the points are connecting them.
from skan import Skeleton, summarize
from skan._testdata import skeleton1
summarize(Skeleton(skeleton1))[['skeleton-id', 'coord-src-0', 'coord-src-1', 'coord-dst-0', 'coord-dst-1']]
skeleton-id coord-src-0 coord-src-1 coord-dst-0 coord-dst-1
0 0 2 1 3 3
1 0 2 1 3 3
2 0 2 1 4 0
3 0 3 3 4 6
...and so we've no idea how those two points differ in terms of the path taken between them. This is going to be true of any non-linear edge and so whilst we could recombine the path co-ordinates of edges after pruning in Networkx as you suggested we would have no idea the path taken between those points.
The only solution I can think of so far that doesn't involve using every point as a node is to pass the co-ordinates of the points that make up an edge as an attribute to NetworkX and concatenate those after pruning. Not tried this yet and have been asked to work on something else in the coming weeks but will think about it and try and find some time to try this out.
@ns-rse sorry about the long wait here, as it has been very busy trying to get napari 0.4.19 out the door (still not done) and also family stuff. But I haven't forgotten this!
Quick thoughts:
reonconstructing paths between co-ordinates using Bresnham's Line algorithm.
I don't think this should be needed because I think we should keep the path coordinates and they should be pixel-perfect — there should be no need to infer any pixels. You need to keep the info from both the summary and the skeleton using path_coordinates(i)
where i is the path ID of each row in the summary. iirc I got those matched up exactly at some point.
The only solution I can think of so far that doesn't involve using every point as a node is to pass the co-ordinates of the points that make up an edge as an attribute to NetworkX and concatenate those after pruning. Not tried this yet and have been asked to work on something else in the coming weeks but will think about it and try and find some time to try this out.
Yes like that. But that still vastly reduces the number of nodes in the graph.
I will try to come back to this in the next couple of weeks but didn't want to leave you hanging any longer, again, sorry about the wait!
Don't worry at all @jni we all have lives. Hope you managed some time away from the computer.
That is however super-useful I hadn't clocked path_coordinates()
method and think that would be the perfect tool to for adding the co-ordinates as an attribute.
I'll have a tinker next week and see how I get on.
Thank you and Happy New Year
@ns-rse would you mind rebasing this on main? I can do it if you are too busy right now though.
@jni I merged main in and solved the conflicts, tests failed and so I tried to sort those out locally. Fixed the failing ones but there were more that failed and I haven't got them all sorted yet though as I don't understand how the discard
option evaluates to boolean
for the itteratively_prune_paths()
function. Will try and take a look over the coming days and work that out.
I'm almost there, but I think my current approach is too gung-ho... 😂
Ah, the hardest problem in computer science — off by one errors. This is one of the intermediate steps:
So when removing some of the branches we are breaking up the inner loop, at which point we're already doomed. 😂 Otherwise it's looking pretty good though. Just need to figure out how to avoid cutting open the middle branches...
btw @ns-rse it's a cool data sample — can you tell me what the xy pixel spacing/resolution is? We can incorporate that into the example.
Hi @jni
Ah, the hardest problem in computer science — off by one errors. This is one of the intermediate steps:
:laughing:
I encountered this with the linear molecules though, it just kept on pruning everything back. I think I solved it by looking for the longest path, but not sure how that transfers to loops/closed structures where a break is then introduced.
btw @ns-rse it's a cool data sample — can you tell me what the xy pixel spacing/resolution is? We can incorporate that into the example.
The pixel to nanometre scaling on these images varies depending on the resolution used in the scanning. I think this image is from [minicircle.spm
]() where the scaling is 0.4940029296875
Thanks for continuing to hack away at this :+1:
Let me know if there is anything I can help with.
0.4940029296875
... nm/pixel? ... That's cool/crazy! 😃
First stab at iterative pruning.