Closed ns-rse closed 1 year ago
I've been investigating this and think I've worked out why this problem occurs.
I think its because after running pruned.pune_paths(branch_data.loc[branch_data["branch-type"] == remove_brnach_type].index)
the first time it removes elements from the Skeleton.path
and so the resulting length is shortened and when we then try to remove those of type 3
we are trying to use the original indices which no longer exist (that is to say the indexing error is not occurring with branch_data
but with Skeleton.path
).
Thus after each round of pruning we need to renew/reassess the skeleton and then update the branch_data
so we have the correct indices.
I broke the method out into a function to investigate this and this works with the sample image attached above...
def iteratively_prune_paths(
skeleton: Skeleton, remove_branch_type: int = 1, spacing: float = 1e-9, value_is_height=False
) -> Skeleton:
"""Iteratively prune a skeleton using recursion removing the specified branch type.
Will repeatedly remove branches of the specified type until there are none left on the Skeleton
Parameters
----------
remove_branch_type: int
Remove branches of the specified type options are the types returned under `branch-type` by summarize and
the default is `1` which removes junction-to-endpoint branches.
0 endpoint-to-endpoint (isolated branch)
1 junction-to-endpoint
2 juntciont-to-junction
3 isolated cycle
Returns
-------
Skeleton
Returns a new Skeleton instance.
"""
pruned = Skeleton(skeleton, spacing=spacing, value_is_height=value_is_height)
branch_data = summarize(pruned)
while remove_branch_type in branch_data["branch-type"].unique():
# Remove branches that are small loops
pruned = pruned.prune_paths(branch_data.loc[branch_data["branch-type"] == 1].index)
# Re-evaluate the skeleton and summarize
pruned = Skeleton(pruned.skeleton_image, spacing=spacing, value_is_height=value_is_height)
branch_data = summarize(pruned)
# Prune branches of type 3 (small loops)
pruned = pruned.prune_paths(branch_data.loc[branch_data["branch-type"] == 3].index)
# Re-evaluate the skeleton and summarize
pruned = Skeleton(pruned.skeleton_image, spacing=spacing, value_is_height=value_is_height)
branch_data = summarize(pruned)
return pruned
1
1
and 3
This worked well, but one of the reasons we're seeing these messy skeletons with lots of branches in the first place is because the mask that forms the basis of the initial skeletonisation is itself very messy....
We can improve on this though by using a Gaussian blur of the original image (original and blurred are shown below)...
This gives a much cleaner mask and in turn a cleaner skeleton (mask and skeleton below)
Obviously this only needs a single round or pruning but I envisage this not always being the case and would like to have a generalised solution to itterative pruning so tried this against the above and it failed with...
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[16], line 1
----> 1 smoothed_skeleton_pruned = iteratively_prune_paths(smoothed_skeleton)
2 plt.imshow(smoothed_skeleton_pruned)
Cell In[12], line 35, in iteratively_prune_paths(skeleton, remove_branch_type, spacing, value_is_height)
33 branch_data = summarize(pruned)
34 print(f"BEFORE 2 : {len(pruned.paths.indices)}")
---> 35 pruned = pruned.prune_paths(branch_data.loc[branch_data["branch-type"] == 3].index)
36 pruned = Skeleton(pruned.skeleton_image, spacing=spacing, value_is_height=value_is_height)
37 branch_data = summarize(pruned)
File ~/work/git/hub/skan/src/skan/csr.py:648, in Skeleton.prune_paths(self, indices, skeletonize_method)
646 # optional cleanup:
647 new_skeleton = morphology.skeletonize(image_cp.astype(bool)) * image_cp
--> 648 return Skeleton(
649 new_skeleton,
650 spacing=self.spacing,
651 source_image=self.source_image,
652 )
File ~/work/git/hub/skan/src/skan/csr.py:476, in Skeleton.__init__(self, skeleton_image, spacing, source_image, keep_images, value_is_height)
474 self.nbgraph = csr_to_nbgraph(graph, self.pixel_values)
475 self.coordinates = np.transpose(coords)
--> 476 self.paths = _build_skeleton_path_graph(self.nbgraph)
477 self.n_paths = self.paths.shape[0]
478 self.distances = np.empty(self.n_paths, dtype=float)
File ~/work/git/hub/skan/src/skan/csr.py:382, in _build_skeleton_path_graph(graph)
380 path_data = np.zeros(path_indices.shape, dtype=float)
381 m, n = _build_paths(graph, path_indptr, path_indices, path_data, visited, degrees)
--> 382 paths = sparse.csr_matrix((path_data[:n], path_indices[:n], path_indptr[:m]), shape=(m - 1, n))
383 return paths
File ~/.virtualenvs/topostats/lib/python3.11/site-packages/scipy/sparse/_compressed.py:106, in _cs_matrix.__init__(self, arg1, shape, dtype, copy)
103 if dtype is not None:
104 self.data = self.data.astype(dtype, copy=False)
--> 106 self.check_format(full_check=False)
File ~/.virtualenvs/topostats/lib/python3.11/site-packages/scipy/sparse/_compressed.py:169, in _cs_matrix.check_format(self, full_check)
167 # check index pointer
168 if (len(self.indptr) != major_dim + 1):
--> 169 raise ValueError("index pointer size ({}) should be ({})"
170 "".format(len(self.indptr), major_dim + 1))
171 if (self.indptr[0] != 0):
172 raise ValueError("index pointer should start with 0")
ValueError: index pointer size (0) should be (1)
My suspicion is that after removal of the branch the skeleton is a loop and therefore all only one path is left of type 3
which then is itself removed.
I was cognisant that this would be a related problem if skeletons are linear because iteratively removing branches of type 1
would just prune away until there is nothing left so we need another way of stopping pruning other than purely assessing whether there are paths
of a certain type. Initial thoughts are that if the end-to-end distance is degrading we could say stop, but I don't think that would work well for circular loops after pruning as they'd get removed.
As suspected everything was getting pruned out.
The following functions works for this particular image.
def iteratively_prune_paths(skeleton: Skeleton, spacing: float = 1e-9, value_is_height: bool = False) -> Skeleton:
"""Iteratively prune a skeleton using recursion removing the specified branch type.
Will repeatedly remove branches of type 1 and 3 until there are none left on the Skeleton.
0 endpoint-to-endpoint (isolated branch)
1 junction-to-endpoint
2 juntciont-to-junction
3 isolated cycle
Parameters
----------
spacing: float
Scale of pixel spacing along each axis.
value_is_height: bool
Whether t consider the value of a float skeleton to be the "height" of the image.
Returns
-------
Skeleton
Returns a new Skeleton instance.
"""
pruned = Skeleton(skeleton, spacing=spacing, value_is_height=value_is_height)
branch_data = summarize(pruned)
def _remove_branch_type(
_skeleton: Skeleton,
_branch_data: pd.DataFrame,
branch_type: int,
):
_skeleton = _skeleton.prune_paths(_branch_data.loc[_branch_data["branch-type"] == branch_type].index)
return _skeleton, summarize(_skeleton)
while branch_data["branch-type"].shape[0] > 1:
# Remove branches that have endpoint (branch_type == 1)
pruned, branch_data = _remove_branch_type(pruned, branch_data, branch_type=1)
# We now need to check whether we have just a single path, if so we're done pruning
if branch_data.shape[0] == 1:
break
# If not we need to remove any small side loops (branch_type == 3)
pruned, branch_data = _remove_branch_type(pruned, branch_data, branch_type=3)
# We don't need to check if we have a single path as that is the control check for the while loop
return pruned
After Gaussian blurring there is a loop with a single side-branch for pruning...
Applying iterative pruning but with the checks to make sure that we break out when only a single path is left (that might be a loop as in this example or a linear skeleton), then we are left with...
I've been trying to write some tests for the work I've done (see ns-rse/iterative-pruning
branch on my fork)but in doing so have realised there is another problem that needs addressing before this would be ready to be included.
Documenting it here for reference.
I wanted to create dummy skeletons to test things out so added the following to src/skan/_testdata.py
from skimage.draw import random_shapes
from skimage.morphology import skeletonize
# Generate a random skeletons, first is a skeleton with a closed loop with side branches
kwargs = {"image_shape": (128, 128),
"max_shapes": 20,
"channel_axis": None,
"shape": None,
"rng": 1,
"allow_overlap": True,
"min_size": 20}
random_images, _ = random_shapes(**kwargs)
mask = np.where(random_images != 255, 1, 0)
skeleton_loop = skeletonize(mask)
# Then a linear skeleton with side-branches
kwargs["rng"] = 13588686514
random_images, _ = random_shapes(**kwargs)
mask = np.where(random_images != 255, 1, 0)
skeleton_linear = skeletonize(mask)
This produces two skeletons which I then prune in src/skan/test/test_csr.py
from skan._testdata import (
tinycycle, tinyline, skeleton0, skeleton1, skeleton2, skeleton3d,
topograph1d, skeleton4, skeleton_loop, skeleton_linear
@pytest.mark.parametrize(
"skeleton, paths", [[skeleton_loop, 1], [skeleton_linear, 1]]
)
def test_iteratively_prune_paths(skeleton: np.ndarray, paths: int) -> None:
"""Test iteratively pruning a skeleton."""
pruned_skeleton = csr.iteratively_prune_paths(skeleton)
skeleton_summary = csr.summarize(pruned_skeleton)
assert isinstance(pruned_skeleton, csr.Skeleton)
assert skeleton_summary.shape[0] == 1
I was about to add more tests to check the statistics of the returned skeleton but its sensible to check what the skeletons look like before and after pruning.
The random skeleton with a loop looks like this...
In both cases the pruned skeletons are not what I would expect, for the skeleton with a loop I was hoping to get the loop returned without any of the side chains.
For the linear molecule I was hoping to have the longest single path, but because of the junctions the "spine" (longest path) is a number of smaller paths and these keep on getting pruned.
The logic needs changing in a few places to handle this but I haven't yet thought through it and come up with a solution so this is a place holder.
Hey @ns-rse, good investigating while I was away! 😂
There's a function in skan to find the longest-shortest-path, which would give you all the branches in that path. You could then whitelist those when pruning and ensure they're not pruned.
Similarly, for the cycle, you could whitelist the paths that form the largest simple cycle in the original skeleton?
I'm still very busy until the 14th, should be able to revisit this more thoroughly after that... I hope the ideas above help open some paths forward...
Also in the meantime, any PRs for making some of those error messages clearer would be handy! For example, for the out-of-bounds indices for the CSR data structure, we could catch that error and return an IndexError?
Hi @jni,
Sorry for only just clocking this was on leave last week myself. Hope you had a good break.
I'd clocked the find_main_branch
option to summarize
and have successfully leveraged that and tweaked my algorithm, will check out the simple_cycles
but am on holiday (again) from tomorrow until 4th September.
Will try and get some better error reporting done this afternoon though.
I've been working on writing tests and examples/documentation on my branch ns-rse/iterative-pruning
Some images from the tests...
Looks fantastic @ns-rse!
No rush from me on this so happy to keep working together (maybe we can set up another call) once I'm back in Oz on Sept 19. I might grab your branch in the meantime and tweak things!
Thanks @jni and yes lets have another call when things are more settled.
I saw mention in #208 about undertaking pruning in graph space which sounds as though it would be more efficient than the solution proposed here.
Hi @jni let me know when we be good to meet and talk through the pruning approach. It doesn't feel like its very fast in what I've implemented and am curious whether #208 would improve this. My reading is that is perhaps a proposal to change pruning to NetworkX and I wonder if the betweenness_centrality
might be of use here.
Hey @ns-rse! So unfortunately I caught COVID on my way home and am feeling pretty terrible atm... please ping me again in ~1 week? Hopefully that is enough recovery time... 🤞
Hi @jni,
Hope you've recovered from COVID and are feeling better, I was on leave last week. Let me know when would be good to meet up and discuss iterative pruning.
Wow, timing! I was literally just looking at your PR when you wrote! 😲 I think we can probably be a bit more pre-emptive with the error message, ie check the input rather than try/except inside a loop. Anyway, happy to meet up Thu or Wed/Thu/Fri next week? As early as you can make it on those days.
Hi @jni,
Would Friday at 07:30(BST)/19:30(AEDT I think) be ok for you? I have to drop my daughter off on Wed/Thu mornings which would push back the meeting time to later.
Attempting to develop pruning we've encountered the following error.
This was created using the attached png.
Method that is being used is...