ValueError: not enough values to unpack (expected 2, got 0)

ns-rse commented 1 year ago

Attempting to develop pruning we've encountered the following error.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 1
----> 1 pruned_skeleton = skeleton.iteratively_prune_paths(remove_branch_type=1)
      2 all_paths, paths_table = prep_plot(pruned_skeleton)

File ~/work/git/hub/skan/src/skan/csr.py:674, in Skeleton.iteratively_prune_paths(self, remove_branch_type)
    672 pruned = pruned.prune_paths(branch_data.loc[branch_data["branch-type"] == remove_branch_type].index)
    673 # Remove branches that are small loops
--> 674 pruned = pruned.prune_paths(branch_data.loc[branch_data["branch-type"] == 3].index)
    676 branch_data = summarize(pruned)
    677 iteration += 1

File ~/work/git/hub/skan/src/skan/csr.py:634, in Skeleton.prune_paths(self, indices, skeletonize_method)
    632 image_cp = np.copy(self.skeleton_image)
    633 for i in indices:
--> 634     pixel_ids_to_wipe = self.path(i)
    635     junctions = self.degrees[pixel_ids_to_wipe] > 2
    636     pixel_ids_to_wipe = pixel_ids_to_wipe[~junctions]

File ~/work/git/hub/skan/src/skan/csr.py:513, in Skeleton.path(self, index)
    491 """Return the pixel indices of path number `index`.
    492 
    493 Parameters
   (...)
    502     endpoints.
    503 """
    504 # The below is equivalent to `self.paths[index].indices`, which is much
    505 # more elegant. However the below version is about 25x faster!
    506 # In [14]: %timeit mat[1].indices
   (...)
    511 # ...:
    512 # 5.05 µs ± 77.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
--> 513 start, stop = self.paths.indptr[index : index + 2]
    514 return self.paths.indices[start:stop]

ValueError: not enough values to unpack (expected 2, got 0)

This was created using the attached png.

Method that is being used is...

    def iteratively_prune_paths(self, remove_branch_type: int = 1) -> Self:
        """Iteratively prune a skeleton using recursion removing the specified branch type.

        Will repeatedly remove branches of the specified type until there are none left on the Skeleton
        Parameters
        ----------
        remove_branch_type: int
            Remove branches of the specified type options are the types returned under `branch-type` by summarize and
        the default is `1` which removes junction-to-endpoint branches.

              0 endpoint-to-endpoint (isolated branch)
              1 junction-to-endpoint
              2 juntciont-to-junction
              3 isolated cycle

        Returns
        -------
        Skeleton
            Returns a new Skeleton instance.
        """
        pruned = Skeleton(self.skeleton_image, spacing=self.spacing, value_is_height=self.value_is_height)
        branch_data = summarize(pruned)
        iteration = 0
        while remove_branch_type in branch_data["branch-type"].unique():
            pruned = pruned.prune_paths(branch_data.loc[branch_data["branch-type"] == remove_branch_type].index)
            # Remove branches that are small loops
            pruned = pruned.prune_paths(branch_data.loc[branch_data["branch-type"] == 3].index)

            branch_data = summarize(pruned)
            iteration += 1
        return pruned

ns-rse commented 1 year ago

I've been investigating this and think I've worked out why this problem occurs.

I think its because after running pruned.pune_paths(branch_data.loc[branch_data["branch-type"] == remove_brnach_type].index) the first time it removes elements from the Skeleton.path and so the resulting length is shortened and when we then try to remove those of type 3 we are trying to use the original indices which no longer exist (that is to say the indexing error is not occurring with branch_data but with Skeleton.path).

Thus after each round of pruning we need to renew/reassess the skeleton and then update the branch_data so we have the correct indices.

I broke the method out into a function to investigate this and this works with the sample image attached above...

def iteratively_prune_paths(
    skeleton: Skeleton, remove_branch_type: int = 1, spacing: float = 1e-9, value_is_height=False
) -> Skeleton:
    """Iteratively prune a skeleton using recursion removing the specified branch type.

    Will repeatedly remove branches of the specified type until there are none left on the Skeleton
    Parameters
    ----------
    remove_branch_type: int
        Remove branches of the specified type options are the types returned under `branch-type` by summarize and
    the default is `1` which removes junction-to-endpoint branches.

          0 endpoint-to-endpoint (isolated branch)
          1 junction-to-endpoint
          2 juntciont-to-junction
          3 isolated cycle

    Returns
    -------
    Skeleton
        Returns a new Skeleton instance.
    """
    pruned = Skeleton(skeleton, spacing=spacing, value_is_height=value_is_height)
    branch_data = summarize(pruned)
    while remove_branch_type in branch_data["branch-type"].unique():
        # Remove branches that are small loops
        pruned = pruned.prune_paths(branch_data.loc[branch_data["branch-type"] == 1].index)
        # Re-evaluate the skeleton and summarize
        pruned = Skeleton(pruned.skeleton_image, spacing=spacing, value_is_height=value_is_height)
        branch_data = summarize(pruned)
        # Prune branches of type 3 (small loops)
        pruned = pruned.prune_paths(branch_data.loc[branch_data["branch-type"] == 3].index)
        # Re-evaluate the skeleton and summarize
        pruned = Skeleton(pruned.skeleton_image, spacing=spacing, value_is_height=value_is_height)
        branch_data = summarize(pruned)
    return pruned

Initial Skeleton

Initial Pruning - Iterratively removing only branches of type `1`

Corrected Pruning - Iterratively removing branches of type `1` and `3`

Further investigations

This worked well, but one of the reasons we're seeing these messy skeletons with lots of branches in the first place is because the mask that forms the basis of the initial skeletonisation is itself very messy....

We can improve on this though by using a Gaussian blur of the original image (original and blurred are shown below)...

This gives a much cleaner mask and in turn a cleaner skeleton (mask and skeleton below)

Obviously this only needs a single round or pruning but I envisage this not always being the case and would like to have a generalised solution to itterative pruning so tried this against the above and it failed with...

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[16], line 1
----> 1 smoothed_skeleton_pruned = iteratively_prune_paths(smoothed_skeleton)
      2 plt.imshow(smoothed_skeleton_pruned)

Cell In[12], line 35, in iteratively_prune_paths(skeleton, remove_branch_type, spacing, value_is_height)
     33 branch_data = summarize(pruned)
     34 print(f"BEFORE 2 : {len(pruned.paths.indices)}")
---> 35 pruned = pruned.prune_paths(branch_data.loc[branch_data["branch-type"] == 3].index)
     36 pruned = Skeleton(pruned.skeleton_image, spacing=spacing, value_is_height=value_is_height)
     37 branch_data = summarize(pruned)

File ~/work/git/hub/skan/src/skan/csr.py:648, in Skeleton.prune_paths(self, indices, skeletonize_method)
    646 # optional cleanup:
    647 new_skeleton = morphology.skeletonize(image_cp.astype(bool)) * image_cp
--> 648 return Skeleton(
    649     new_skeleton,
    650     spacing=self.spacing,
    651     source_image=self.source_image,
    652 )

File ~/work/git/hub/skan/src/skan/csr.py:476, in Skeleton.__init__(self, skeleton_image, spacing, source_image, keep_images, value_is_height)
    474 self.nbgraph = csr_to_nbgraph(graph, self.pixel_values)
    475 self.coordinates = np.transpose(coords)
--> 476 self.paths = _build_skeleton_path_graph(self.nbgraph)
    477 self.n_paths = self.paths.shape[0]
    478 self.distances = np.empty(self.n_paths, dtype=float)

File ~/work/git/hub/skan/src/skan/csr.py:382, in _build_skeleton_path_graph(graph)
    380 path_data = np.zeros(path_indices.shape, dtype=float)
    381 m, n = _build_paths(graph, path_indptr, path_indices, path_data, visited, degrees)
--> 382 paths = sparse.csr_matrix((path_data[:n], path_indices[:n], path_indptr[:m]), shape=(m - 1, n))
    383 return paths

File ~/.virtualenvs/topostats/lib/python3.11/site-packages/scipy/sparse/_compressed.py:106, in _cs_matrix.__init__(self, arg1, shape, dtype, copy)
    103 if dtype is not None:
    104     self.data = self.data.astype(dtype, copy=False)
--> 106 self.check_format(full_check=False)

File ~/.virtualenvs/topostats/lib/python3.11/site-packages/scipy/sparse/_compressed.py:169, in _cs_matrix.check_format(self, full_check)
    167 # check index pointer
    168 if (len(self.indptr) != major_dim + 1):
--> 169     raise ValueError("index pointer size ({}) should be ({})"
    170                      "".format(len(self.indptr), major_dim + 1))
    171 if (self.indptr[0] != 0):
    172     raise ValueError("index pointer should start with 0")

ValueError: index pointer size (0) should be (1)

My suspicion is that after removal of the branch the skeleton is a loop and therefore all only one path is left of type 3 which then is itself removed.

I was cognisant that this would be a related problem if skeletons are linear because iteratively removing branches of type 1 would just prune away until there is nothing left so we need another way of stopping pruning other than purely assessing whether there are paths of a certain type. Initial thoughts are that if the end-to-end distance is degrading we could say stop, but I don't think that would work well for circular loops after pruning as they'd get removed.

ns-rse commented 1 year ago

As suspected everything was getting pruned out.

The following functions works for this particular image.

def iteratively_prune_paths(skeleton: Skeleton, spacing: float = 1e-9, value_is_height: bool = False) -> Skeleton:
    """Iteratively prune a skeleton using recursion removing the specified branch type.

    Will repeatedly remove branches of type 1 and 3 until there are none left on the Skeleton.

          0 endpoint-to-endpoint (isolated branch)
          1 junction-to-endpoint
          2 juntciont-to-junction
          3 isolated cycle

    Parameters
    ----------
    spacing: float
        Scale of pixel spacing along each axis.
    value_is_height: bool
        Whether t consider the value of a float skeleton to be the "height" of the image.

    Returns
    -------
    Skeleton
        Returns a new Skeleton instance.
    """
    pruned = Skeleton(skeleton, spacing=spacing, value_is_height=value_is_height)
    branch_data = summarize(pruned)

    def _remove_branch_type(
        _skeleton: Skeleton,
        _branch_data: pd.DataFrame,
        branch_type: int,
    ):
        _skeleton = _skeleton.prune_paths(_branch_data.loc[_branch_data["branch-type"] == branch_type].index)
        return _skeleton, summarize(_skeleton)

    while branch_data["branch-type"].shape[0] > 1:
        # Remove branches that have endpoint (branch_type == 1)
        pruned, branch_data = _remove_branch_type(pruned, branch_data, branch_type=1)
        # We now need to check whether we have just a single path, if so we're done pruning
        if branch_data.shape[0] == 1:
            break
        # If not we need to remove any small side loops (branch_type == 3)
        pruned, branch_data = _remove_branch_type(pruned, branch_data, branch_type=3)
        # We don't need to check if we have a single path as that is the control check for the while loop
    return pruned

After Gaussian blurring there is a loop with a single side-branch for pruning...

Applying iterative pruning but with the checks to make sure that we break out when only a single path is left (that might be a loop as in this example or a linear skeleton), then we are left with...

ns-rse commented 1 year ago

I've been trying to write some tests for the work I've done (see ns-rse/iterative-pruning branch on my fork)but in doing so have realised there is another problem that needs addressing before this would be ready to be included.

Documenting it here for reference.

Dummy Skeletons

I wanted to create dummy skeletons to test things out so added the following to src/skan/_testdata.py

from skimage.draw import random_shapes
from skimage.morphology import skeletonize

# Generate a random skeletons, first is a skeleton with a closed loop with side branches
kwargs = {"image_shape": (128, 128),
          "max_shapes": 20,
          "channel_axis": None,
          "shape": None,
          "rng": 1,
          "allow_overlap": True,
          "min_size": 20}
random_images, _ = random_shapes(**kwargs)
mask = np.where(random_images != 255, 1, 0)
skeleton_loop = skeletonize(mask)
# Then a linear skeleton with side-branches
kwargs["rng"] = 13588686514
random_images, _ = random_shapes(**kwargs)
mask = np.where(random_images != 255, 1, 0)
skeleton_linear = skeletonize(mask)

This produces two skeletons which I then prune in src/skan/test/test_csr.py

from skan._testdata import (
         tinycycle, tinyline, skeleton0, skeleton1, skeleton2, skeleton3d,
         topograph1d, skeleton4, skeleton_loop, skeleton_linear

@pytest.mark.parametrize(
        "skeleton, paths", [[skeleton_loop, 1], [skeleton_linear, 1]]
        )
def test_iteratively_prune_paths(skeleton: np.ndarray, paths: int) -> None:
    """Test iteratively pruning a skeleton."""
    pruned_skeleton = csr.iteratively_prune_paths(skeleton)
    skeleton_summary = csr.summarize(pruned_skeleton)
    assert isinstance(pruned_skeleton, csr.Skeleton)
    assert skeleton_summary.shape[0] == 1

I was about to add more tests to check the statistics of the returned skeleton but its sensible to check what the skeletons look like before and after pruning.

Looped Skeleton

The random skeleton with a loop looks like this...

Before Pruning

After Pruning

Linear Skeleton

Before Pruning

After Pruning

The Problem...

In both cases the pruned skeletons are not what I would expect, for the skeleton with a loop I was hoping to get the loop returned without any of the side chains.

For the linear molecule I was hoping to have the longest single path, but because of the junctions the "spine" (longest path) is a number of smaller paths and these keep on getting pruned.

Possible solutions

The logic needs changing in a few places to handle this but I haven't yet thought through it and come up with a solution so this is a place holder.

jni commented 1 year ago

Hey @ns-rse, good investigating while I was away! 😂

There's a function in skan to find the longest-shortest-path, which would give you all the branches in that path. You could then whitelist those when pruning and ensure they're not pruned.

Similarly, for the cycle, you could whitelist the paths that form the largest simple cycle in the original skeleton?

I'm still very busy until the 14th, should be able to revisit this more thoroughly after that... I hope the ideas above help open some paths forward...

Also in the meantime, any PRs for making some of those error messages clearer would be handy! For example, for the out-of-bounds indices for the CSR data structure, we could catch that error and return an IndexError?

ns-rse commented 1 year ago

Hi @jni,

Sorry for only just clocking this was on leave last week myself. Hope you had a good break.

I'd clocked the find_main_branch option to summarize and have successfully leveraged that and tweaked my algorithm, will check out the simple_cycles but am on holiday (again) from tomorrow until 4th September.

Will try and get some better error reporting done this afternoon though.

I've been working on writing tests and examples/documentation on my branch ns-rse/iterative-pruning

Some images from the tests...

Loop/cycle

Linear

Multiple

jni commented 1 year ago

Looks fantastic @ns-rse!

No rush from me on this so happy to keep working together (maybe we can set up another call) once I'm back in Oz on Sept 19. I might grab your branch in the meantime and tweak things!

ns-rse commented 1 year ago

Thanks @jni and yes lets have another call when things are more settled.

I saw mention in #208 about undertaking pruning in graph space which sounds as though it would be more efficient than the solution proposed here.

ns-rse commented 1 year ago

Hi @jni let me know when we be good to meet and talk through the pruning approach. It doesn't feel like its very fast in what I've implemented and am curious whether #208 would improve this. My reading is that is perhaps a proposal to change pruning to NetworkX and I wonder if the betweenness_centrality might be of use here.

jni commented 1 year ago

Hey @ns-rse! So unfortunately I caught COVID on my way home and am feeling pretty terrible atm... please ping me again in ~1 week? Hopefully that is enough recovery time... 🤞

ns-rse commented 1 year ago

Hi @jni,

Hope you've recovered from COVID and are feeling better, I was on leave last week. Let me know when would be good to meet up and discuss iterative pruning.

jni commented 1 year ago

Wow, timing! I was literally just looking at your PR when you wrote! 😲 I think we can probably be a bit more pre-emptive with the error message, ie check the input rather than try/except inside a loop. Anyway, happy to meet up Thu or Wed/Thu/Fri next week? As early as you can make it on those days.

ns-rse commented 1 year ago

Hi @jni,

Would Friday at 07:30(BST)/19:30(AEDT I think) be ok for you? I have to drop my daughter off on Wed/Thu mornings which would push back the meeting time to later.

jni / skan