KeyError when adding boundary with label 0 in oversegmentation

neptunes5thmoon commented 9 years ago

I'm trying to add a region with label 0 to the oversegmentation to represent the boundary. But whenever there's a 0 in the oversegmentation (even a single one at a random position), gala raises a KeyError during training. Some debugging revealed that g.node includes an entry that does not contain any properties. However, the index of that node changes from run to run.

Traceback (most recent call last):
  File "/home/lheinric/PycharmProjects/GALA_test/own_test.py", line 128, in <module>
    test()
  File "/home/lheinric/PycharmProjects/GALA_test/own_test.py", line 60, in test
    (X, y, w, merges) = g_train.learn_agglomerate(gt_train, fc)[0]
  File "/home/lheinric/.local/lib/python2.7/site-packages/gala/agglo.py", line 1261, in learn_agglomerate
    learning_mode, labeling_mode))
  File "/home/lheinric/.local/lib/python2.7/site-packages/gala/agglo.py", line 1408, in _learn_agglomerate
    node_id = g.merge_nodes(n1, n2, merge_priority)
  File "/home/lheinric/.local/lib/python2.7/site-packages/gala/agglo.py", line 1579, in merge_nodes
    self.refine_post_merge_boundaries(n1, n2, sp2segment)
  File "/home/lheinric/.local/lib/python2.7/site-packages/gala/agglo.py", line 1633, in refine_post_merge_boundaries
    self.update_merge_queue(u, v)
  File "/home/lheinric/.local/lib/python2.7/site-packages/gala/agglo.py", line 1739, in update_merge_queue
    w = self.merge_priority_function(self,u,v)
  File "/home/lheinric/.local/lib/python2.7/site-packages/gala/agglo.py", line 207, in predict
    features = feature_extractor(g, n1, n2)
  File "/home/lheinric/.local/lib/python2.7/site-packages/gala/features/base.py", line 9, in __call__
    return self.compute_features(g, n1, n2)
  File "/home/lheinric/.local/lib/python2.7/site-packages/gala/features/base.py", line 19, in compute_features
    if g.node[n1]['size'] > g.node[n2]['size']:
KeyError: 'size'
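
For anyone debugging the same KeyError, here is a minimal sketch of the check described above: listing the nodes whose property dict has no 'size' entry. It assumes the node attributes can be viewed as a plain mapping from node id to property dict, as g.node in the traceback suggests. The toy `nodes` data and the helper name `nodes_missing` are hypothetical and not part of gala.

```python
def nodes_missing(nodes, key="size"):
    """Return the sorted ids of nodes whose property dict lacks `key`."""
    return sorted(n for n, props in nodes.items() if key not in props)


# Hypothetical stand-in for g.node: node id -> property dict.
nodes = {
    1: {"size": 120},
    2: {"size": 75},
    7: {},  # a node with no properties, like the one behind the KeyError
}

missing = nodes_missing(nodes)
print(missing)
```

Running a check like this right before the failing merge would show which node ids are empty on a given run, which helps when the offending index changes between runs.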

jni commented 9 years ago

Hi @lheinric!

Thanks for using gala!

This is a known issue, actually, and I am working on "fixing" it for 0.3. The issue is that, historically, gala has considered the label 0 to be a boundary between superpixels. (This is because we started out using an implementation of watershed that placed these 0-boundaries between superpixels, much like Matlab does.) We've since moved to "complete" segmentations (every pixel belongs to a segment; no boundaries), and the 0 label bits of the code have been neglected.

Do you have a way of generating an oversegmentation with no boundaries? We are actually going to deprecate that functionality, because it's become extremely slow compared to the no-boundary method (and buggy, as you have noticed). However, I might consider keeping it if you have a compelling reason to use it.
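
One generic way to turn a segmentation with 0-boundaries into a "complete" one is to give every 0 pixel the label of its nearest labeled neighbor. The sketch below does this with a multi-source breadth-first search on a small 2D grid, using only the standard library. It is an illustration under assumptions, not gala's method: `fill_boundaries` and the toy `seg` are hypothetical, it assumes a non-empty rectangular grid, and ties between equally close regions are broken by visit order. For real volumes an array-based approach would be more practical.

```python
from collections import deque


def fill_boundaries(labels):
    """Assign each 0-labeled pixel of a 2D grid the label of its nearest
    (4-connected, breadth-first) nonzero pixel. Returns a new grid."""
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]  # copy so the input is not modified
    # Seed the search with every labeled pixel at once (multi-source BFS).
    queue = deque(
        (r, c) for r in range(rows) for c in range(cols) if out[r][c] != 0
    )
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == 0:
                out[nr][nc] = out[r][c]  # claim the boundary pixel
                queue.append((nr, nc))
    return out


# Toy oversegmentation: four regions separated by a 0-boundary.
seg = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0],
    [3, 3, 0, 4, 4],
]
complete = fill_boundaries(seg)
```

This only makes sense for thin boundaries; spreading region labels into a thick boundary would change the region sizes substantially.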

jmrk84 commented 9 years ago

Hi Juan, your comment about speed is interesting; I would have expected the exact opposite. If we label the existing boundary as 1 instead of 0, the resulting supervoxel graph should be huge, with thousands of useless features and edges, since we never want to merge anything with the boundary. Every supervoxel will share an edge with label 1, but we never want that edge merged anyway. So I'm not sure how you want to avoid this issue with the no-boundary method. I should add that we have a huge and thick boundary, since we work with data that has extracellular space preservation.

jni commented 9 years ago

Hi @jmrk84,

The number of edges is not at all a limiting factor. The problem was that 0-boundaries require refinement at the voxel level after each merge, and this was done in pure Python. Even worse, as the boundaries got bigger, so did this step, so there was some quadratic growth in that process. To cap it all off, somewhere along the way a bug was introduced, so that eventually it would crash, as you know. =P

Anyway, I'm guessing what you are after is a mask, so that all your extracellular space doesn't get included in the RAG. Am I right? I take it you have a good enough predictor of extracellular space? If you remove something from the RAG there's no recovering! =)

If that's what you're after, it should actually be quite trivial to add, because build_rag_from_watershed has an idxs keyword argument that lets you specify which locations to look at when building the graph. There's no way to access it from the constructor, but it should be straightforward to add.

Let me know if that would address your use case!

jni commented 9 years ago

@lheinric @jmrk84 the latest master should let you mask out thick boundaries by passing a mask= keyword argument to the RAG constructor. You should not have boundaries at all between superpixels that you want to merge, though. See #53, especially the new test_mask function in tests/test_agglo.py.
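
To illustrate what masking does to the graph, here is a conceptual, pure-Python sketch. It is not gala's implementation, and it does not use gala's API; the thread only establishes that the RAG constructor accepts a mask= keyword. The helper `build_rag_edges`, the label 9 for the thick boundary region, and the toy data are all hypothetical.

```python
def build_rag_edges(labels, mask=None):
    """Collect edges between differently labeled, 4-connected pixels of a
    2D grid, ignoring every pixel where `mask` is False."""
    rows, cols = len(labels), len(labels[0])

    def used(r, c):
        return mask is None or mask[r][c]

    edges = set()
    for r in range(rows):
        for c in range(cols):
            if not used(r, c):
                continue
            # Looking only down and right visits each adjacent pair once.
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols and used(nr, nc):
                    a, b = labels[r][c], labels[nr][nc]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return edges


labels = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [9, 9, 9, 9],  # 9 stands in for a thick extracellular region
    [3, 3, 4, 4],
]
# True where a pixel should take part in the graph.
mask = [[lab != 9 for lab in row] for row in labels]

unmasked = build_rag_edges(labels)
masked = build_rag_edges(labels, mask)
print(sorted(unmasked))
print(sorted(masked))
```

Without the mask, every region touching the thick boundary gets an edge to it; with the mask, only edges between directly adjacent superpixels remain. Regions separated only by masked pixels end up with no edge at all, which matches the advice above not to leave boundaries between superpixels you want to merge.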

neptunes5thmoon commented 9 years ago

Hi @jni, that seems to be working :) Thanks a lot! While profiling the code, I noticed that constructing the graph is quite expensive. Do you think implementing it in Cython would be useful? And one more question: I want to train gala on several data cubes. Is the right way to change the initial policy to the one learned previously, or am I missing some functionality here?

jni commented 9 years ago

Hi @lheinric! Great to hear! Since the issue is resolved, I'm going to close it and move the more open discussion to the mailing list. The short answer to both questions is "maybe". =P

janelia-flyem / gala

KeyError when adding boundary with label 0 in oversegmentation #52