jni / zarpaint

Paint segmentations directly to on-disk/remote zarr arrays
BSD 3-Clause "New" or "Revised" License
14 stars 8 forks source link

Watershed split deletes label ids instead of reusing them #39

Open GenevieveBuckley opened 1 year ago

GenevieveBuckley commented 1 year ago

I've noticed that the watershed split function deletes the original label ids under the point markers, and replaces them with consecutive label ids greater than the current max label.

Is this the intended behaviour? It is a little annoying that np.unique(labels) is now no longer a consecutive range, and it might be more convenient if it re-used existing label ids.

Of course, you can end up in this situation in multiple ways (deleting unwanted labels entirely, running watershed split on two labels that were not connected in the first place, etc.), so users should probably make their code robust to this possiblilty anyway - it might not be a think that requires fixing

jni commented 1 year ago

Yeah I would say that I have always seen labels as arbitrary unique IDs, so it was not a goal to keep them sequential — and as you've noted it's super easy to lose consecutive-ness by doing other operations anyway — no one would expect that deleting a label would relabel all labels above it.

The watershed thing is simply because the implementation was easier: the watershed returns labels in {1, 2, 3, ...}, you can just add the max label and go. Otherwise, you would have to add the max label to only {1, 2, ...} and relabel n to your original label. So that might lead to clunkier code for a minor gain.

Having said this, it's not a non-goal to have them not be consecutive. A PR that preserved the original label for one of the resulting split labels would be welcome. I'll let you use your own judgement about whether the code complexity would be worth it!

GenevieveBuckley commented 1 year ago

I'm ambivalent on the question of code complexity