seung-lab / connected-components-3d

Connected components on discrete and continuous multilabel 3D & 2D images. Handles 26, 18, and 6 connected variants; periodic boundaries (4, 8, & 6)
GNU Lesser General Public License v3.0

max_labels questions #15

Closed unidesigner closed 5 years ago

unidesigner commented 5 years ago

I'm a bit confused by the max_labels argument. If I call connected_components without the argument and then take np.max(labels_out), I should get the number of components, at least as of the recent 1.2.0 release. However, if I then use this number, plus some margin, to set max_labels, the call fails with this exception:

Connected Components Error: Label 60000 cannot be mapped to union-find array of length 60000.
terminate called after throwing an instance of 'char const*'

It seems that internally, the union-find algorithm requires a higher number, but it is not clear to me how to estimate this number.
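One plausible reason for the gap, sketched here under stated assumptions: two-pass labeling algorithms hand out provisional labels during the first pass and only merge them afterwards, so the union-find array has to hold every provisional label, not just the final components. The code below is a simplified 2D, 4-connected illustration written for this thread. It is not the library's implementation, and `two_pass_label` is a hypothetical name.

```python
def two_pass_label(grid):
    """Label 4-connected regions of equal nonzero values in a 2D grid.

    Returns (labels, provisional_count, final_count).
    Illustrative sketch only; not the cc3d implementation.
    """
    rows, cols = len(grid), len(grid[0])
    parent = [0]  # union-find array; index 0 is reserved for background

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    labels = [[0] * cols for _ in range(rows)]

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            v = grid[r][c]
            if v == 0:
                continue
            up = labels[r - 1][c] if r > 0 and grid[r - 1][c] == v else 0
            left = labels[r][c - 1] if c > 0 and grid[r][c - 1] == v else 0
            if up and left:
                labels[r][c] = left
                union(up, left)
            elif up or left:
                labels[r][c] = up or left
            else:
                parent.append(len(parent))  # brand new provisional label
                labels[r][c] = len(parent) - 1

    # Second pass: replace each provisional label with a compact final one.
    remap = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = remap.setdefault(root, len(remap) + 1)

    return labels, len(parent) - 1, len(remap)


# A comb shape: three prongs that only join on the last row.
comb = [
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]
_, provisional, final = two_pass_label(comb)
print(provisional, final)  # 3 provisional labels, 1 final component
```

In this toy case, sizing the union-find array from the final count (1) plus a small margin would be too small, since the first pass needs room for 3 labels. How large that gap is on real data depends on the shapes in the image.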

It would be great to find a way to reduce the peak memory footprint of this very nice package. :)

william-silversmith commented 5 years ago

Hi Stephan,

The max_labels argument is a memory reduction hack that's not guaranteed to work well. Typically, I try to estimate it (for connectomics data) as a "representative volume" number of labels plus a large safety factor. In Kimimaro, right or wrong, it is set to 1/4th the number of voxels in the volume.
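As a rough illustration of the 1/4 heuristic mentioned above, the arithmetic can be written out as a small helper. `estimate_max_labels` is a hypothetical name used only for this sketch, and the result is still a guess that may be too small for some inputs.

```python
from math import prod


def estimate_max_labels(shape, fraction=0.25):
    """Guess a max_labels value as a fixed fraction of the voxel count.

    Hypothetical helper for illustration; the fraction is a heuristic,
    not a guaranteed upper bound on the labels the algorithm may need.
    """
    voxels = prod(shape)
    return max(1, int(voxels * fraction))


print(estimate_max_labels((512, 512, 512)))  # 33554432
```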

I might have a better way to reduce memory usage. I'm currently experimenting with removing the "union by size" feature from union-find. In some experiments, it reduces memory usage by half and improves performance ~10% on a set of connectomics labels and on random arrays. However, it doesn't make a ton of sense to me that it's faster.

If I can find a theoretical justification for it I'd be happy to release that, as it's less code and more performant.
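To make the memory side of this concrete, here is a minimal sketch of the two union-find variants being compared. It assumes a flat array of unsigned integers for storage; the class names are hypothetical and this is not the library's code. Dropping union by size removes the second array, which halves the structure's own storage. The sketch says nothing about why it might also be faster.

```python
from array import array


class UnionFindBySize:
    """Union-find with path halving and union by size (two arrays)."""

    def __init__(self, n):
        self.parent = array("I", range(n))
        self.size = array("I", [1] * n)

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

    def nbytes(self):
        return self.parent.itemsize * (len(self.parent) + len(self.size))


class UnionFindPlain:
    """Union-find with path halving only (one array)."""

    def __init__(self, n):
        self.parent = array("I", range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra  # no size bookkeeping

    def nbytes(self):
        return self.parent.itemsize * len(self.parent)


n = 1_000
print(UnionFindBySize(n).nbytes() // UnionFindPlain(n).nbytes())  # 2
```

Both variants produce the same partition for the same sequence of unions; only the tree shapes and the storage differ.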

william-silversmith commented 5 years ago

I added issue #16 for memory reduction discussion.

william-silversmith commented 5 years ago

@unidesigner Check out this PR and let me know what you think. https://github.com/seung-lab/connected-components-3d/pull/17

william-silversmith commented 5 years ago

@unidesigner I released v1.2.2, which can achieve roughly 2x lower memory consumption than before and is also ~40% faster.

unidesigner commented 5 years ago

@william-silversmith Fantastic news - thanks for getting back to this so quickly! I will test it tomorrow and report back asap.

unidesigner commented 5 years ago

It works for me too, is much faster, and uses less memory! The number of regions it finds is still the same, so that's an additional datapoint confirming that things still work. I'm closing this issue, as I don't think there is much else we can do about the max_labels option at the moment.