imeka / ndarray-ndimage

Multidimensional image processing for ndarray
Apache License 2.0
14 stars 2 forks source link

Allow label type to be specified #25

Closed matsjoyce-refeyn closed 1 year ago

matsjoyce-refeyn commented 1 year ago

I'm using label to label image stacks, and the total number of pixels is higher than 2**16-1 and so it would be safer to label using u32 instead of u16 (to avoid panics or corruption if we overflow).

From some very rough timings, labelling using u32 is 10% slower than u16, and u64 is 41% slower. So this PR adds a generic argument to label so that client code can select the output type (unfortunately the default generic type feature is being removed for functions, so I can't make it default to u16). The code does become a bit more messy with unwraps, but performance when using u16 seems to be unaffected (most of the unwraps are removed when optimisation is turned on).

I've added a check and a test to make sure we get a panic when labels overflow.

One remaining question is about largest_connected_components: it does it's own labelling, which it does not expose, and so client code cannot specify which type it uses for labelling. I've hardcoded it to u16 and noted that in the docs, does that sound like a reasonable compromise? It should be easy for the client code to reimplement if it needs a wider type.

nilgoyette commented 1 year ago

From some very rough timings, labelling using u32 is 10% slower than u16, and u64 is 41% slower.

I don't understand why it gets that slow. I'm by no means a performance expert, but I would have guessed a slowdown < 10% for bigger types. Can I have your opinion on this matter?

matsjoyce-refeyn commented 1 year ago

From some very rough timings, labelling using u32 is 10% slower than u16, and u64 is 41% slower.

I don't understand why it gets that slow. I'm by no means a performance expert, but I would have guessed a slowdown < 10% for bigger types. Can I have your opinion on this matter?

I'm not really sure, I used VTune on it and nothing is significantly different. But the bits that become slower seem to be the places were we are copying buffers or doing indexing, so I think it's due to the amount of memory needed doubling (for u32) and quadrupling (for u64).

matsjoyce-refeyn commented 1 year ago

Sorry for the delay. I was really busy.

No worries, thanks for merging this!