image-rs / imageproc

Image processing operations
MIT License
761 stars 150 forks source link

Provide Iterator instead of allocating memory every time for FAST corners #308

Open vadixidav opened 5 years ago

vadixidav commented 5 years ago

It would be good if when you ran the FAST detector that it returned an impl Iterator<Item=Corner> instead of a Vec<Corner> so that the corners can then be appropriately filtered as they are being computed.

theotherphil commented 5 years ago

Do you have a concrete use case for this?

The FAST feature detector code needs some improvement in general, so if there's demand I'd be happy to spend some time on it (or even happier to accept PRs!). I agree that allocating a vector each time is bad, but I suspect that returning an impl iterator would result in worse performance than pre-allocating a large enough buffer and filling that up on each call. We'd need to profile and see.

HeroicKatora commented 5 years ago

It does however not preallocate anything :smile: https://github.com/PistonDevelopers/imageproc/blob/2f1f073218535fcdc6e235cd19bcccfd7e968cb9/src/corners.rs#L60-L74

I don't see the possibility to sensibly preallocate within the function–but I don't know much about FAST–but when we'd return a custom iterator with an implemented maximum size hint then that would at least be possible to do for consumers. If we want to optimize this incredibly much then maybe even unsafe TrustedLen as an optional implementation if it bceomes stable

theotherphil commented 5 years ago

Right, I meant that I agree the current implementation is suboptimal but not necessarily on the suggested fix.

vadixidav commented 5 years ago

My concrete usecase is (from a more selfish perspective) trying to make a feature detection pipeline more functional. The more abstract usecase would just be avoiding writing the features to a vector and sticking them right away into a heap to get the ones with the highest harris scores.

theotherphil commented 5 years ago

Ok, thanks. This would be an easy change to make to the current code, but probably wouldn't play nicely with the changes we'd need to actually make this function fast (e.g. adding SIMD). I'm not convinced that a streaming version of this function would be better than one which accepts a pre-allocated buffer (for performance purposes anyway), but I'm happy for the two versions to coexist in this crate if required.

Edit: and in any case we should add at least one version of this function that doesn't allocate!