image-rs / imageproc

Image processing operations
MIT License
758 stars 149 forks source link

Allow callers to pass in output buffers #685

Open tnibler opened 2 months ago

tnibler commented 2 months ago

I'm finding myself wanting to scale/blur/whatever images into an ndarray Array3. But imageproc functions all allocate their own output, which is suboptimal in many cases, for example when:

I'm not sure adding do_thing_into functions for every function in the library is the best way to do this and using inout parameters is cumbersome compared to the current default, so some more generic way to make operations (somewhat) agnostic to where they shove their output might be appropriate.

I'm willing to put some time into this, but given that this would break literally every API in the crate I wanted to ask for some opinions before maybe investigating how this could be done.

ripytide commented 2 months ago

Have you seen the _mut() variants of many of the functions in the crate since they don't allocate as they take &mut Image argument? (Maybe not in the latest published version though since we've had a lot of changes since the last release)

As for processing on data from external sources I think you should be able to construct an ImageBuffer struct from any &[u8] though.

tnibler commented 2 months ago

Those are in place though (at least the ones I found, like rotate and flip), which is different again so that makes three versions of every function: the current one returning a new image, _mut in-place and _into with the output buffer passed in. I'm not sure what the standard way of simulating overloaded functions is in Rust, but surely there is a nicer way than polluting the namespace and autocomplete with 3+ versions of every function, what do you think?

ripytide commented 2 months ago

See #622 where I had similar thoughts. There are a few tradeoffs involved in the current method but the new doc macro has helped somewhat in preventing duplicated doc-comments and since the non-mut variant usually calls the mut variant that also prevents code duplication.

tnibler commented 2 months ago

What do you think about something like this?

You would only have to implement op_into (and in-place versions if possible) and get the rest generated automatically.

ripytide commented 2 months ago

I think that could work, but some downsides I can think of might be:

ripytide commented 2 months ago

On the other hand standardizing image ops into traits does make it possible to do more advanced usage in generic scenarios such as taking any image ops as an input itself like higher kinded programming does with functions:

fn apply_op_twice<O>(op : O, image: Image) where O: ImageOp {
     Op.call(image);
     Op.call(image);
}

Maybe we could even do both methods and write a proc macro which would generate our current non-trait free-standing functions which internally then construct the op struct and then call the structs op impl. This way we get the best of both worlds and give the users the flexibility to choose which method they prefer. At the cost of increased beginner difficulty.

tnibler commented 2 months ago

Okay, so since there is some interest I'll try and come up with some possible design ideas that can take into account some of the more complex cases and the points you mentioned.

Docs might be worse...

I think you'd just put the docs for how a filter works on the struct for it, so they appear at the very top and not in the trait docs.

but might be more ergonomic for repeated calls

More than ergonomics actually, since SomeFilter::new(...) can precompute stuff like kernels that can then be reused for mutiple operations afterwards. So depending on the application performance could be a tiny bit better (if a kernel is particularly expensive to compute and you're operating on tiny images I guess, it's probably negligible in most normal cases). Or more realistically: an operation needs some other temporary buffer of known size, which can now be allocated once an then reused instead of reallocating a new one every iteration.

tnibler commented 1 month ago

Ok, I have thought about some of these points and hope to flesh them out in a master's thesis, so I'll have some better insights after that hopefully.

The ndarray redesign (https://github.com/rust-ndarray/ndarray/issues/1272) is also something to watch w.r.t how they handle shapes, types and internal buffer representations (alignment, padding, padding between rows to keep them aligned etc). Being able to finely control memory layouts, doing operations in place where possible, avoiding copying etc. while composing image processing pipelines would really improve upon what OpenCV has to offer here in terms of potential for optimizing locality, fusing kernels, offloading to accelerators and all that.

Curious what others here think, because I think basically reimplementing OpenCV in the age of projects like pytorch is a little bit of a wasted effort. Declarative approaches where operations are completely abstracted away from the actual representation in memory such that optimizations like loop blocking, kernel fusion can be applied over the whole pipeline are the future IMO. I definitely understand if this isn't meant to be a research project tho so excuse my ramblings in that case :D

cospectrum commented 1 month ago

into is no better or worse than mut in performance. For historical reasons, image and imageproc use mut

cospectrum commented 1 month ago

If there is no mut function, then either it is impossible to reuse the buffer, or it is simply not implemented, but will be implemented, as soon as someone has a little time to do it (its not difficult).

tnibler commented 1 month ago

mut functions are in-place though right? That's different than putting the result into a buffer passed in by the caller while keeping the input image unchanged.

cospectrum commented 1 month ago

Reading and writing to the same memory violates the borrow checker, lets not forget that

cospectrum commented 1 month ago

And a function with the mut postfix can be either in place with 1 mutable argument or have a second immutable reference input.

cospectrum commented 1 month ago

Vote for custom allocators in the next rust survey, maybe someone will finally stabilize it.

ripytide commented 1 month ago

I'm not sure I understand your last three comments @cospectrum. Reading and writing to the same memory does not violate the borrow checker?

From what I gather this issue is about the lack of the third ownership variant on image processing functions in this library:

  1. Owned Input, Owned Output: fn (image: Image) -> Image
  2. Combined Mutable Input + Output (In-Place Operation): fn (image: &mut Image)
  3. Immutable Input, Mutable Output fn(input: &Image, output: &mut Image)
  4. Immutable Input, Owned Output: fn(image: &Image) -> Image

Interestingly though, 4. is always computationally equivalently reducible to 3.:

fn op(image: &Image) -> Image {
    let mut output = Image::new(image.width, image.height);
    op(image, &mut output);
    output
}

And similarly 1. is reducible to 2.:

fn op(mut image: Image) -> Image {
    op(&mut image);
    image
}

So we only really need to implement 2. and 3. to unlock all four variants (even generate them with macros somehow).

Our current naming conventions using blur() as an example:

  1. No Convention
  2. blur_mut()
  3. blur_into()
  4. blur().

So maybe we just need to offer variants 2. and 3. if we are interested in being flexible regarding parameter ownership.

Unfortunately, not all image processing operations only operate on one image at a time so ownership variants exponentially grow with the number of arguments, but we can probably just follow the variant 2. and 3. patterns for those too.

cospectrum commented 1 month ago
  1. "ownership variant" variant not always possible from existing implementations, for example, match_histogram_mut(&mut img, &img); doesn't compile and doesn't make any sense.

  2. It's not about ownership variant, I don't really care about mut vs owned convention. Its about memallocs: "buffers could be reused instead of de- and reallocating in a loop"; And most of the time memallocs are spent on intermediate buffers. OpenCV hides these intermediate buffers as well. "4 signature diagram" will not help you with that, _in(allocator) will

ripytide commented 1 month ago

Well if it's about reducing allocations by reusing buffers then variant 2. and 3. are exactly what you'd want as they let you supply your own already allocated re-usable memory buffer.

match_histogram_mut(&mut img, &img) is variant 2, if it were variant 3 then it would look like: match_histogram_into(input: &Image, output: &mut Image, target: &Image).

cospectrum commented 1 month ago

Owned variant is possible only from inplace.

cospectrum commented 1 month ago

This is just an example. We already have a convention to implement _mut functions (in place or with an output buffer). So I don't see how owned brings anything to the table, despite polluting the codebase more.

cospectrum commented 1 month ago

let b = match_histogram(b, &a) looks stupid. We don't want that

ripytide commented 1 month ago

This is just an example. We already have a convention to implement _mut functions (in place or with an output buffer). So I don't see how owned brings anything to the table, despite polluting the codebase more.

Oh I see, yeah I agree it doesn't bring anything to the table, but when I discussed this in #622 theotherphil mentioned that the owned variant (1.) is useful for ergonomic reasons.

I wasn't suggesting we should generate them with macros only that we could since they are easily reducible.

cospectrum commented 1 month ago

Lets just find functions that have only this kind of the signature f (&I) -> I, and will see if it makes sense to add _mut implementation. They should allocate output in a simple way: let out = Image::new(input.width(), input.height())

ripytide commented 1 month ago

If you grep for let mut out = image.clone() there are about 15 places where we forward the implementation to the _mut() variants which is pretty sub-optimal when we could instead use _into() non-inplace variants instead.

cospectrum commented 1 month ago

Do you want to replace func(&I) -> I, func_mut(&mut I) pairs with single func(I) -> I?

cospectrum commented 1 month ago

Do you want to replace func(&I) -> I, func_mut(&I, &mut I) pairs with single func(I) -> I?

It's a question of style. Ask the owner.

ripytide commented 1 month ago

No I think we should replace func(&I) -> I with func(&I, &mut I) which is a matter of performance not style since non-in-place functions can be much faster than in-place versions and so there is no need to use the slower version when we could use the faster version.

I also think we should keep func_mut(&mut I) as it is.

ripytide commented 1 month ago

I also think we should get rid of shallow functions like this:

pub fn open(image: &GrayImage, norm: Norm, k: u8) -> GrayImage {
    let mut out = image.clone();
    open_mut(&mut out, norm, k);
    out
}

Since they are potentially misleading to users since they could think that it is implemented using a non-inplace algorithm when in reality it is just forwarding it's implementation to the inplace implementation.

cospectrum commented 1 month ago

Should we remove match_histogram then? There is match_histogram_mut

cospectrum commented 1 month ago

We should follow image codestyle in my opinion

cospectrum commented 1 month ago

We should follow image codestyle in my opinion

They have imageops::blur(&img, sigma), but no imageops::blur_mut(&mut out, &img, sigma). So the same questions araise for them. imageproc is some kind of extension

ripytide commented 1 month ago

image seems to prioritize non-breaking changes over most other things so I'm dubious to follow any of their conventions blindly. Although they do have some _in() variants like flip_vertical_in().

cospectrum commented 1 month ago

Introducing new function is not breaking change.

cospectrum commented 1 month ago

I would save _in for allocators.