image-rs / image

Encoding and decoding images in Rust
Apache License 2.0
4.96k stars 618 forks source link

Writing code generic over pixel type is hard #2358

Open Shnatsel opened 2 weeks ago

Shnatsel commented 2 weeks ago

I've tried to write code generic over pixel type here, but couldn't get the generic bounds to work out because Primitive doesn't provide arithmetic operations.

Another instance of someone trying and failing to write generic code is here.

It seems the API for that could be improved. I don't have a solution in mind right now, but I wanted to write down these instances so that we could at least get an idea of what the issues are.

Shnatsel commented 1 week ago

More input on the missing pieces for generic code can be found here

fintelia commented 1 week ago

So there's a couple different aspects that you might want to abstract over:

With the current trait definitions, if you try to write a method that's generic over any Pixel, Subpixel, and GenericImage then it is likely to be both rather difficult to implement and not very performant.

Shnatsel commented 1 week ago

We have a tracking issue for GenericImage limiting performance due to lack of layout requirements: #2300

I think we should focus on making it easy to write code that covers all variants of DynamicImage first and foremost. Being more generic than that is a nice bonus but should never come at the expense of covering DynamicImage variants harder.

fintelia commented 1 week ago

I think we should focus on making it easy to write code that covers all variants of DynamicImage first and foremost. Being more generic than that is a nice bonus but should never come at the expense of covering DynamicImage variants harder.

I think that's a good principle. To be more specific list out what aspects we need to be generic over, we at a minimum need 1-3 below. Items 4-5 aren't currently tracked in DynamicImage, but both are often relevant if you want to correctly operate on them...

  1. alpha channel: Whether the pixel type has an alpha channel
  2. color component meanings: RGB vs. L (grayscale). We currently don't rule out others like YUV, XYZ, or CMYK.
  3. data type: Whether the individual components are u8, u16, f32, etc. Right now this also isn't an exhaustive list.
  4. transfer function: Linear vs. sRGB. Or newer options like PQ or HLG
  5. color space: Often assumed to use the bt709 color primaries and white point. Other possibilities include P3 and bt2020.

There's a bunch of other dimensions that can also be important, but I think can be left out of scope for now...


Next comes the question of how to be generic over the various dimensions. The current strategy has been to use the traits from num-traits to abstract over (3) and to add required methods on Pixel for any operations that are dependent on (1) or (2). Neither approach has worked very well. Users can't add new required methods on Pixel, and the num-traits are both very verbose to work with and missing some arithmetic operations.

I don't have a fully thought out design on how to improve this, but I think a critical piece may lie in defining the exhaustive set of possibilities for some dimensions so that users can directly dispatch on that decision. As an extreme example, we could remove the #[non_exhaustive] tag on DynamicImage and then algorithms could match on the 10 concrete variants.

A less extreme example would be exposing a helper function that takes an image and calls one of three closures: either a closure taking an &mut [u8], one taking an &mut [u16], or one taking an &mut [f32]. Then the user's algorithm could call any methods defined on those types, rather than just a limited subset provided by some list of trait bounds.

kornelski commented 1 week ago

Fast blur is a great test-case for a generic image/pixel API. Computation-heavy generic code in Rust is usually ugly anyway. Even blur on a basic single channel of bare integers needs a painful amount of trait bounds.

I've technically managed to write a generic one, but it's horrible:

fn blur<P, S, X>(mut img: ImgRefMut<P>, radius: u8) -> Result<(), TryReserveError>
where
    S: Sum<S>
        + std::fmt::Debug
        + std::cmp::PartialEq
        + std::ops::SubAssign
        + std::ops::AddAssign
        + std::ops::Mul<u16, Output = S>
        + std::ops::Add<Output = S>
        + Copy
        + std::default::Default,
    X: Sum<X>
        + std::fmt::Debug
        + std::cmp::PartialEq
        + std::ops::SubAssign
        + std::ops::AddAssign
        + std::ops::Mul<u32, Output = X>
        + std::ops::Add<Output = X>
        + Copy
        + std::default::Default,
    P: ComponentMap<S, u8, u16> + Copy,
    S: ComponentMap<X, u16, u32>,
    X: ComponentMap<P, u32, u8>,
{

called like

blur::<_, Rgba<u16>, Rgba<u32>>(img, radius)?

ComponentMap lets me iterate over all channels. In my implementation I rely on premultiplied alpha for this to be correct, but for uncorrelated alpha or other algorithms it needs to allow handling alpha separately from color components.

kornelski commented 1 week ago

BTW, many image processing algorithms can't be generic over the purpose of the components or even the transfer function. Giving users access to a bunch of numeric pixel values regardless of the format, like &[u16], is not sufficient to do meaningful correctly-looking things with them.

For example, blur should be done in linearized RGB with premultiplied alpha. Everything deviating from that gets worse. Blending in sRGB is a super common mistake that makes green-red edges brown, makes black on white text thicker, but white on black text disappear. Blurring in YCbCr creates a ton of hue and brightness shifts. Blurring with uncorrelated alpha creates completely wrong colors at the borders, either uncovering previously invisible garbage or creating dark halos.

Even a simple operation like converting from a transparent to opaque image by adding a background color needs the background color converted to a matching color space and blended properly.

I haven't found a satisfactory solution for this. I just make pixel formats an explicit enum, and have helper functions that convert the images down to a smaller easier-to-support subset of pixel formats as needed.

kornelski commented 1 week ago

I haven't found cases where I'd want image to abstract away whether it's RGB or YUV, because image processing on YUV makes subsampling look even worse, so any image processing needs RGB, and outputs like JPEG or AVIF even when they support RGB, work better with YUV, so I want these conversions to happen anyway, and be explicit.

For everything I work on, I found it sufficient to just support row-oriented storage with stride. This can also support planar types if necessary (like YUV chroma subsampling) by treating them as three images.

I would like to develop a cross-crate API that allows combining row-based image decoding/processing/encoding similarly to Rust's IntoIterator + Iterator + FromIterator.

Theoretically, in-memory storage could use tiles or even something clever like z-curve for better memory locality of 2D algorithms, but I couldn't find a case where I could justify that complexity. Separable kernels can be transposed, and large non-separable kernels are too expensive anyway. Copying an image once or twice is not expensive any more, so algorithms that really need a lot of locality can work on a copy. For really expensive algorithms, the Rust image layout/API stops being relevant, because the best way to make that faster isn't a smarter Rust type, but moving the computation to the GPU.

Shnatsel commented 1 week ago

alpha channel: Whether the pixel type has an alpha channel

Not in a great shape right now. There's Pixel::map_with_alpha() but no way to separate out the alpha values and discard the rest, so I had to resort to gross hacks.

Pixel layout: Whether entire rows of pixels can be accessed as a slice, etc.

We're going to have to start requiring row-based slice access if we want to have performant algorithms: #2300

Given how much performance depends on cache locality and vectorization, not to mention the math and bounds checks on every pixel access that non-contiguous storage incurs, it's going to be cheaper to convert the image to a contiguous layout and run the operations on it than to keep it non-contiguous.

And I cannot imagine what use case allowing non-contiguous rows would even serve.

As an extreme example, we could remove the #[non_exhaustive] tag on DynamicImage and then algorithms could match on the 10 concrete variants.

Unless you're willing to copy-paste the same code 10 times, you still need some sort of method to abstract over the exact types involved. This involves either generics (like mine here, also with an excessive amount of bounds, and it doesn't even work for floats) or macros.

I think making Primitive require all the arithmetic operations afforded by num_traits would help a lot. That way most of the math would "just work", without you having to spell out the generic bounds for every single operation. It still leaves gaps, but it's clear improvement.

awxkee commented 1 week ago

I haven't found cases where I'd want image to abstract away whether it's RGB or YUV, because image processing on YUV makes subsampling look even worse, so any image processing needs RGB, and outputs like JPEG or AVIF even when they support RGB, work better with YUV, so I want these conversions to happen anyway, and be explicit.

YUV works completely fine when you want to directly operate on something as intensity channel and avoiding chroma for ex. in CLAHE, and this cannot be done in RGB. Even this is worse than CIE LAB or Oklab, YUV conversion is cheap. However, I agree it is definitely better to do it explicit.

awxkee commented 1 week ago

So there's a couple different aspects that you might want to abstract over:

* Being generic over what `Pixel` trait implementation you're dealing with. So being able to abstract over whether there's an alpha channel, how many color channels there are, etc.

* Then there's `Subpixel` which indicates the underlying type of each channel is. It can be `u8`, `u16`, etc.

* And there's the image representation overall. We have _two_ versions here: `I: ImageBuffer<P, C>` guarantees a specific layout while `I: GenericImage<P>` makes no assumption about the pixel layout (and is thus compatible with any user image type).

With the current trait definitions, if you try to write a method that's generic over any Pixel, Subpixel, and GenericImage then it is likely to be both rather difficult to implement and not very performant.

At the moment everything of this is works well by doing so:

fn amazing_f32_processing<const CHANNELS: usize>(f: &mut [f32]) {
    panic!("Here is f32 {} channels", CHANNELS);
}

fn amazing_u8_processing<const CHANNELS: usize>(f: &mut [u8]) {
    panic!("Here is u8 {} channels", CHANNELS);
}

pub trait AwesomeHandler<T> {
    fn handle_awesome<const CHANNELS: usize>(data: &mut [T]);
}

impl AwesomeHandler<f32> for f32 {
    fn handle_awesome<const CHANNELS: usize>(data: &mut [f32]) {
        amazing_f32_processing::<CHANNELS>(data);
    }
}

impl AwesomeHandler<u8> for u8 {
    fn handle_awesome<const CHANNELS: usize>(data: &mut [u8]) {
        amazing_u8_processing::<CHANNELS>(data)
    }   
}

fn make_some_amazing<T: Copy + AwesomeHandler<T>, const CHANNELS: usize>(f: &mut [T]) {
    T::handle_awesome::<CHANNELS>(f)
}

However, this requires a lot of understanding what is going on and tons of boilerplate but this is do not seems to be avoidable.

Often everything if this might be grouped as an example here

/// # Generics
/// `T` - data type
/// `J` - accumulator type
impl<T, J, const COMPONENTS: usize> HorizontalStackBlurPass<T, J, COMPONENTS>
where
    J: Copy
        + 'static
        + FromPrimitive
        + AddAssign<J>
        + Mul<Output = J>
        + Sub<Output = J>
        + AsPrimitive<f32>
        + SubAssign
        + AsPrimitive<T>
        + Default,
    T: Copy + AsPrimitive<J> + FromPrimitive,
    i32: AsPrimitive<J>,
    u32: AsPrimitive<J>,
    f32: AsPrimitive<T> + AsPrimitive<J>,
    usize: AsPrimitive<J>,
{
}

and called like so

impl AwesomeHandler<u8> for u8 {
    fn handle_awesome<const CHANNELS: usize>(data: &mut [u8]) {
        HorizontalStackBlurPass::<u8, f32, CHANNELS>
    }   
}

But at the moment component type is unclear ( or at least I do not understand how to access it to play with it ). Or contains some untrivial trait which one is also is unclear how to work with it.

HeroicKatora commented 1 week ago

Using impls is inherently a bad fit for this purpose. As @kornelski sketched, the most approachable way to generalize is by listing the available and supported representations at some point in an exhaustive manner. The conflict is that the equivalent solution of traits will always require a form of specialization.

I don't think any arithmetic traits are sufficient. When we're seriously considering efficiency then a support for different SIMD implementations will need to happen. The best way to dispatch here is again if we have some exhaustive list, in those case not only exhaustive to the library but also exhaustive to the callee providing the operator that's being mapped. In terms of texel formats that is:

  1. We have a (publicly non-exhaustive) enumeration of supported formats by the library
  2. Some variants can be exhaustively matched
  3. The library maps the internal formats to those exhaustive cases supported by an operator, by conversion if necessary. As long as this mapping is stable, efficiency is a matter of any callee adding specialized implementations for all cases they want to support.

An interface with unconstrained generics such as Pixel, handle_awesome<const CHANNELS: usize> doesn't seem to fit either level. I think we're stuck here due to trying very hard to provide an outer iteration. This makes it almost impossible for the library to have any control in the mechanism of (3.). My idea would be to instead constrain ourselves to Iterator::for_each. Of course, instead of a simple closure handling one type we'd accept a trait argument.

That's in very general terms the approach I took while implementing the conversion paths in image-canvas. Every color conversion can go through linearized sRGB which means there's only one case to write for a simple operator. It's up to the conversion planning to identify pairs of texels for which a more efficient method can be performed and these pairs are enumerable, not general.

awxkee commented 1 week ago

Yes, I'd also vote for exhaustive match in general. Also very attractive it looks to me if intermediate types as Rgba, GrayAlpha will be omitted and clear primitive type slice as &[&[u8]] is available. This will make external processing very easy.

Shnatsel commented 1 week ago

Also very attractive it looks to me if intermediate types as Rgba, GrayAlpha will be omitted and clear primitive type slice as &[&[u8]] is available. This will make external processing very easy.

I think you would need to cast it to &[&[u8]], &[&[u16]] or &[&[uf32]] depending on the underlying data type, otherwise it wouldn't make much sense. I believe you can do this today with Pixel::channels()?

fintelia commented 1 week ago

The ImageBuffer type (and by extension DynamicImage) currently represents pixel data simply as Vec<u8>, Vec<u16>, Vec<f32>, etc. These can be cast to byte slices or &[[T; N]] using bytemuck or similar.

Shnatsel commented 1 week ago

We'll probably want row-contiguous views as well, which allow viewing rows as &[u8]/&[u16]/&[f32] as well as &[Rgba<u8>] and other pixel types. But that's https://github.com/image-rs/image/issues/2300

fintelia commented 1 week ago

To be clear, ImageBuffer has row continuous views already. In fact, it goes even further and gives you a view with all rows packed together into a single slice. That linked issue is about whether the GenericImage trait should also expose contiguous rows of pixels.

awxkee commented 1 week ago

I've lost a track already. If it is expected to accessed by simple match on DynamicImage (what is completely fine to me) and already stores all in one contiguous buffer then why #2300 and https://github.com/image-rs/image/blob/25fbd3f24a1f79035d031ef7f9391b3f7ccffb49/src/imageops/affine.rs#L64-L69 exists. This is quite confising then and purpose of this is unclear. And what purpose to do for example PR with blur into repo if external crate have exactly same access as in crate.

awxkee commented 1 week ago

Even your contributor, @Shnatsel, perhaps didn't know before this time that it is expected to be accessed only by match on DynamicImage

fintelia commented 1 week ago

@awxkee There's a couple different things going on:

awxkee commented 1 week ago

Just because you can match on the 10 different variants of DynamicImage doesn't mean you want to have 10 entirely different implementations of your image processing algorithm

At least it is straightforward and allows you easy adopt traits for each case, or ignore some, if necessary, with a little boilerplate price for that.

Currently DynamicImage is marked #[nonexhaustive] so outside of this crate, you aren't allowed to match on the variants of the enum without having a => { ... } fallback case.

I think this almost always will come with some breaking release, do not think this is real issue with this.

The blur and resize functionality isn't just implemented for DynamicImage but also for ImageBuffer and any other image type that implements GenericImage.

Thanks, I think I understood, at least main part of the issue