image-rs / image

Encoding and decoding images in Rust
Apache License 2.0
4.86k stars 597 forks source link

New DDS decoder #2258

Open RunDevelopment opened 3 months ago

RunDevelopment commented 3 months ago

I made a new DDS decoder, because the old DXT-based decoder was very limited (only DXT1-5 + dimensions divisible by 4) and incorrect (DXT1 colors were not rounded correctly, resulting in discolorations). While this PR is not finished yet, I already implemented the following features:

The only main formats that are still missing are BC6 and BC7. These 2 are quite complex, so it will likely take me some more time to implement them. Once those 2 are implemented, this should be a pretty competent DDS decoder.

Some notes and technical decisions:


With what I did so far out of the way, I have some questions to the maintainers on how to integrate this into the image crate:

  1. How should I test this? Used around 20MB of images (>100 files) and tests/reference_image.rs > render_images to ensure that the decoder would correctly, but 20MB is a lot of image data to commit to a repo. I can go down to around 8MB, but not much lower. The issue is that I only have one file for some rare formats and no way to generate smaller images of those formats.
  2. Benchmarking. I have not benchmarked decoding DDS files yet (although it shouldn't be too slow). The current bench setup also seems to be ill-suited for DDS, as I need to benchmark each format individually. How should I go about this?
  3. Should this in a separate crate? I initially wanted to make this a separate crate, but found it vastly easier to add it to the image crate
  4. What should we do with the old DXT implementation? It's not used by the new DDS decoder, and it's deprecated for some time now. Maybe it could be removed after the new DDS decoder is in?
  5. How careful do I have to be with resource limits? Most formats currently read line-by-line into a temporary buffer from which pixels are then decoded. The size of this buffer is proportional to the width of the image. This means that we only need O(sqrt(N)) (N=number of pixels) additional memory for roughly square images. However, an attacker would supply an image with height=1, causing the temporary buffer to as large as (but no larger than) the output buffer. I already limited the width and height of DDS images to be at most 224, so the temporary buffer can be at most 256MB (with any R32G32B32A32_* format).

Also, (not a question) this my first large-scale Rust PR, so please feel free to pick apart and suggest improvements to everything you see.

RunDevelopment commented 3 months ago

Thanks for the quick response!

Also, for bc decoding there's already a crate which looks interesting and doesn't even have dependencies: bcdec_rs. As far as is possible, I'd see if the bc.rs and convert.rs contents and parts of decoder.rs might be upstreamed to that crate and then used via a dependency. It's important to reduce the amount of code to be maintained centrally. (Aside: they, as in the author, also seem to have a file/encoding/decoding crate about it, image_dds, so might be interested anyways).

bcdec_rs is certainly interesting. Especially since they already implemented BC6/7 (although their code scares me). However, they too seem to get rounding slightly wrong from looking at the code. I'll have to do some testing to be sure though. Assuming it's all correct (or that we fixed any issues), bcdec_rs should be able to replace bc.rs completely.

However, I don't think that the things in decoder.rs fit the theme of bcdec_rs, so I don't think moving them into that crate would fit. Of course, it's up to the author of bcdec_rs to decide what does and doesn't fit.

On the topic of image_dds: I might also add an encoder for uncompressed DDS files (= everything except DXTn/BCn) in a future PR. Block compression encoding might be a bit too difficult for me to get something good working, but all other formats are uncompressed and almost trivial to encode. I'm saying this now, because it might change your decision on whether this all should be a separate crate.

I also want to mention that I don't intend to maintain this code, so if you want to make it a separate crate, I would ask to put it under the image-rs org. I don't have any experience with maintaining a crate and I don't do a lot of Rust in general.

How should I test this?

8 MB isn't that much if the files don't change. That is, are they free of copyright risks etc. This would be my primary concern. Still I'm wondering why is it that the test images are this large?

Copyright won't be a concern for most images. I tested most formats with a 257x131 test image I created myself, and then used texconv to create DDS files in different formats. This is the image btw:

image

However, there are some images for which copyright will be an issue. For some less-common formats, I only have one image per format that I found in some games (e.g. Elden Ring and the Dark Souls games). Given that all of these formats are uncompressed (no fancy block compression), I could write some scripts to generate images in those formats. Non-standard flags could also be tested by just hex-editing the headers of similar DDS files.

As for the size: most DDS formats are uncompressed, so even my little test image produces a 174KB DDS file when saved with 8bpc RGBA color. Take that times 100 files and that's why. My main way of reducing the size would be use an even smaller test image btw.

Doing this with f32 is quite slow, so I did with only integer operations that are around 3x faster than f32 and around 2x harder to understand.

I'm not sure all the tricks are worth the infamiliarty burden. Did you benchmark them? End-To-End numbers would be most convincing here, some floating point operations are quite fast, unless integer versions are smaller and can be vectorized better.

I did a quick benchmark for a DX10 B5G6R5_UNORM image. (This format basically just does a range conversion (the same one BC1 uses) and that's it, so it's the ideal format to test this with.) I also found that the bcdec_rs crate uses a different trick for the range conversion. I don't understand how they did it, but they found a version that uses a bit shift instead of a division at the end. And it's crazy fast:

my int trick
  time:   [84.337 µs 84.543 µs 84.747 µs]

f32
  time:   [112.58 µs 112.93 µs 113.36 µs]
  change: [+32.418% +33.158% +33.792%] (p = 0.00 < 0.05)

bcdec_rs int trick
  time:   [57.821 µs 58.017 µs 58.255 µs]
  change: [-31.977% -31.377% -30.646%] (p = 0.00 < 0.05)
For the f32 conversion I used this snippet (similar for `x6_to_x8`): ```rs #[inline(always)] pub(crate) fn x5_to_x8(x: u16) -> u8 { debug_assert!(x < 32); const FACTOR: f32 = 255.0 / 31.0; (x as f32 * FACTOR + 0.5) as u8 } ``` So I already optimized to multiply with a single constant and use the truncation `as u8` performs for rounding by adding 0.5. Especially the truncation trick is a *lot* faster than the call to round, which compiles to a literal `call` instruction.

So I think I'm going to switch to the trick bcdec_rs uses :)

And I think I also find constants for the same trick to use in other range conversion function. I don't know how they derived their constants, but they are small enough that I can use brute force to find more.

RunDevelopment commented 2 months ago

Alright, so it's pretty much ready now.

Updates:

There are just a few things we have to talk about:

  1. bcdec_rs: As I suspected, bcdec_rs does not round correctly. This affects all block compression formats except for BC6 and BC7. This isn't even a bug with bcdec_rs directly, because it inherited it from the bcdec C library.

    Unfortunately, both the author of bcdec_rs and bcdec do not seem to be convinced that this is a bug or that it is worth fixing. I'll talk to them more, but that this means that we can't use bcdec_rs for BC1-5 for the time being.

    bcdec_rs also doesn't support signed BC4 and BC5.

  2. Clippy: DxgiFormat and PixelFormatFlags has a few unused associated constants. These constants are there for completeness, so removing them isn't really an option, because I'm not too keen on adding in DXGI_FORMAT constants on demand.

    I could slap #[allow(unused)] on all these constants, but that doesn't seem like a nice solution. Is there a better way to fix this?

RunDevelopment commented 2 months ago

I've also run into a problem with testing: How do I test RGB32F images? render_images doesn't output them.

RunDevelopment commented 1 month ago

Sorry for the delay!

I mostly finished up the PR now.

  1. I fixed all clippy problems.
  2. I tried using the bcdec_rs crate for BC decompression as much as possible, but its BC4 implementation is incorrect, so I don't use it. I've made a PR to fix this upstream in bcdec, so hopefully this won't be an issue for much longer.
  3. I added tests for all supported formats that I could test.

However, I need help with testing the floating-point formats. I don't know how to generate reference images for them, since render_images in reference_images.rs doesn't support f32 images. @HeroicKatora, could you please help me here?

Otherwise, this PR is ready as I see it. The new DDS decoder is correct, decently performant, and covers most DDS formats.

@HeroicKatora Please let me know what else needs to be done/changed.