image-rs / image-tiff

TIFF decoding and encoding library in pure Rust
MIT License
117 stars 75 forks source link

CCITT group 4 (Fax4) decoding support #229

Open stephenjudkins opened 6 months ago

stephenjudkins commented 6 months ago

This adds support for decoding CCIT group 4 tiff images by using the fax crate.

I've found that the photometric interpretation tag is ignored by all the extant decoders I've found, and attempting to correctly interpret it will break some subset of images. Unfortunate, but I suspect it's best to follow the herd here?

s3bk commented 6 months ago

Probably check Tag::PhotometricInterpretation to decide what values are black and white.

stephenjudkins commented 6 months ago

OK. I've done some research and found that

  1. other image processors (imagemagick, Apple Preview) ignore the photometric interpretation tag and always assume that it's WhiteIsZero
  2. fax4-encoded images I've found in the wild (which are all, unfortunately, confidential images of business checks) have the tag set correctly WhiteIsZero
  3. imagemagick creates fax4-encoded tiffs with photometric interpretation incorrectly as BlackIsZero but that doesn't matter because everything that loads the files assumes WhiteIsZero.

To conform with how everything in the wild I'd argue we just keep this hardcoded the way it is?

s3bk commented 6 months ago

hahaha. Yea, I think just keep it as is.

fintelia commented 6 months ago

Ideally, I'd like to see the create_reader act more lazily. The best way would be to have incremental decoding, but the fax crate doesn't seem to support that. An intermediate step might be to generate the list of color transitions, and then lazily expand that into the provided output buffer

s3bk commented 6 months ago

I don't see a reason why incremental decoding could not be implemented.

It would be pretty easy to write something like

impl Decoder<R: Read> {
  fn new(reader: R) -> Self;
  fn advance(&mut self) -> Result<(), io::Error>;
  fn pels(&self) -> Pels;
}
fintelia commented 6 months ago

Yeah, I think an interface like that would work. Then it could be wrapped in another object that implemented Read by tracking how far into the row had been read and expanded Pels pixel by pixel

stephenjudkins commented 6 months ago

Appreciate the feedback and all the back-and-forth here. I'm happy to help out as much as I can here to push this forward, but I don't want to step on @s3bk's feet if he's working on these changes.

s3bk commented 6 months ago

@stephenjudkins I pushed changes to the fax repo. There are strange differences with the last line again. I need to re-encode my samples with libtiff to check if the error is in the sample data or my code.

But you should be able to use the code to adapt this PR to the next version.