kamadak / exif-rs

Exif parsing library written in pure Rust
BSD 2-Clause "Simplified" License

TIFF parsing performance - read_to_end #42

Closed cameroncros closed 1 month ago

cameroncros commented 1 month ago

When parsing TIFF files, the entire file is read into memory. For RAW/ARW files, this can be a significant read.

I am guessing that changing this is likely a huge change, so before I embark on a possibly pointless endeavor to work around that, is there an underlying reason for doing it that way?

If I can make that change, would it be of value, and likely to be merged?

kamadak commented 1 month ago

I wanted to implement the parser without std::io::Seek, and it was easiest to read the raw Exif block into memory. The TIFF structure is itself the Exif structure, so TIFF files are read into memory in their entirety.

If you change the parser so that it does not read the entire data into memory, I hope it will still work on non-seekable sources such as network streams. (The current Reader::read_from_container unfortunately requires Seek to parse HEIF, but the other formats do not, and I want to change the HEIF parser to remove the Seek requirement entirely.)
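A minimal sketch of one way a parser could avoid both Seek and a full read, assuming the data it needs sit near the start of the stream. `StreamBuf` and its methods are hypothetical names for illustration and are not part of exif-rs. It only ever reads forward, and it keeps what it has read so earlier offsets can be revisited.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of exif-rs): buffers a forward-only
/// stream and pulls more bytes only when a caller asks for them.
pub struct StreamBuf<R: Read> {
    src: R,
    buf: Vec<u8>,
}

impl<R: Read> StreamBuf<R> {
    pub fn new(src: R) -> Self {
        StreamBuf { src, buf: Vec::new() }
    }

    /// Returns bytes `start..end`, reading from the source until `end`
    /// bytes are buffered. Fails with `UnexpectedEof` if the stream is shorter.
    pub fn get(&mut self, start: usize, end: usize) -> io::Result<&[u8]> {
        if start > end {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "start > end"));
        }
        if end > self.buf.len() {
            let missing = (end - self.buf.len()) as u64;
            // `take` caps the read so nothing past `end` is consumed.
            let got = self.src.by_ref().take(missing).read_to_end(&mut self.buf)?;
            if (got as u64) < missing {
                return Err(io::ErrorKind::UnexpectedEof.into());
            }
        }
        Ok(&self.buf[start..end])
    }

    /// Number of bytes pulled from the source so far.
    pub fn buffered(&self) -> usize {
        self.buf.len()
    }
}
```

The trade-off is that everything up to the highest requested offset still ends up in memory, so this only helps when the metadata are located early in the file.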

cameroncros commented 1 month ago

My plan was to keep your code fairly similar, but pass around a [u8]-like struct that could dynamically load the data as requested (with the naive hope that the metadata would be a tiny fraction of the overall size, so reading only that part from a slow disk should be faster). How that fits in with a network stream is unclear, and it may not work at all. I will see where I get to and report back; you may close this issue if you want to avoid clutter.
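A rough sketch of what such a [u8]-like struct might look like, assuming a seekable source. `LazyBytes`, `read_at`, and the 4096-byte `CHUNK` size are hypothetical choices for illustration, not anything in exif-rs. Chunks are loaded from the source the first time a read touches them and are cached afterwards.

```rust
use std::collections::HashMap;
use std::io::{self, Read, Seek, SeekFrom};

/// Chunk size is an arbitrary assumption for this sketch.
const CHUNK: u64 = 4096;

/// Hypothetical `[u8]`-like view (not part of exif-rs) that loads
/// fixed-size chunks from a seekable source on first access.
pub struct LazyBytes<R: Read + Seek> {
    src: R,
    len: u64,
    chunks: HashMap<u64, Vec<u8>>,
}

impl<R: Read + Seek> LazyBytes<R> {
    pub fn new(mut src: R) -> io::Result<Self> {
        // Seeking to the end yields the total length without reading data.
        let len = src.seek(SeekFrom::End(0))?;
        Ok(LazyBytes { src, len, chunks: HashMap::new() })
    }

    pub fn len(&self) -> u64 {
        self.len
    }

    /// Copies bytes `offset..offset + out.len()` into `out`.
    pub fn read_at(&mut self, offset: u64, out: &mut [u8]) -> io::Result<()> {
        let end = offset
            .checked_add(out.len() as u64)
            .filter(|&e| e <= self.len)
            .ok_or(io::ErrorKind::UnexpectedEof)?;
        let mut pos = offset;
        while pos < end {
            let idx = pos / CHUNK;
            if !self.chunks.contains_key(&idx) {
                // Load the missing chunk; the last chunk may be short.
                let start = idx * CHUNK;
                let size = CHUNK.min(self.len - start) as usize;
                let mut chunk = vec![0u8; size];
                self.src.seek(SeekFrom::Start(start))?;
                self.src.read_exact(&mut chunk)?;
                self.chunks.insert(idx, chunk);
            }
            let chunk = &self.chunks[&idx];
            let from = (pos - idx * CHUNK) as usize;
            let n = (chunk.len() - from).min((end - pos) as usize);
            let dst = (pos - offset) as usize;
            out[dst..dst + n].copy_from_slice(&chunk[from..from + n]);
            pos += n as u64;
        }
        Ok(())
    }

    /// Number of chunks read from the source so far.
    pub fn chunks_loaded(&self) -> usize {
        self.chunks.len()
    }
}
```

Because this relies on Seek, it would not work on a non-seekable network stream as written, and a parser built around slices would need every slice access replaced by a fallible call such as `read_at`.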

cameroncros commented 1 month ago

This is probably a massive change, and I don't think it is a viable path forward. Thanks anyway.