`read_raw(&self, data: Vec<u8>)` should take a reference

kamadak / exif-rs

Exif parsing library written in pure Rust

BSD 2-Clause "Simplified" License

190 stars 42 forks source link

`read_raw(&self, data: Vec<u8>)` should take a reference #15

Open flukejones opened 3 years ago

flukejones commented 3 years ago

read_raw(&self, data: Vec<u8>) should be taking a reference, eg: data: &[u8]. Is there any reason not to do so here?

kamadak commented 3 years ago

Exif fields are lazily parsed. Keeping a copy of unparsed data steam in the struct Exif makes it self-contained, and users can handle the struct without being bothered by the lifetime of the buffer.

flukejones commented 3 years ago

I understand that, but it would also be good to have an option to refer to a buffer also. Maybe both can be done?

1e100 commented 2 years ago

Ran into the same issue, and it's fatal in my case: I need to parse EXIF in uploads before saving them to disk, and copying the entire buffer (up to 50MiB in size) just to read some EXIF doesn't seem ideal. This would be a lot more palatable if I could create metadata-only copies.

kamadak commented 2 years ago

There could be two possible interfaces for shared buffers:

// (1) Parses immediately, borrows nothing.
Reader::read_raw_nonlazy(..., data: &[u8]) -> Exif

// (2) Parses lazily using the borrowed buffer.
Reader::read_raw_ref<'a>(..., data: &'a [u8]) -> ExifRef<'a>

If a raw Exif block is big and you want to avoid copying data, (1) will not meet the spirit of your goal, because when 50 MiB buffer is parsed immediately, struct Exif will grow that large. So, I will go with (2) though users need to care about the lifetime of the buffer. Any opinions?

@1e100 BTW, I am curious about your use cases. Are your raw Exif blocks really that large, or are you parsing TIFF data, where TIFF image data and Exif attributes are intermixed and the total size could be 50 MiB?

1e100 commented 2 years ago

I just need a few metadata fields before I save the file that's all. Files can be quite large though. It's not just the exif block - it's the entire uploaded file

kamadak commented 2 years ago

Reader::read_from_container will only keep a copy of the Exif block, so no need to worry about the size if an image is large but not the Exif block. (Except TIFF)

1e100 commented 2 years ago

Thanks for the tip, but unfortunately it could be TIFF as well. Out of curiosity (and for others to find), why is it different?

kamadak commented 2 years ago

The Exif standard re-uses TIFF's tag-value format to store Exif fields. For images files other than TIFF, a small independent Exif block (which is in TIFF format) is embedded in an image file. For TIFF files, Exif fields (in TIFF format) and TIFF image data (of course in TIFF format as well) are merged. Unfortunately, some Exif fields are inherited from TIFF, and the locations of those fields are specified by the TIFF standard, thus metadata cannot be gathered in one place but are scattered in a TIFF file.

Anyway, if we have a TIFF image on memory and want to avoid copying the entire buffer, read_raw_ref could help. I will consider including such an API in the next version.

w-flo commented 1 year ago

Reader::read_raw_ref<'a>(..., data: &'a [u8]) -> ExifRef<'a>

I like this! I have &[u8] raw exif data and just need to extract and store some basic exif metadata from it, then immediately drop the Exif, the &[u8] slice, the buffer backing the slice, etc. The allocation (to_vec() on the slice) is not a big issue I guess, but it's not really needed in my use case. And I guess this is a pretty common use case.