Open niclashoyer opened 9 years ago
Maybe you could buffer the whole Exif payload. Is that feasible? It could also waste some memory.
I don't think that adding Seek to LoadableMetadata::load() is feasible, because I intended for this library to be used in arbitrary contexts, for example, to read image metadata in a streaming mode from a socket. A Seek requirement is too strong. That said, to overcome the error in load_from_buf() you can use std::io::Cursor, which implements Seek.
So, I think that the first approach is better. Alternatively, as @fisch42 suggested, you can load the whole EXIF payload into memory. I think that this kind of buffering is okay; it is unlikely that EXIF metadata would take lots of memory. Then you can use direct offsets in a byte slice.
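To illustrate the Cursor suggestion, here is a minimal sketch. The helper names read_at and read_at_from_buf are made up for this example and are not part of the library; only std::io::Cursor, Read, Seek and SeekFrom come from the standard library.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper: reads `len` bytes starting at absolute position
/// `offset` from any reader that supports seeking.
fn read_at<R: Read + Seek>(reader: &mut R, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
    reader.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Hypothetical helper: `&[u8]` does not implement `Seek`, but wrapping the
/// slice in a `Cursor` gives a reader that implements both `Read` and `Seek`.
fn read_at_from_buf(data: &[u8], offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
    let mut cursor = Cursor::new(data);
    read_at(&mut cursor, offset, len)
}
```

The wrapper costs nothing beyond a position counter, so a buffer-based entry point could forward to a seeking one this way without copying the data.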
I've implemented basic Exif support in my fork. The code probably isn't great (first thing I've written in Rust), but it might be a useful starting point.
My first approach for reading the tags was storing references, sorting them, and reading the data in order. I ended up dropping that and buffering the Exif data since using Seek simplified the code, and I was going to have to track how much data had been read in order to ensure it ends up at the end of the segment when finished parsing.
Another annoying thing about the Exif format is that it can be either big or little endian. To deal with that, I had to add my own read_u16 and read_u32 methods that take the byte order as an argument.
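For readers following along, byte-order-aware readers of this kind might look roughly like the sketch below. This is not the code from the fork: the ByteOrder enum and byte_order_from_mark are assumptions made for this example, and only the function names read_u16 and read_u32 come from the comment above. The "II" (little endian) and "MM" (big endian) marks are the ones the TIFF header uses.

```rust
use std::io::{self, Read};

/// Assumed representation of the byte order announced in the TIFF header.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ByteOrder {
    Big,
    Little,
}

/// Maps the two-byte TIFF byte-order mark to a `ByteOrder`:
/// "II" means little endian, "MM" means big endian.
fn byte_order_from_mark(mark: [u8; 2]) -> Option<ByteOrder> {
    match &mark {
        b"II" => Some(ByteOrder::Little),
        b"MM" => Some(ByteOrder::Big),
        _ => None,
    }
}

/// Reads two bytes and decodes them in the given byte order.
fn read_u16<R: Read>(r: &mut R, order: ByteOrder) -> io::Result<u16> {
    let mut b = [0u8; 2];
    r.read_exact(&mut b)?;
    Ok(match order {
        ByteOrder::Big => u16::from_be_bytes(b),
        ByteOrder::Little => u16::from_le_bytes(b),
    })
}

/// Reads four bytes and decodes them in the given byte order.
fn read_u32<R: Read>(r: &mut R, order: ByteOrder) -> io::Result<u32> {
    let mut b = [0u8; 4];
    r.read_exact(&mut b)?;
    Ok(match order {
        ByteOrder::Big => u32::from_be_bytes(b),
        ByteOrder::Little => u32::from_le_bytes(b),
    })
}
```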
The implementation looks great so far! One thing I noticed while trying to implement it using buffering is that Exif data really is just TIFF, so conceptually it would be best if we had a TIFF metadata parser and used that to parse Exif. The biggest problem with TIFF is that image data can be anywhere in the file (just like Exif values), so unlike Exif, the TIFF data can get really large, and buffering is not an efficient option there.
Looks great, thanks! If you want, you can submit a pull request.
However, I agree with @niclashoyer in that I want to do things in a general way if possible; this means that implementing a TIFF parser is the best option.
Well, it seems I should think of how to integrate the ability to seek inside the image data, while not requiring a Seek implementation for those image formats which don't need it...
@netvl just a thought: if you design optional seeking, keep in mind that it may be worth implementing both types (seeking / non-seeking) and letting the user of the library decide. For example, if one wants to use immeta to parse TIFF files sent via network, buffering the whole file is really bad and a slightly more complex implementation is "ok". But if one wants to use immeta to parse TIFF files from a hard disk, a seeking implementation is the best option, as it is still very fast.
Yes, that's something I was thinking of when I was writing that sentence, thanks!
I've added a read_from_seek<R: BufRead + Seek>() to LoadableMetadata and changed read() to require BufRead instead of just Read. This should make EXIF parsing implementation easier.
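As a rough illustration of how a seeking and a non-seeking entry point can coexist, here is a sketch. The trait Loadable and the type Toy are hypothetical stand-ins written for this example, not the library's actual definitions; only the method names read and read_from_seek and their bounds are taken from the comment above. The toy format stores, in byte 0, the absolute offset (at least 1) of a one-byte value, and the seeking path assumes the reader starts at position 0.

```rust
use std::io::{self, BufRead, Read, Seek, SeekFrom};

/// Hypothetical stand-in for the library's trait; not its actual definition.
trait Loadable: Sized {
    /// Streaming entry point: only sequential reads are available.
    fn read<R: BufRead>(r: &mut R) -> io::Result<Self>;

    /// Seeking entry point. It defaults to the streaming path, so formats
    /// that never need to seek do not have to implement it.
    fn read_from_seek<R: BufRead + Seek>(r: &mut R) -> io::Result<Self> {
        Self::read(r)
    }
}

/// Toy format: byte 0 holds the absolute offset of a one-byte value.
struct Toy {
    value: u8,
}

impl Loadable for Toy {
    fn read<R: BufRead>(r: &mut R) -> io::Result<Self> {
        let mut b = [0u8; 1];
        r.read_exact(&mut b)?;
        // One byte is already consumed, so discard `offset - 1` more bytes.
        let skip = (b[0] as u64).checked_sub(1).ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidData, "offset points into the header")
        })?;
        io::copy(&mut r.by_ref().take(skip), &mut io::sink())?;
        r.read_exact(&mut b)?;
        Ok(Toy { value: b[0] })
    }

    fn read_from_seek<R: BufRead + Seek>(r: &mut R) -> io::Result<Self> {
        let mut b = [0u8; 1];
        r.read_exact(&mut b)?;
        // With Seek available, jump straight to the value.
        r.seek(SeekFrom::Start(b[0] as u64))?;
        r.read_exact(&mut b)?;
        Ok(Toy { value: b[0] })
    }
}
```

The streaming path can only skip forward, which is why formats with backward references (as TIFF allows) are awkward without either Seek or buffering.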
Hi,
I started to write some basic Exif parsing for JPEGs based on this description.
Basically, Exif is TIFF encoded inside the APP1 payload. Unfortunately the format is quite messy. One problem that bothers me is that for each Exif tag the value is stored at an offset if it is larger than 4 bytes. This offset is relative to the start of the TIFF header (APP1 starting point + fixed offset).
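To make the offset rule concrete, here is a minimal sketch of resolving one 12-byte IFD entry against a buffer that starts at the TIFF header. All function names are invented for this example. It relies on the TIFF entry layout (2-byte tag, 2-byte type, 4-byte count, 4-byte value or offset) and returns the raw value bytes without interpreting them.

```rust
/// Size in bytes of one element of a TIFF field type, or None if unknown.
fn type_size(field_type: u16) -> Option<usize> {
    match field_type {
        1 | 2 | 6 | 7 => Some(1), // BYTE, ASCII, SBYTE, UNDEFINED
        3 | 8 => Some(2),         // SHORT, SSHORT
        4 | 9 | 11 => Some(4),    // LONG, SLONG, FLOAT
        5 | 10 | 12 => Some(8),   // RATIONAL, SRATIONAL, DOUBLE
        _ => None,
    }
}

/// Decodes a u16 at `pos`, little endian if `le` is true.
fn u16_at(buf: &[u8], pos: usize, le: bool) -> Option<u16> {
    let s = buf.get(pos..pos.checked_add(2)?)?;
    let b = [s[0], s[1]];
    Some(if le { u16::from_le_bytes(b) } else { u16::from_be_bytes(b) })
}

/// Decodes a u32 at `pos`, little endian if `le` is true.
fn u32_at(buf: &[u8], pos: usize, le: bool) -> Option<u32> {
    let s = buf.get(pos..pos.checked_add(4)?)?;
    let b = [s[0], s[1], s[2], s[3]];
    Some(if le { u32::from_le_bytes(b) } else { u32::from_be_bytes(b) })
}

/// Returns the raw value bytes of the IFD entry starting at `entry_pos`.
/// `tiff` must begin at the TIFF header, because offsets are relative to it.
fn entry_value(tiff: &[u8], entry_pos: usize, le: bool) -> Option<&[u8]> {
    let field_type = u16_at(tiff, entry_pos.checked_add(2)?, le)?;
    let count = u32_at(tiff, entry_pos.checked_add(4)?, le)? as usize;
    let len = count.checked_mul(type_size(field_type)?)?;
    let start = if len <= 4 {
        // Small values are stored inline in the last four bytes of the entry.
        entry_pos.checked_add(8)?
    } else {
        // Larger values live elsewhere; the field holds their offset.
        u32_at(tiff, entry_pos.checked_add(8)?, le)? as usize
    };
    tiff.get(start..start.checked_add(len)?)
}
```

Because every access goes through slice bounds checks, a malformed offset yields None rather than a panic.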
I see two strategies here:
LoadableMetadata::load method needs Seek. But that gives an error for load_from_buf, as Seek is not implemented for &[u8].
I prefer the second method, because it would make parsing a lot easier and doesn't need any sorting on references, but I don't know if it is possible to fix load_from_buf to be compatible.
Any ideas?