rude04 / metadata-extractor

Automatically exported from code.google.com/p/metadata-extractor

Tiff reading loads unneeded bytes into memory #44

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Exif and the other supported forms of metadata found in Jpeg files make up only a 
small portion of the file's size, especially in very large image files.

Currently the library loads unneeded segments, such as image segments, into 
memory before proceeding to read Exif data.

One user of the library was working with 1GB+ images and found performance to 
be very bad.

I'd like to have metadata readers register with the file processors before the 
file is parsed.  This way, only the required subset of Jpeg data will be loaded 
into memory.

It may actually make the most sense to read the metadata directly from the 
stream, avoiding loading any data into memory at all (if possible).
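The proposal above can be sketched roughly as follows. It assumes a JPEG is a sequence of 0xFF-prefixed marker segments, each with a 2-byte big-endian length that counts itself, and that everything after the SOS (start of scan) marker is compressed image data. `JpegSegmentWalker`, `register` and `read` are hypothetical names for illustration, not metadata-extractor's API. Readers register the markers they need (for example 0xE1, the APP1 segment that carries Exif) before parsing; the walker then reads segment headers straight from the stream, copies only registered segments into memory, skips the rest, and stops at SOS so the image data is never read.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "readers register before parsing"; these names are
// illustrative only and are not metadata-extractor's API.
final class JpegSegmentWalker {
    private static final int SOI = 0xFFD8; // start of image
    private static final int SOS = 0xDA;   // start of scan: compressed image data follows
    private static final int EOI = 0xD9;   // end of image

    private final Set<Integer> wantedMarkers = new HashSet<>();

    /** A metadata reader declares a marker it needs, e.g. 0xE1 (APP1, holds Exif). */
    void register(int marker) {
        wantedMarkers.add(marker);
    }

    /** Copies registered segments into memory, skips all others, stops at SOS/EOI. */
    Map<Integer, List<byte[]>> read(InputStream input) throws IOException {
        DataInputStream in = new DataInputStream(input);
        if (in.readUnsignedShort() != SOI) {
            throw new IOException("Not a JPEG: missing SOI marker");
        }
        Map<Integer, List<byte[]>> segments = new HashMap<>();
        while (true) {
            if (in.readUnsignedByte() != 0xFF) {
                throw new IOException("Expected a 0xFF marker prefix");
            }
            int marker = in.readUnsignedByte();
            while (marker == 0xFF) {                 // skip optional fill bytes
                marker = in.readUnsignedByte();
            }
            if (marker == SOS || marker == EOI) {
                break;                               // never touch the image data
            }
            int length = in.readUnsignedShort() - 2; // length field counts itself
            if (length < 0) {
                throw new IOException("Invalid segment length");
            }
            if (wantedMarkers.contains(marker)) {
                byte[] payload = new byte[length];
                in.readFully(payload);
                segments.computeIfAbsent(marker, k -> new ArrayList<>()).add(payload);
            } else {
                skipFully(in, length);
            }
        }
        return segments;
    }

    private static void skipFully(DataInputStream in, int n) throws IOException {
        int remaining = n;
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped == 0) {                      // no progress: probe for EOF
                if (in.read() < 0) {
                    throw new EOFException("Truncated segment");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] jpeg = {                              // synthetic bytes, not a real image
            (byte) 0xFF, (byte) 0xD8,                                  // SOI
            (byte) 0xFF, (byte) 0xE0, 0x00, 0x04, 0x01, 0x02,          // APP0 (skipped)
            (byte) 0xFF, (byte) 0xE1, 0x00, 0x06, 'E', 'x', 'i', 'f',  // APP1 (kept)
            (byte) 0xFF, (byte) 0xDA, 0x00, 0x00                       // SOS: stop
        };
        JpegSegmentWalker walker = new JpegSegmentWalker();
        walker.register(0xE1);
        Map<Integer, List<byte[]>> found = walker.read(new ByteArrayInputStream(jpeg));
        System.out.println(new String(found.get(0xE1).get(0), "US-ASCII")); // prints Exif
    }
}
```

The sketch assumes every marker before SOS carries a length field, and a list per marker is kept because a file may contain several APP1 segments (Exif and XMP, for instance).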

This should yield very significant performance improvements for some users.

Original issue reported on code.google.com by drewnoakes on 15 May 2012 at 10:47

GoogleCodeExporter commented 9 years ago
This also applies to Tiff files.  TiffMetadataReader reads the whole file into 
a byte[]. Tiff files (such as RAW files) also tend to be much larger than Jpegs 
on average.
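Reading the whole file is more than a metadata reader strictly needs, because a TIFF file starts with an 8-byte header giving the byte order ("II" little-endian or "MM" big-endian), the magic number 42, and the offset of the first IFD (image file directory). A reader with random access can seek straight there. The sketch below assumes a local `File` opened with `RandomAccessFile`; `TiffIfdPeek` and `firstIfdTags` are hypothetical names, not TiffMetadataReader's code, and it deliberately ignores tag values, sub-IFDs and chained IFDs.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reads only the TIFF header and the first IFD's tag IDs,
// seeking to the offsets the file declares instead of loading the whole file.
final class TiffIfdPeek {
    /** Reads `length` bytes at absolute `offset`, wrapped in the given byte order. */
    private static ByteBuffer readAt(RandomAccessFile raf, long offset, int length,
                                     ByteOrder order) throws IOException {
        byte[] buf = new byte[length];
        raf.seek(offset);
        raf.readFully(buf);
        return ByteBuffer.wrap(buf).order(order);
    }

    /** Returns the tag IDs of the first IFD, touching only the header and that IFD. */
    static List<Integer> firstIfdTags(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            byte[] mark = new byte[2];
            raf.readFully(mark);                      // byte-order mark at offset 0
            ByteOrder order;
            if (mark[0] == 'I' && mark[1] == 'I') {
                order = ByteOrder.LITTLE_ENDIAN;
            } else if (mark[0] == 'M' && mark[1] == 'M') {
                order = ByteOrder.BIG_ENDIAN;
            } else {
                throw new IOException("Not a TIFF: bad byte-order mark");
            }

            ByteBuffer header = readAt(raf, 2, 6, order);
            if (header.getShort() != 42) {
                throw new IOException("Not a TIFF: bad magic number");
            }
            long ifdOffset = header.getInt() & 0xFFFFFFFFL; // unsigned 32-bit offset

            // An IFD is a 2-byte entry count followed by 12-byte entries.
            int count = readAt(raf, ifdOffset, 2, order).getShort() & 0xFFFF;
            ByteBuffer entries = readAt(raf, ifdOffset + 2, count * 12, order);
            List<Integer> tags = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                tags.add(entries.getShort(i * 12) & 0xFFFF); // tag ID: first 2 bytes
            }
            return tags;
        }
    }
}
```

Called as `TiffIfdPeek.firstIfdTags(new File("photo.tif"))` (a placeholder path), this reads the 8-byte header plus 2 + 12 × (entry count) bytes, however large the file is.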

Original comment by drewnoakes on 16 May 2012 at 8:35

GoogleCodeExporter commented 9 years ago

Original comment by drewnoakes on 18 May 2012 at 10:35

GoogleCodeExporter commented 9 years ago
Hi,

Would fixing or changing this increase extraction speed? I have to extract 
timestamps from 1000 images, and when they are very large (15MB) this takes 
several minutes...

Will a patch be included in the next release?

THX

-marco

Original comment by marcomoe...@gmail.com on 19 May 2012 at 1:58

GoogleCodeExporter commented 9 years ago
Hi Marco,

Speed improvement is definitely the main motivation.  I'm working on including 
this for the next release; however, it'll involve some fairly large changes to 
the way TIFF data is handled.  Specifically, I'll move to seekable streams 
rather than byte arrays.
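A minimal sketch of what "seekable streams rather than byte arrays" could look like, assuming a small interface that both an in-memory `byte[]` and a `RandomAccessFile` implement, so the TIFF parsing code is written once against the interface. `SeekableSource`, `ByteArraySource`, `FileSource` and `SourceDemo.uint16BigEndian` are hypothetical names, not the classes that shipped in 2.5.2.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch: names are illustrative, not metadata-extractor's classes.
interface SeekableSource extends AutoCloseable {
    /** Reads exactly buf.length bytes starting at the absolute position. */
    void readFully(long position, byte[] buf) throws IOException;

    long length() throws IOException;

    @Override
    void close() throws IOException;
}

/** Backed by bytes already in memory (the byte-array approach). */
final class ByteArraySource implements SeekableSource {
    private final byte[] data;

    ByteArraySource(byte[] data) { this.data = data; }

    @Override
    public void readFully(long position, byte[] buf) throws IOException {
        if (position < 0 || position + buf.length > data.length) {
            throw new IOException("Read past end of data");
        }
        System.arraycopy(data, (int) position, buf, 0, buf.length);
    }

    @Override
    public long length() { return data.length; }

    @Override
    public void close() { }
}

/** Backed by a file on disk: only the requested bytes are read into memory. */
final class FileSource implements SeekableSource {
    private final RandomAccessFile raf;

    FileSource(File file) throws IOException { this.raf = new RandomAccessFile(file, "r"); }

    @Override
    public void readFully(long position, byte[] buf) throws IOException {
        raf.seek(position);
        raf.readFully(buf);               // throws EOFException if the file is too short
    }

    @Override
    public long length() throws IOException { return raf.length(); }

    @Override
    public void close() throws IOException { raf.close(); }
}

/** Parsing helpers written against the interface work with either source. */
final class SourceDemo {
    /** Reads an unsigned big-endian 16-bit value at the given position. */
    static int uint16BigEndian(SeekableSource src, long position) throws IOException {
        byte[] b = new byte[2];
        src.readFully(position, b);
        return ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
    }
}
```

Because parsers depend only on `readFully(position, buf)`, the same TIFF code runs whether the bytes sit in RAM or on disk; a one-way stream can still be supported, but only by buffering it into a `ByteArraySource` first.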

Are you referring to JPEG files or TIFF files?  I expect the largest 
improvement for TIFF files, as currently the whole file is loaded into RAM 
before processing begins.  JPEG processing is already somewhat selective about 
what it loads, though I hope to make some improvements there too.

And yes, this will be for the next release.

Drew.

Original comment by drewnoakes on 19 May 2012 at 3:23

GoogleCodeExporter commented 9 years ago
I'm limiting this issue to Tiff files for now, as the processing of Tiff and 
Jpeg is quite different.  A much bigger gain is to be had from changing Tiff 
file handling than Jpeg handling.

Original comment by drewnoakes on 22 May 2012 at 10:38

GoogleCodeExporter commented 9 years ago
Code for this was committed in r1aae00f3fe64 and will be released in 2.5.2.

Note that to take advantage of this performance gain, you must pass 
ImageMetadataReader a File, as the TIFF file format requires random access.  So 
unfortunately it's not possible to process a one-way TIFF stream without first 
loading it into RAM.  Consequently, I've deprecated the method that reads TIFF 
data from a stream that way.

Original comment by drewnoakes on 22 May 2012 at 12:59