Only load part of image in Node

mattiasw / ExifReader

A JavaScript Exif info parser.

Mozilla Public License 2.0


Only load part of image in Node #53

Closed janosh closed 5 years ago

janosh commented 5 years ago

First of all, thanks for this tool! It's great stuff and a pleasure to use!

In the readme's tips, you mention one should only read part of a file for efficiency and point to the examples for how to do this. However, you actually read the entire file in the Node example. Wouldn't it make sense to replace fs.readFile with fs.read there?
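For illustration, a partial read with `fs` could look roughly like the sketch below. `readHead` is a hypothetical helper name, not something from the ExifReader examples, and the number of bytes to request is left to the caller.

```javascript
const fs = require('fs');

// Hypothetical helper: read at most `maxBytes` from the start of a file
// instead of loading the whole file with fs.readFile.
function readHead(filePath, maxBytes) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const buffer = Buffer.alloc(maxBytes);
    let total = 0;
    // readSync may return fewer bytes than requested, so loop until the
    // buffer is full or the end of the file is reached (0 bytes read).
    while (total < maxBytes) {
      const bytesRead = fs.readSync(fd, buffer, total, maxBytes - total, total);
      if (bytesRead === 0) break;
      total += bytesRead;
    }
    // Trim the buffer when the file is shorter than maxBytes.
    return buffer.subarray(0, total);
  } finally {
    fs.closeSync(fd);
  }
}
```

The returned Buffer would then be handed to the parser in place of the full file contents. Whether the library expects a Buffer or an ArrayBuffer depends on its version, so check its readme before wiring this in.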

mattiasw commented 5 years ago

Thank you for the kind words. :-)

Yeah, you are right. That only seems to be in the browser examples. I will fix that. Thanks for pointing it out.

mattiasw commented 5 years ago

It turns out that after the addition of IPTC and XMP tags it's really hard to guess a maximum size of the metadata part of the file. So I actually ended up removing that tip. :-(

https://stackoverflow.com/questions/3248946/what-is-the-maximum-size-of-jpeg-metadata

andreash commented 5 years ago

I am using ExifReader to process the tags of about 70,000 image files. As of now, I read the image files in Node.js using readFileSync, i.e. I read each complete file.

Coming back to the question above: would it be possible to read only a part of the file? How long does that part need to be? Or could the reading be implemented through callbacks from the JPEG parser?

mattiasw commented 5 years ago

Wow, that's a good number. :-) I've only used it for hundreds of files at a time myself so please tell me if you find any more parts than what you just mentioned that could need optimizing.

Back to your question. If you want to read only a part of the file you need to know there is a possibility that you will lose data from IPTC and XMP (and ICC color profiles that will soon be added), and I'm not sure what ExifReader will do if it tries to read beyond the end of the read file (it should handle it, but not sure it's tested that well for all tag types). That being said, try reading 64-128 KB. I think I had 128 KB in the example before I removed it. Tell me if it breaks. :-) Not sure I understand what you mean by implement the reading by callbacks, can you elaborate?

andreash commented 5 years ago

I solved it with an iterative approach. Thankfully, ExifReader throws an error if the tags cannot be read successfully. I use that to increase the buffer size until no error occurs. That's about 50 times faster than reading the whole images (2 sec compared to 122 sec for 2650 images).
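A minimal sketch of such a retry loop, not the actual code used here: `parseWithGrowingPrefix` and `readPrefix` are hypothetical names, and `parse` stands for whatever function does the parsing (for example a thin wrapper around `ExifReader.load`), assumed to throw when the data it gets is cut off too early. The 64 KB starting size and the doubling step are assumptions as well.

```javascript
const fs = require('fs');

// Hypothetical helper: read up to `length` bytes from the start of an open file.
function readPrefix(fd, length) {
  const buffer = Buffer.alloc(length);
  let total = 0;
  while (total < length) {
    const bytesRead = fs.readSync(fd, buffer, total, length - total, total);
    if (bytesRead === 0) break; // end of file
    total += bytesRead;
  }
  return buffer.subarray(0, total);
}

// Parse a growing prefix of the file: start small and double the size each
// time `parse` throws, until it succeeds or the whole file has been read.
function parseWithGrowingPrefix(filePath, parse, initialSize = 64 * 1024) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const fileSize = fs.fstatSync(fd).size;
    let size = Math.min(Math.max(1, initialSize), fileSize);
    while (true) {
      const chunk = readPrefix(fd, size);
      try {
        return { result: parse(chunk), bytesRead: chunk.length };
      } catch (error) {
        // Nothing left to add: the failure is not caused by truncation.
        if (size >= fileSize) throw error;
        size = Math.min(size * 2, fileSize);
      }
    }
  } finally {
    fs.closeSync(fd);
  }
}
```

Note that this only helps when truncation causes an error. As mentioned earlier in the thread, a read that is too short may also lose IPTC or XMP data without any error being thrown, so the starting size should be chosen generously if those tags matter.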

schonert commented 2 years ago

> I solved it with an iterative approach. Thankfully, ExifReader throws an error if the tags cannot be read successfully. I use that to increase the buffer size until no error occurs. That's about 50 times faster than reading the whole images (2 sec compared to 122 sec for 2650 images).

@andreash - Curious how this solution is holding up. I know it's been quite some time, but you might have had some good learnings?