janfri / mini_exiftool

This library is a wrapper for the Exiftool command-line application (https://exiftool.org) written by Phil Harvey. It provides the full power of Exiftool to Ruby: reading and writing of EXIF-data, IPTC-data and XMP-data. The master branch is for current development, and the compatibility-version branch is for compatibility with Ruby 1.8 and exiftool versions prior to 7.65.
GNU Lesser General Public License v2.1

The first call to .to_hash is very slow #26

Closed Joshfindit closed 8 years ago

Joshfindit commented 8 years ago

On two separate machines, each with a different OS (OS X and Ubuntu):

Calling twenty_mini = MiniExiftool.new '27.jpg' is quick. Calling twenty_mini.to_hash is very slow: 2-3 seconds.

During testing, I found that it's only the first call to to_hash. Calling to_hash on separate image files is quick as expected.

janfri commented 8 years ago

This is a performance feature. :-) mini_exiftool uses a pstore file in $HOME/.mini_exiftool to store the original tag names (CamelCase) for your version of ExifTool. The very first time you need this data (e.g. when you call to_hash), it is generated. All further uses of this data read from this file. So if you don't delete the pstore file, all future runs of your code should be quick.
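As a rough illustration of the caching pattern described above (not mini_exiftool's actual code), here is a minimal sketch using Ruby's PStore. The helper name cached_tag_names, the temporary file path, and the sample tag list are all hypothetical; the point is only that the expensive lookup runs on the first call and is read back from the file afterwards.

```ruby
require 'pstore'
require 'tmpdir'

# Hypothetical helper, NOT mini_exiftool's real implementation: returns the
# tag names cached under `version`, generating them only on a cache miss.
def cached_tag_names(store_path, version)
  store = PStore.new(store_path)
  store.transaction do
    # The method's block runs only if nothing is stored yet for this version;
    # the assigned value is written to disk when the transaction commits.
    store[version] ||= yield
  end
end

calls = 0
path = File.join(Dir.mktmpdir, 'tags.pstore') # assumed location for the demo

# Stand-in for the slow step (e.g. asking the exiftool binary for tag names).
slow_lookup = lambda do
  calls += 1
  %w[FileSize LightValue]
end

first  = cached_tag_names(path, '10.1', &slow_lookup) # generates and stores
second = cached_tag_names(path, '10.1', &slow_lookup) # served from the file
```

Keying the store by version means a different ExifTool version would trigger one more slow generation, which matches the "for your version of ExifTool" wording above.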

Joshfindit commented 8 years ago

Ahh, that makes sense. Concern gone. :)

Thanks for the heads-up. I don't know if it bugged anyone else, but if I can offer a suggestion from a user perspective in return: have it say something like 'generating exiftool cache file' when it hits that code.

janfri commented 8 years ago

MiniExiftool has used a pstore file for caching tag names since 2007 (commit https://github.com/janfri/mini_exiftool/commit/57c6071fcb6a343a4848a4aa6ea73570bb677711) and nobody has had any concerns with it. Maybe nobody has noticed it. ;-)

I don't like the idea of polluting $stderr with such messages, particularly when running the test suite. Any other ideas?

Joshfindit commented 8 years ago

I wouldn't put it in $stderr, $stdout is more than enough.

janfri commented 8 years ago

To me, $stdout seems even less appropriate.

Joshfindit commented 8 years ago

Maybe I'm communicating the wrong thing when I say $stdout. The desired effect would be something like this:

irb(main):002:0> require 'mini_exiftool'
=> true
irb(main):003:0> twenty_mini = MiniExiftool.new '27.jpg'
=> #<MiniExiftool:0x0055a330115ac0 @opts={:numerical=>false, :composite=>true, :fast=>false, :fast2=>false, :ignore_minor_errors=>false, :replace_invalid_chars=>false, :timestamps=>Time}, @filename="27.jpg", @io=nil, ... ,\n  \"LightValue\": 10.5\n}]\n", @error_text="">

irb(main):004:0> twenty_mini.to_hash
Generating PStore cache...
Done.
=> {"ExifToolVersion"=>10.1, "FileSize"=>"7.0 MB", "FileModifyDate"=>2016-06-13 10:21:38 -0400, "FileAccessDate"=>2016-06-13 10:33:28 -0400, ... , "LightValue"=>10.5}
irb(main):005:0>

Run unattended, this would be shown in the session or in the standard logs (such as Rails.root/log/development.log).

Joshfindit commented 8 years ago

Keeping in mind that I understand this was a minor head-scratcher, and that it's probably not noticed by many people: it was an edge case that caused me to steer away from mini_exiftool. (I ran it interactively on two separate machines, and I silently switched to the exiftool gem because I was concerned that mini_exiftool had performance problems.) Ironically, it was the tag naming that caused me to switch back, but only when I decided that the naming I liked better was worth the performance penalty.

I wanted to prevent people from having the same misconception, even if there's a small chance.

janfri commented 8 years ago

You've convinced me... ;-)

I use $stderr so as not to pollute $stdout, so it isn't a problem in scripts which use $stdout for data output. See mini_exiftool version 2.7.4.
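To illustrate why a notice on the error stream leaves data output untouched, here is a small sketch. It is not the code from version 2.7.4: the helper with_notice is hypothetical, the message text is borrowed from the suggestion earlier in this thread, and StringIO objects stand in for $stderr and $stdout so the separation is easy to inspect.

```ruby
require 'stringio'

# Hypothetical helper: announce a slow step on the diagnostic stream, then
# return the block's result so the caller can emit it as data.
def with_notice(message, diag: $stderr)
  diag.puts message
  yield
end

diag = StringIO.new # stands in for $stderr in this sketch
out  = StringIO.new # stands in for $stdout in this sketch

# Sample value only; a real script would build this from the image metadata.
result = with_notice('Generating PStore cache...', diag: diag) do
  { 'LightValue' => 10.5 }
end
out.puts result.inspect # only the data reaches the output stream
```

With the real streams, a script run as `ruby script.rb > data.txt` would still show the notice in the terminal while data.txt contains only the data.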

Btw, if performance is an issue for you, have a look at my other gem, https://github.com/janfri/multi_exiftool. It is optimized for performance, especially when working with more than one file.