smarnach / pyexiftool

a Python library to communicate with an instance of Phil Harvey's excellent ExifTool command-line application.

decoding to utf-8 issues #20

Open noah opened 7 years ago

noah commented 7 years ago

I am using your excellent library to extract EXIF from a largish repository of images (100k+). I've encountered an encoding-related issue. Basically exiftool returns a garbage tag value and it breaks the call to decode('utf-8') in execute_json().

If I'm reading it correctly, your code assumes that whatever it reads from exiftool can be decoded as UTF-8 (and is valid JSON). But this does not always seem to be the case:

% exiftool -s -SerialNumber -charset UTF8 P3090087.JPG
SerialNumber                    : #ທ.L.9.-.<.#K%
% exiftool -s -SerialNumber -charset UTF8 P3090087.JPG > file
% cat -v test.json 
Serial Number                   : M-O;#M-`M-:M-^W.M--M-OM-ILM-i}.9.-M-..M-vM-^PM-=M-#<.M-^QM-dG#M-%K%
% exiftool -j -SerialNumber P3090087.JPG     
[{
  "SourceFile": "P3090087.JPG",
  "SerialNumber": "?;#ທ\u0008???L?}\u001F9\u000B-?\u001E<\u0014??G#?K%"
}]

Per the exiftool author, the fix for this seems to be to add the -b (binary output) flag to the call to Popen. This way base64-encoded strings are returned, which cannot trigger a unicode decoding error. Overall encoding is pretty tricky so I thought I'd post and see if you think this is a bug. If nothing else perhaps this will be useful to someone else with a similar problem. Let me know if you'd like further diagnostics.
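
For anyone hitting the same error, here is a minimal sketch of a more tolerant way to handle the raw output. It is not pyexiftool's actual code. The helper names `parse_exiftool_json` and `unwrap_base64` are hypothetical, the sample bytes are modeled on the first few bytes of the `cat -v` output above, and the `base64:` prefix is an assumption about how ExifTool marks binary values when `-b` is combined with `-j`.

```python
import base64
import json


def parse_exiftool_json(raw: bytes):
    """Parse raw `exiftool -j` output, tolerating bytes that are not valid UTF-8."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Substitute U+FFFD for each undecodable byte instead of raising.
        text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


def unwrap_base64(value):
    """Return bytes for strings of the form 'base64:...'; pass anything else through."""
    prefix = "base64:"  # assumed marker for binary values in -j -b output
    if isinstance(value, str) and value.startswith(prefix):
        return base64.b64decode(value[len(prefix):])
    return value


# Hypothetical sample: 0xCF is not followed by a continuation byte, so strict decoding fails.
raw = b'[{"SourceFile": "P3090087.JPG", "SerialNumber": "\xcf;#\xe0\xba\x97L"}]'
metadata = parse_exiftool_json(raw)
serial = unwrap_base64(metadata[0]["SerialNumber"])
```

The fallback keeps the rest of the record usable but loses the original bytes of the bad value, so the `-b` route is probably the better choice when the exact tag content matters.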

smarnach commented 7 years ago

@noah Thanks a lot for the report. This library hasn't gotten the attention it deserves for years now, but I hope to get back to it very soon. At first sight, exiftool -j yielding invalid JSON seems like a bug in ExifTool to me, but I'll have to take a closer look to be sure.

rusq commented 7 years ago

@smarnach I know you know but there are 5 pull requests waiting for your attention.

CTimmerman commented 5 months ago

Here's a test PNG featuring a topless AI girl that works in exiftool but breaks Python wrappers that use text mode: breaks_exif_wrapper.zip
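
A small sketch of why text mode is fragile here, assuming the wrapper reads the child process through `subprocess`. A tiny Python child process stands in for exiftool (an assumption made so the example runs without the attached file), and the function names are hypothetical.

```python
import subprocess
import sys

# Stand-in for exiftool: a child process that writes a byte that is not valid UTF-8.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'ok \\xff')"]


def run_text_mode():
    """Text mode: subprocess decodes stdout itself and raises on invalid bytes."""
    return subprocess.run(CHILD, capture_output=True, encoding="utf-8").stdout


def run_binary_mode():
    """Binary mode: stdout arrives as bytes, so the caller chooses how to decode."""
    raw = subprocess.run(CHILD, capture_output=True).stdout
    return raw.decode("utf-8", errors="replace")


try:
    run_text_mode()
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

recovered = run_binary_mode()
```

Reading bytes and decoding explicitly lets the wrapper pick an error handler per call instead of failing on the first bad byte.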