janfri / mini_exiftool

This library is a wrapper for the Exiftool command-line application (https://exiftool.org) written by Phil Harvey. It provides the full power of Exiftool to Ruby: reading and writing of EXIF-data, IPTC-data and XMP-data. Branch master is for current development and branch compatibility-version is for compatibility with Ruby 1.8 and exiftool versions prior to 7.65.
GNU Lesser General Public License v2.1

ignore non-utf8 characters in exiftool output. #8

Closed shioyama closed 11 years ago

shioyama commented 11 years ago

I hit a snag with a file that had a badly-encoded string in its exif data, so I wrote this patch with a test.

shioyama commented 11 years ago

Hmm... tests passed locally on 1.9.3p327, but some issues with earlier versions I guess.

robotmay commented 11 years ago

I've also just hit this problem, if you needed a +1 in here. It's weird; I'm not sure how a non UTF-8 string got in there in the first place.

shioyama commented 11 years ago

@robotmay would you mind trying the patch to see if it works? Travis is failing on 1.9.3p327 but on my system it works fine, so I'm wondering if it has to do with the default encoding of exiftool or something. It doesn't work on 1.8.7 either, but that's because there is no valid_encoding? method on the String class.

p.s. I also don't understand how the non UTF-8 string got in there...

robotmay commented 11 years ago

@shioyama Aye I'll give that a go this evening; I'm currently running Ruby 2.0.0 so it'll be interesting to see what platforms it does work on.

janfri commented 11 years ago

Travis fails at the moment because their exiftool version is too old. Your solution isn't Ruby 1.8 compatible, which I want to support. Further, I don't understand why you first encode to UTF-16le and then back to UTF-8?

janfri commented 11 years ago

So travis works again on my master's HEAD.

shioyama commented 11 years ago

@janfri Regarding conversion to UTF-16le, I was following the advice here: http://stackoverflow.com/questions/8710444/is-there-a-way-in-ruby-1-9-to-remove-invalid-byte-sequences-from-strings

But apparently both encodings include all characters, so the conversion isn't necessary I suppose. I can update this later, but mainly I just wanted to post the test to identify the problem. There's at least two of us who have encountered it.

janfri commented 11 years ago

@shioyama, @robotmay Are you running exiftool on a windows system?

shioyama commented 11 years ago

Nope, Ubuntu linux. But I wasn't the one who created the image itself -- that person was probably using Windows.

robotmay commented 11 years ago

Gah, my apologies; I haven't gotten around to testing this yet. Should have a chance to this weekend, however.

Running it on Ubuntu 12.04/13.04 (upgraded a few days ago) and Heroku with the same results. The images I've noticed it happening on so far are from a Nexus S and some scanned 35mm film negatives.

robotmay commented 11 years ago

Ooh, and here's an image that I know doesn't work; it has odd characters in the comment field, I believe. Taken on a Nexus S.

https://www.dropbox.com/s/iv2to5g55v8bzky/2012-05-05%2015.11.54.jpg

shioyama commented 11 years ago

@janfri a couple points:

first I tested without the pre-conversion to UTF-16, and the test I added fails with an ArgumentError: invalid byte sequence in UTF-8 error -- it would seem that converting to UTF-16 first does indeed get around that problem.
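
For anyone following along, here is a minimal sketch of the round-trip being discussed. The helper name `strip_invalid_utf8` is made up for illustration and is not part of mini_exiftool. The reasoning, as far as I understand it, is that on Ruby 1.9 encoding a UTF-8 string to UTF-8 is treated as a no-op, so the `invalid: :replace` option never gets a chance to run; going through UTF-16LE forces a real transcoding pass. On Ruby 2.1 and later, `String#scrub` should do the same job directly.

```ruby
# Hypothetical helper for illustration only; not part of mini_exiftool.
def strip_invalid_utf8(value)
  # Nothing to do for strings that are already valid in their encoding.
  return value if value.valid_encoding?
  # Transcode to UTF-16LE, dropping invalid byte sequences on the way,
  # then transcode the cleaned result back to UTF-8.
  value.encode('UTF-16LE', invalid: :replace, replace: '').encode('UTF-8')
end

garbled = "caf\xE9 ok"   # \xE9 is a stray Latin-1 byte, invalid as UTF-8 here
clean = strip_invalid_utf8(garbled)
puts clean.valid_encoding?
```

Note that the invalid bytes are dropped rather than converted, so the tag value may still be garbled; the point is only that it no longer raises.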

Also, about the failing coordinates test, I took out the added line and ran the test on my desktop and got the same error (for the existing code). I'm running exiftool version 8.60. If we can nail down why that test is failing, then the next step would be to find a way to do this that is compatible with ruby 1.8.7.

janfri commented 11 years ago

@shioyama The failing coordinate test is due to a change in the exiftool output; with a newer version (9.27 for example) it passes. I've changed the travis.yml in my master and travis works fine.

janfri commented 11 years ago

@robotmay @shioyama I'm not sure if MiniExiftool should handle such things; maybe exiftool itself should be able to handle such curiosities, because this would solve the problem not only for the Ruby world.

shioyama commented 11 years ago

@janfri but exiftool does not crash on non-UTF8 characters, MiniExiftool does. It's fine if one tag value is garbled, but it shouldn't bring down the program with it.

janfri commented 11 years ago

Since Ruby 1.8 support is officially ending ("We will no longer support 1.8.7 in all senses after June 2013."), I'm thinking about dropping Ruby 1.8 support in MiniExiftool in the near future. So if it's not urgent for you, you can wait and don't need to implement a Ruby 1.8-compatible solution.

shioyama commented 11 years ago

Sure, no big hurry. I'm using the fork now and it's working fine.

rathgar commented 11 years ago

We're experiencing the same weird byte sequence errors on v2.0.0.

Things have moved about a bit in 2.0.0 so I have applied @shioyama's edit to that branch by inserting it here instead:

diff --git a/lib/mini_exiftool.rb b/lib/mini_exiftool.rb
index 8f47792..70ce9d8 100644
--- a/lib/mini_exiftool.rb
+++ b/lib/mini_exiftool.rb
@@ -331,6 +331,7 @@ class MiniExiftool

   def perform_conversions(value)
     return value unless value.kind_of?(String)
+    value.encode!('UTF-16le', invalid: :replace, replace: '').encode!('UTF-8') unless value.valid_encoding?
     case value
     when /^\d{4}:\d\d:\d\d \d\d:\d\d:\d\d/
       s = value.sub(/^(\d+):(\d+):/, '\1-\2-')

which works a treat. Anyone else?

janfri commented 11 years ago

Sorry I'm not yet finished. Please have patience. ;-)

janfri commented 11 years ago

@rathgar please test against the test of @shioyama ! It fails for me, because it seems JSON replaces non UTF-8 characters with a question mark!

rathgar commented 11 years ago

All tests pass except for test_access_coordinates(TestReadCoordinates)

<43.653167> expected but was
<"+43.653167 N">

but somehow, I feel I should expect that. By default the -c option is nil in exiftool but mini_exiftool sets it to '' for which exiftool returns things like 79 deg 22' 23.40" W which might not be expected. Perhaps mini_exiftool should only set -c if the opts[:coord_format] is set? @janfri will correct if I speak out of turn, here.
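
To make the suggestion concrete, here is a rough sketch of passing -c only when a coordinate format was requested. The helper `coord_format_args` is hypothetical and only illustrates the idea; it is not how mini_exiftool builds its command line.

```ruby
# Hypothetical helper for illustration only; not mini_exiftool's actual code.
# Returns the extra exiftool arguments for the coordinate format, or an
# empty list when opts[:coord_format] is not set, so that exiftool falls
# back to its own default formatting.
def coord_format_args(opts)
  fmt = opts[:coord_format]
  fmt ? ['-c', fmt] : []
end

p coord_format_args({})
p coord_format_args(coord_format: '%+.6f')
```

Returning an array rather than an interpolated string keeps the format specifier as a single argument, which avoids shell quoting issues if the command is run with an argument list.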

janfri commented 11 years ago

test_access_coordinates tests the -c option. But there is a problem with the exiftool version: it returns different results in different versions, and the test only passes with newer ones. So please ignore this at the moment! I will fix it soon.

In my environment the new test TestBadOutputEncoding in the branch shioyama:bad_output_encoding_fix of @shioyama fails. Does this pass in your environment?

rathgar commented 11 years ago

No, that test fails for me also. (exiftool version: 8.60)

However, running the exiftool command by hand also returns the same value.

$ exiftool -GPSLatitude -j -q -q -s -c '%+.6f' test/data/test_coordinates.jpg 
[{
  "SourceFile": "test/data/test_coordinates.jpg",
  "GPSLatitude": "+43.653167 N"
}]

It seems to contradict what the exiftool docs say should be returned:

2) If the hemisphere is known, a reference direction (N, S, E or W) is appended to each printed coordinate, but adding a + to the format specifier (ie. %+.6f) prints a signed coordinate instead.

janfri commented 11 years ago

@shioyama @robotmay @rathgar Are there still problems with invalid characters when using the current master's HEAD of my repository? If so, please share example files. Otherwise I will close this issue and release a new version of mini_exiftool.

All tests passing now even with older exiftool versions and even on windows! :-)

shioyama commented 11 years ago

Will try new version asap and report back soon.

rathgar commented 11 years ago

Against b3816fc7829298b2194f5ba021488e278c63add7 (current master) I still get errors when processing some files.

backtrace:

ArgumentError: invalid byte sequence in UTF-8
    from /.../janfri/mini_exiftool/lib/mini_exiftool.rb:336:in `==='
    from /.../janfri/mini_exiftool/lib/mini_exiftool.rb:336:in `perform_conversions'
    from /.../janfri/mini_exiftool/lib/mini_exiftool.rb:328:in `block in parse_output'
    from /.../janfri/mini_exiftool/lib/mini_exiftool.rb:327:in `each'
    from /.../janfri/mini_exiftool/lib/mini_exiftool.rb:327:in `parse_output'
    from /.../janfri/mini_exiftool/lib/mini_exiftool.rb:98:in `load'
    from /.../janfri/mini_exiftool/lib/mini_exiftool.rb:68:in `initialize'

Not sure if GH strips out meta but here's an example file: utf8fail
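
For context, the backtrace points at the `case value` / `when /.../` match in perform_conversions, which is where the regexp comparison (`===`) hits the invalid string. A minimal sketch of guarding that kind of match, assuming a made-up helper `date_time_like?` and reusing the date pattern from the diff above:

```ruby
# Same date/time pattern as in perform_conversions in the diff above.
DATE_TIME_PATTERN = /^\d{4}:\d\d:\d\d \d\d:\d\d:\d\d/

# Hypothetical helper for illustration only; not part of mini_exiftool.
# Checks valid_encoding? before matching, so strings with invalid byte
# sequences are reported as "not a date" instead of reaching the regexp.
def date_time_like?(value)
  return false unless value.kind_of?(String) && value.valid_encoding?
  DATE_TIME_PATTERN === value
end

puts date_time_like?("2013:04:22 12:49:29")
puts date_time_like?("0212X\xFF\xFE")
```

This only sidesteps the error for the date check; the value itself is left untouched, so anything else that matches against it would need the same guard or a cleaned-up copy of the string.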

janfri commented 11 years ago

@rathgar I don't get errors. Could you give me the following information: Ruby version, JSON::VERSION, exiftool version, OS version?

My parsed results in YAML (MiniExiftool.new('dba713d8-cf64-11e2-9b31-7586d7f9220b.jpg').to_yaml):

SourceFile: dba713d8-cf64-11e2-9b31-7586d7f9220b.jpg
ExifToolVersion: 9.3
FileName: dba713d8-cf64-11e2-9b31-7586d7f9220b.jpg
Directory: .
FileSize: 24 kB
FileModifyDate: 2013-06-07 15:17:25.000000000 +02:00
FileAccessDate: 2013-06-07 15:21:29.000000000 +02:00
FileInodeChangeDate: 2013-06-07 15:17:25.000000000 +02:00
FilePermissions: rw-r--r--
FileType: JPEG
MIMEType: image/jpeg
JFIFVersion: 1.01
ExifByteOrder: Big-endian (Motorola, MM)
Make: NIKON CORPORATION
Model: NIKON D300S
Orientation: Horizontal (normal)
XResolution: 300
YResolution: 300
ResolutionUnit: inches
Software: Ver.1.01
ModifyDate: 2013-04-22 12:49:29.000000000 +02:00
Artist: ''
YCbCrPositioning: Centered
Copyright: ''
ExposureTime: !ruby/object:Rational
  denominator: 80
  numerator: 1
FNumber: 20.0
ExposureProgram: Manual
ISO: 200
ExifVersion: '0221'
DateTimeOriginal: 2013-04-22 12:49:29.000000000 +02:00
CreateDate: 2013-04-22 12:49:29.000000000 +02:00
ComponentsConfiguration: Y, Cb, Cr, -
ExposureCompensation: +2/3
MaxApertureValue: 4.9
MeteringMode: Multi-segment
LightSource: Cloudy
FocalLength: 46.0 mm
MakerNoteVersion: 2.1
Quality: Fine
WhiteBalance: Cloudy
FocusMode: AF-C
FlashSetting: ''
FlashType: ''
WhiteBalanceFineTune: 0 0
WB_RBLevels: 1.6640625 1.1875 1 1
ProgramShift: 0
ExposureDifference: -0.8
Compression: JPEG (old-style)
PreviewImageStart: 9056
PreviewImageLength: 0
FlashExposureComp: 0
ISOSetting: 200
ImageBoundary: 0 0 4288 2848
ExternalFlashExposureComp: 0
FlashExposureBracketValue: 0.0
ExposureBracketValue: 0
CropHiSpeed: Off (4352x2868 cropped to 4352x2868 at pixel 0,0)
ExposureTuning: 0
VRInfoVersion: 100
VibrationReduction: 'On'
VRMode: Normal
ImageAuthentication: 'Off'
ActiveD-Lighting: Normal
PictureControlVersion: 100
PictureControlName: Standard
PictureControlBase: Standard
PictureControlAdjust: Full Control
PictureControlQuickAdjust: Normal
Brightness: Normal
HueAdjustment: None
FilterEffect: n/a
ToningEffect: n/a
ToningSaturation: n/a
Timezone: '+00:00'
DaylightSavings: 'No'
DateDisplayFormat: D/M/Y
ISOExpansion: 'Off'
ISOExpansion2: 'Off'
LensType: G VR
FlashMode: Did Not Fire
ShootingMode: Continuous
ShotInfoVersion: '0216'
FirmwareVersion: 1.01b
ISO2: 200
CustomSettingsBank: A
CustomSettingsAllDefault: 'No'
AF-CPrioritySelection: Focus
AF-SPrioritySelection: Focus
AFPointSelection: 51 Points
DynamicAFArea: 21 Points
FocusTrackingLockOn: Normal
AFActivation: Shutter/AF-On
FocusPointWrap: No Wrap
AFPointIllumination: Auto
AFAssist: 'On'
AF-OnForMB-D10: AF-On
ISOStepSize: 1/3 EV
ExposureControlStepSize: 1/3 EV
ExposureCompStepSize: 1/3 EV
EasyExposureCompensation: 'Off'
CenterWeightedAreaSize: 8 mm
FineTuneOptCenterWeighted: 0
FineTuneOptMatrixMetering: 0
FineTuneOptSpotMetering: 0
MultiSelectorShootMode: Select Center Focus Point
MultiSelectorPlaybackMode: Thumbnail On/Off
InitialZoomSetting: Low Magnification
MultiSelector: Do Nothing
ExposureDelayMode: 'Off'
CLModeShootingSpeed: 5 fps
MaxContinuousRelease: 100
ReverseIndicators: + 0 -
FileNumberSequence: 'On'
BatteryOrder: MB-D10 First
MB-D10Batteries: LR6 (AA alkaline)
ScreenTips: 'On'
Beep: 'Off'
ShootingInfoDisplay: Auto
GridDisplay: 'On'
ViewfinderWarning: 'On'
FuncButton: None
FuncButtonPlusDials: Auto Bracketing
PreviewButton: Preview
PreviewButtonPlusDials: None
AELockButton: AE/AF Lock
AELockButtonPlusDials: None
CommandDialsReverseRotation: 'No'
CommandDialsChangeMainSub: 'Off'
CommandDialsApertureSetting: Sub-command Dial
CommandDialsMenuAndPlayback: 'Off'
LCDIllumination: 'Off'
PhotoInfoPlayback: Info Up-down, Playback Left-right
ShutterReleaseButtonAE-L: 'Off'
ReleaseButtonToUseDial: 'No'
SelfTimerTime: 10 s
MonitorOffTime: 10 s
FlashSyncSpeed: 1/250 s
FlashShutterSpeed: 1/60 s
AutoBracketSet: AE & Flash
AutoBracketModeM: Flash/Speed
AutoBracketOrder: 0,-,+
ModelingFlash: 'On'
NoMemoryCard: Enable Release
MeteringTime: 6 s
InternalFlash: Commander Mode
NoiseReduction: 'Off'
WB_GRBGLevels: 256 426 304 256
LensDataVersion: 204
ExitPupilPosition: 107.8 mm
AFAperture: 5.0
FocusPosition: '0x11'
FocusDistance: 0.38 m
LensIDNumber: 153
LensFStops: 5.33
MinFocalLength: 16.3 mm
MaxFocalLength: 84.8 mm
MaxApertureAtMinFocal: 3.6
MaxApertureAtMaxFocal: 5.7
MCUVersion: 155
EffectiveMaxAperture: 5.0
RetouchHistory: None
ImageDataSize: 6753175
ShutterCount: 105849
FlashInfoVersion: 103
FlashSource: None
ExternalFlashFirmware: n/a
ExternalFlashFlags: (none)
FlashCommanderMode: 'Off'
FlashControlMode: 'Off'
FlashGNDistance: 0
FlashColorFilter: None
FlashGroupAControlMode: 'Off'
FlashGroupBControlMode: 'Off'
FlashGroupCControlMode: 'Off'
FlashGroupACompensation: 0
FlashGroupBCompensation: 0
FlashGroupCCompensation: 0
MultiExposureVersion: 100
MultiExposureMode: 'Off'
MultiExposureShots: 0
MultiExposureAutoGain: 'Off'
HighISONoiseReduction: 'Off'
PowerUpTime: 2013-04-08 10:33:36.000000000 +02:00
AFInfo2Version: 100
ContrastDetectAF: 'Off'
AFAreaMode: Dynamic Area (21 points)
PhaseDetectAF: On (51-point)
PrimaryAFPoint: E6
AFPointsUsed: E6
ContrastDetectAFInFocus: 'No'
FileInfoVersion: 100
DirectoryNumber: 114
FileNumber: 5474
AFFineTune: On (1)
AFFineTuneIndex: n/a
AFFineTuneAdj: 0
UserComment: ''
SubSecTime: 43
SubSecTimeOriginal: 43
SubSecTimeDigitized: 43
FlashpixVersion: 100
ColorSpace: sRGB
ExifImageWidth: 4288
ExifImageHeight: 2848
SensingMethod: One-chip color area
FileSource: Digital Camera
SceneType: Directly photographed
CFAPattern: '[Red,Green][Green,Blue]'
CustomRendered: Normal
ExposureMode: Manual
DigitalZoomRatio: 1
FocalLengthIn35mmFormat: 69 mm
SceneCaptureType: Standard
GainControl: None
Contrast: Normal
Saturation: Normal
Sharpness: Hard
SubjectDistanceRange: Unknown
SerialNumber: 6086161
GPSVersionID: 2.2.0.0
ThumbnailOffset: 9200
ThumbnailLength: 9922
CurrentIPTCDigest: bea898bb42c3793dd8d8c81153ea6a77
ApplicationRecordVersion: 3
ObjectName: 2008
DateCreated: '2013:04:22'
TimeCreated: 12:49:29+00:00
Caption-Abstract: "June 2013\rPeugeot 2008\rJuly 2013\rPeugeot 2008\r"
Prefs: Tagged:1, ColorClass:0, Rating:0, FrameNum:005474
CopyrightFlag: false
XMPToolkit: Image::ExifTool 8.60
Tagged: true
FlashCompensation: 0
Lens: 16-85mm f/3.5-5.6
Description: |
  June 2013
  Peugeot 2008
  July 2013
  Peugeot 2008
Title: 2008
CompressedBitsPerPixel: 4
FlashFired: false
FlashFunction: false
FlashRedEyeMode: false
FlashReturn: No return detection
Rating: 0
CropBottom: 2848
CropLeft: 0
CropRight: 4288
CropTop: 0
ImageWidth: 100
ImageHeight: 68
EncodingProcess: Baseline DCT, Huffman coding
BitsPerSample: 8
ColorComponents: 3
YCbCrSubSampling: YCbCr4:2:0 (2 2)
Aperture: 20.0
AutoFocus: 'On'
BlueBalance: 1.1875
DateTimeCreated: 2013-04-22 14:49:29.000000000 +02:00
Flash: No Flash
ImageSize: 100x68
LensID: AF-S DX VR Zoom-Nikkor 16-85mm f/3.5-5.6G ED
LensSpec: 16-85mm f/3.5-5.6 G VR
RedBalance: 1.664063
ScaleFactor35efl: 1.5
ShutterSpeed: !ruby/object:Rational
  denominator: 80
  numerator: 1
SubSecCreateDate: 2013-04-22 12:49:29.430000000 +02:00
SubSecDateTimeOriginal: 2013-04-22 12:49:29.430000000 +02:00
SubSecModifyDate: 2013-04-22 12:49:29.430000000 +02:00
ThumbnailImage: (Binary data 9922 bytes)
CircleOfConfusion: 0.020 mm
DOF: 0.05 m (0.35 - 0.40)
FOV: 25.8 deg (0.17 m)
FocalLength35efl: '46.0 mm (35 mm equivalent: 69.0 mm)'
HyperfocalDistance: 5.28 m
LightValue: 14.0

rathgar commented 11 years ago

> MiniExiftool::VERSION
 => "2.1.0" 
> JSON::VERSION
 => "1.8.0"
$ exiftool -ver
8.60
$ ruby -v
ruby 1.9.3p327 (2012-11-10 revision 37606) [x86_64-linux]
$ lsb_release -d
Description:    Ubuntu 12.04.2 LTS

Running it with no options, as you did, it does run fine. Adding each option back in one by one, it fails as soon as numerical: true is set, even on its own. I can't see anything within the MET source that would cause this, so could it just be a weirdness caused by the data returned by exiftool in this mode? Is numerical no longer useful when returning in JSON format, IYO?

janfri commented 11 years ago

@rathgar Thanks for your analysis. I will look at it later, not much time at the moment.

janfri commented 11 years ago

@rathgar numerical is also useful with JSON:

irb(main):001:0> MiniExiftool.new('test.jpg').focal_length
=> "75.0 mm"
irb(main):002:0> MiniExiftool.new('test.jpg', numerical: true).focal_length
=> 75
irb(main):003:0> MiniExiftool.new('test.jpg').orientation
=> "Horizontal (normal)"
irb(main):004:0> MiniExiftool.new('test.jpg', numerical: true).orientation
=> 1

The problem with the invalid chars in your example is ColorBalanceUnknown with binary data as value:

~/arbeit/mini_exiftool$ exiftool -ColorBalanceUnknown dba713d8-cf64-11e2-9b31-7586d7f9220b.jpg
Color Balance Unknown           : 0212[...]
~/arbeit/mini_exiftool$ exiftool -n -ColorBalanceUnknown dba713d8-cf64-11e2-9b31-7586d7f9220b.jpg
Color Balance Unknown           : 0212X'%?Tq??_ ... some strange chars ...

I have an approach to solve this; a little patience, please, as I want to write a test for it first.

janfri commented 11 years ago

Problem fixed in mini_exiftool 2.2.0. Use :replace_invalid_chars:

MiniExiftool.new 'test.jpg', replace_invalid_chars: ''