remvee / exifr

EXIF Reader
https://remvee.github.io/exifr/
MIT License

Wrong encoding for artist and image_description #72

Closed vogelfr closed 9 months ago

vogelfr commented 9 months ago

Hi,

When the EXIF data contains non-ASCII characters (e.g. 'é' or 'ø'), they are output incorrectly: image_description = "Troms\xC3\xB8" instead of image_description = "Tromsø". Is there any way to make it output non-ASCII characters correctly?

Many thanks :)

remvee commented 9 months ago

The TIFF spec (EXIF data is actually a TIFF blob) describes the ASCII field type as:

8-bit byte that contains a 7-bit ASCII code; the last byte must be NUL (binary zero)

So it actually seems "illegal" to use the 8th bit, but exifr is very lenient (lazy) and simply allows the full 8 bits, so you can use #force_encoding('UTF-8') to fix up the string values.
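A minimal sketch of what #force_encoding does, in plain Ruby without exifr. The byte string is a made-up stand-in for a value read from an EXIF ASCII field, and it assumes such values arrive tagged as binary (ASCII-8BIT):

```ruby
# Stand-in for a string read from an EXIF ASCII field: the bytes are
# UTF-8, but the string is tagged as binary (ASCII-8BIT).
# The value is made up for illustration.
raw = "Troms\xC3\xB8".b

raw.encoding            # Encoding::ASCII_8BIT (also known as BINARY)
raw.bytesize            # 7 bytes

# force_encoding changes only the encoding label, never the bytes.
# It mutates the receiver, so work on a copy to keep `raw` untouched.
fixed = raw.dup.force_encoding('UTF-8')

fixed.encoding          # Encoding::UTF_8
fixed.length            # 6 characters
fixed.valid_encoding?   # true, the bytes form valid UTF-8
```

Because only the label changes, this helps only when the bytes really are UTF-8; bytes in another encoding (Latin-1, for example) would still be invalid after relabeling.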

dquinton commented 2 weeks ago

Hi, I also ran into this problem while writing EXIF data into a Jekyll Liquid tag. An EXIF comment containing characters such as [ɛ ơ ʉ ə ɲ] throws the error:

Liquid Exception: incompatible character encodings: ASCII-8BIT and UTF-8

Could you please elaborate a little on where I could add or enable #force_encoding('UTF-8') in exifr? Thanks
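For reference, the exception above can be reproduced in plain Ruby by joining a binary-tagged string with a UTF-8 string when both contain non-ASCII bytes. This is only a sketch: the strings are made up, and it assumes Liquid hits the same kind of concatenation while rendering:

```ruby
# Bytes of "ɛ" (U+025B) tagged as binary, standing in for an EXIF comment.
binary = "\xC9\x9B".b
# An ordinary UTF-8 string that also contains a non-ASCII character.
utf8 = "caption: \u00E9"

error = nil
begin
  utf8 + binary          # mixing the two encodings raises
rescue Encoding::CompatibilityError => e
  error = e
end

# Relabeling the binary string as UTF-8 first makes the join work.
joined = utf8 + binary.dup.force_encoding('UTF-8')
```

Strings that contain only 7-bit ASCII can be joined across encodings without error, which is why the problem shows up only with characters like the ones listed.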

vogelfr commented 2 weeks ago

I assume you have found the exiftag tool, which uses exifr in the background. In jekyll-exiftag.rb I added the following:

      begin
        exif = EXIFR::JPEG::new(file_name)
        ret = tag.split('.').inject(exif){|o,m| o.send(m)}
        if ret.is_a? String # <----- FROM HERE
          ret.force_encoding('UTF-8')
        end # <----- TO HERE
        return ret

This ensures that any String returned by exifr is interpreted as UTF-8. I did still have some weird issue with a specific character sequence (\\xC3\\xB8) but otherwise it works nicely.
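A slightly more defensive variant, as a sketch only: `to_utf8` is a hypothetical helper, not part of exifr or jekyll-exiftag. It relabels strings as UTF-8 and, if the bytes turn out not to be valid UTF-8, replaces the offending bytes instead of letting a later operation raise:

```ruby
# Hypothetical helper (not part of exifr or jekyll-exiftag).
def to_utf8(value)
  # Non-strings (numbers, times, nil) pass through unchanged.
  return value unless value.is_a?(String)

  # Relabel a copy so the caller's string is left untouched.
  str = value.dup.force_encoding('UTF-8')

  # scrub replaces byte sequences that are not valid UTF-8.
  str.valid_encoding? ? str : str.scrub('?')
end

ok      = to_utf8("Troms\xC3\xB8".b)  # valid UTF-8 bytes, just relabeled
patched = to_utf8("caf\xE9".b)        # lone Latin-1 byte, replaced by "?"
number  = to_utf8(42)                 # returned as-is
```

Replacing invalid bytes with "?" loses information; whether that is acceptable depends on how the value is used.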

dquinton commented 2 weeks ago

Yes, I installed by gem install... the file looks like

      begin
        exif = EXIFR::JPEG::new(file_name)
        return tag.split('.').inject(exif){|o,m| o.send(m)}
      rescue
        ""
      end
    end

/var/lib/gems/3.1.0/gems/jekyll-exiftag-0.1.0/lib/jekyll-exiftag.rb

Making the changes throws errors. Perhaps you can attach yours?

vogelfr commented 2 weeks ago

Sorry, the entire block looks like this:

      # try it and return empty string on failure
      begin
        exif = EXIFR::JPEG::new(file_name)
        ret = tag.split('.').inject(exif){|o,m| o.send(m)}
        if ret.is_a? String
          ret.force_encoding('UTF-8')
        end
        return ret
      rescue StandardError => e
        puts e.message
      end

File is here.

dquinton commented 2 weeks ago

Got it! No errors thrown with those characters. Good one!