
toretore / barby

The Ruby barcode generator

http://toretore.github.com/barby/

MIT License


encoding issue #61

Closed rubydesign closed 8 years ago

rubydesign commented 8 years ago

I have my DB and Ruby set to UTF-8 (ASCII doesn't display our language).

Now I'm getting an exception: Encoding::CompatibilityError (incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)):

Here:

  barby-0.6.2/lib/barby/barcode/code_128.rb:221:in `split'
  barby-0.6.2/lib/barby/barcode/code_128.rb:221:in `data='

So the data seems to be my string (UTF-8), and the regex comes from that file, which is ASCII-encoded.

I'm still investigating how this can be avoided; I just thought I'd let you know.

toretore commented 8 years ago

Code128 data must always be ASCII-8BIT/BINARY.

Code128.new(utf8_string.force_encoding('ASCII-8BIT'))
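`force_encoding` only changes the label on the string. The bytes stay the same, and the receiver itself is modified, which fails if the string is frozen. The sketch below shows `String#b` as an alternative for callers who would rather not mutate their string. `String#b` returns a copy tagged ASCII-8BIT. The helper name `to_binary` is hypothetical. Its result is what you would pass to `Barby::Code128.new`.

```ruby
# Hypothetical helper: return a copy of +str+ tagged ASCII-8BIT,
# leaving the caller's string (and its encoding) untouched.
def to_binary(str)
  str.b
end

utf8_string = "Grüße"
binary = to_binary(utf8_string)

p utf8_string.encoding               # still UTF-8
p binary.encoding                    # ASCII-8BIT
p binary.bytes == utf8_string.bytes  # same bytes, different label

# force_encoding, by contrast, relabels the receiver in place.
in_place = utf8_string.dup
in_place.force_encoding(Encoding::ASCII_8BIT)
p in_place.encoding                  # ASCII-8BIT
```

Neither call converts anything. The barcode receives the UTF-8 bytes, so a non-ASCII character such as "ü" arrives as two bytes.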

rubydesign commented 8 years ago

Yes, I guess I figured that much. Still, the exception is raised in this code, so just closing the issue seems a little easy. Maybe one could use force_encoding (or iconv) here.

toretore commented 8 years ago

It is the responsibility of the caller to provide data with the correct encoding; if the string is actually binary, it should be marked as such. We can't assume that a string marked as UTF-8 is really something else. But it could probably be better documented.
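A caller can take on that responsibility by checking the data before relabelling it. The sketch below is hypothetical: neither `code128_data` nor its error message comes from Barby. It tags only ASCII-only strings as binary and raises otherwise. That way, text in another script fails loudly instead of being passed on as raw UTF-8 bytes. Restricting input to ASCII is an assumed, conservative rule, not Barby's own validation.

```ruby
# Hypothetical caller-side guard: only relabel data that is plain ASCII.
def code128_data(str)
  unless str.ascii_only?
    raise ArgumentError, "non-ASCII characters in barcode data: #{str.inspect}"
  end
  str.b # binary copy; ASCII bytes are identical in UTF-8 and ASCII-8BIT
end

p code128_data("ABC-123").encoding # ASCII-8BIT

begin
  code128_data("A–B") # contains an en dash, U+2013
rescue ArgumentError => e
  puts e.message
end
```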

rubydesign commented 8 years ago

The exception can be avoided with

data.encode(Encoding::ASCII_8BIT, :invalid => :replace, :undef => :replace, :replace => "")

in my case without loss of data. Reading http://stackoverflow.com/questions/3159742/how-to-convert-character-encoding-with-ruby-1-9, it seems that ASCII-8BIT and UTF-8 treat some characters differently (e.g. the dash "–", U+2013, which is \xE2\x80\x93 in UTF-8), so errors could come from the encoding label alone rather than from data that genuinely can't be encoded as ASCII-8BIT.

This would not be a good thing to push onto users, IMHO.
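The `encode` call quoted in this comment converts the string rather than relabelling it. Converting from UTF-8 to ASCII-8BIT maps only the ASCII range. With `undef: :replace, replace: ""`, every non-ASCII character is therefore dropped silently. That is why the call avoids the exception, and also why it is lossless only when the data was ASCII to begin with. This is a small comparison; the sample strings and the helper name `strip_to_binary` are made up for illustration.

```ruby
# Same options as the encode call quoted above, wrapped in a
# hypothetical helper for comparison.
def strip_to_binary(data)
  data.encode(Encoding::ASCII_8BIT, invalid: :replace, undef: :replace, replace: "")
end

ascii  = strip_to_binary("ABC-123") # plain ASCII: nothing is lost
dashed = strip_to_binary("A–B")     # en dash U+2013 (\xE2\x80\x93) is dropped
umlaut = strip_to_binary("Grüße")   # ü and ß are dropped

p ascii, dashed, umlaut
p dashed.encoding

# Checking up front makes the loss visible instead of silent.
p "A–B".ascii_only?
```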