plaa closed this issue 8 years ago.
Hi @plaa,
Your feedback is very welcome! I also have trouble fully understanding Unicode in every detail.
The first point you mentioned was quite easy to fix, but for the latter I needed to reimplement the Unicode module. The multi-chunk escaped characters (surrogate pairs split across two escapes) are especially complicated. Thankfully, I found very similar code in the JSON gem.
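For readers following along, here is a minimal sketch of the decoding side, written for illustration only: it is neither the gem's reimplemented Unicode module nor the JSON gem's code, and the method name `decode_escapes` is hypothetical. It assumes the input contains only `\uXXXX` escapes (no `\\`, `\t`, `\=` or other properties escapes), matches exactly four hex digits per escape so that trailing characters such as `e` or `b` are left alone, and decodes each run of consecutive escapes as UTF-16 so that a surrogate pair becomes one character.

```ruby
# Hypothetical helper for illustration; not the gem's or the JSON gem's API.
# Decodes \uXXXX escapes and joins UTF-16 surrogate pairs into one character.
def decode_escapes(str)
  # A run of consecutive escapes, each with exactly four hex digits, so that
  # hex-looking text after an escape ("a\u00e4eb") is left untouched.
  str.gsub(/(?:\\u\h{4})+/) do |run|
    # Collect the 16-bit code units of the run, e.g. [0xD868, 0xDC2F].
    units = run.scan(/\\u(\h{4})/).flatten.map(&:hex)
    # Pack them as big-endian UTF-16 and transcode to UTF-8; the transcoder
    # combines a high/low surrogate pair into a single code point.
    units.pack('n*').force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
  end
end
```

With this sketch, `decode_escapes('a\u00e4eb')` returns `"aäeb"` and `decode_escapes('\ud868\udc2f')` returns `"𪀯"`. A lone or mismatched surrogate makes the transcoder raise `Encoding::InvalidByteSequenceError`, so a real decoder would need its own policy for malformed input.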
Thanks to your detailed description, I think I have fixed the problems. You can try out the new version with Bundler using the following line:
gem 'java-properties', :git => 'https://github.com/jnbt/java-properties.git', :branch => 'fix-unicode_outside_bmp'
Alternatively, could you provide an example file?
Hi,
I'm having a bit of trouble testing the gem due to the setup on my laptop and my unfamiliarity with Bundler. However, the following code:
puts JavaProperties::Encoding.decode!("a\\u00e4eb")
puts JavaProperties::Encoding.encode!("𪀯")
should output:
aäeb
\ud868\udc2f
It should be straightforward to make a spec from those.
I'm actually only using the encode / decode methods in my project, as I also need to read the comments from the property file.
This works now:
2.3.1 :001 > require 'java-properties'
=> true
2.3.1 :002 > puts JavaProperties::Encoding.decode!('a\u00e4eb')
aäeb
=> nil
2.3.1 :003 > puts JavaProperties::Encoding.encode!('𪀯')
\ud868\udc2f
=> nil
2.3.1 :004 > puts JavaProperties::Encoding.encode!('aäeb')
a\u00e4eb
=> nil
2.3.1 :005 > puts JavaProperties::Encoding.decode!('\ud868\udc2f')
𪀯
=> nil
2.3.1 :006 >
Would it help if I released a beta version of the gem on rubygems.org?
Hi,
I don't think I can test it much better than that. :)
On Wed, Sep 21, 2016 at 11:24 AM, jnbt notifications@github.com wrote:
Sampo Niskanen <=> http://www.iki.fi/sampo.niskanen/
I released a new version, 0.2.0, of this gem to address this issue.
The properties file format supports only the four-digit \uxxxx notation. The library fails both when decoding Unicode escapes that are immediately followed by a character in 0-9 or a-f, and when encoding Unicode characters outside the Basic Multilingual Plane (BMP). Characters outside the BMP must be encoded as two Unicode escapes, i.e. as a UTF-16 surrogate pair.
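The encoding direction described above can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not the gem's implementation: the method name `encode_escapes` is hypothetical, only non-ASCII characters are escaped (the other properties escapes are out of scope), and lowercase hex is used to match the examples in this thread. Transcoding each character to UTF-16BE yields one 16-bit unit for a BMP character and a surrogate pair for anything else, and each unit is written as its own four-digit escape.

```ruby
# Hypothetical helper for illustration; not the gem's API.
# Escapes each non-ASCII character as \uXXXX with four lowercase hex digits;
# characters outside the BMP become two escapes (a UTF-16 surrogate pair).
def encode_escapes(str)
  str.each_char.map do |char|
    next char if char.ord < 0x80 # ASCII passes through unchanged

    # UTF-16BE gives one 16-bit unit for a BMP character and two
    # (high surrogate, low surrogate) for a supplementary character.
    units = char.encode(Encoding::UTF_16BE).unpack('n*')
    units.map { |unit| format('\\u%04x', unit) }.join
  end.join
end
```

With this sketch, `puts encode_escapes('aäeb')` prints `a\u00e4eb` and `puts encode_escapes('𪀯')` prints `\ud868\udc2f`.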
Examples:
`a\u00e4b` should be decoded to `aäb`, while the gem decodes it to `a๎b`.
`𪀯` should be encoded to `\ud868\udc2f`, while the gem encodes it to `\u02a02f`.
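As a worked check of the `𪀯` example: 𪀯 is U+2A02F, and the standard UTF-16 formulas subtract 0x10000 and split the remaining 20 bits into two 10-bit halves. The last line shows the reverse computation a decoder performs:

$$
\begin{aligned}
c' &= \mathtt{0x2A02F} - \mathtt{0x10000} = \mathtt{0x1A02F} \\
\text{high} &= \mathtt{0xD800} + \lfloor c' / \mathtt{0x400} \rfloor = \mathtt{0xD800} + \mathtt{0x68} = \mathtt{0xD868} \\
\text{low} &= \mathtt{0xDC00} + (c' \bmod \mathtt{0x400}) = \mathtt{0xDC00} + \mathtt{0x2F} = \mathtt{0xDC2F} \\
c &= \mathtt{0x10000} + (\mathtt{0xD868} - \mathtt{0xD800}) \cdot \mathtt{0x400} + (\mathtt{0xDC2F} - \mathtt{0xDC00}) = \mathtt{0x2A02F}
\end{aligned}
$$

Read with exactly four hex digits, as Java's `Properties.load` does, the gem's `\u02a02f` would instead become U+02A0 followed by the literal text `2f`.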
References:
The closest thing to an official specification of the format is at http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader). It specifies only "escape sequences similar to those used for [Java] character and string literals", which in turn support only the four-digit notation.
These can also be verified by encoding / decoding the following file with the `native2ascii` command provided with the JDK: