jruby / jcodings

Java-based codings helper classes for Joni and JRuby
MIT License

Mongolian vowel separator needs to be removed from CR_Blank range #13

Closed: chrisseaton closed this issue 6 years ago

chrisseaton commented 8 years ago

The JCodings CR_Blank.bin still contains the 0x180e-0x180e range (U+180E, the Mongolian vowel separator), which needs to be removed to match Ruby 2.2 (https://github.com/ruby/ruby/blob/004bb16ccc01608849c8d200aed990b6d6e2b98e/enc/unicode/name2ctype.h#L668-L678).

RubySpec is here https://github.com/ruby/rubyspec/commit/6076e90a540157a609d7f356290cbd7ce0c740e4.

JRuby currently fails this spec.

I looked at the CR_Blank.bin file and I think I just need to remove the range bytes and decrement the count from 9 to 8, but I'm not sure what the leading 0000 0013 in the file means. What should I do with that?
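One possible reading of the leading 0000 0013, offered as an assumption rather than a confirmed description of the file format: 0x13 is 19, which equals 1 + 2 × 9, i.e. the range count followed by nine from/to pairs. If that is right, the file is a big-endian int array prefixed by its own length, and removing one range means changing both the count (9 to 8) and the leading length (0x13 to 0x11). The sketch below illustrates that arithmetic only. The class name CodeRangeFile, its methods, and the sample table are hypothetical and should be checked against the real CR_Blank.bin.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CodeRangeFile {
    /** Decodes the ASSUMED layout: [int length][length ints], all big-endian. */
    static int[] decode(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        int[] table = new int[in.getInt()];
        for (int i = 0; i < table.length; i++) {
            table[i] = in.getInt();
        }
        return table;
    }

    /** Encodes a table back into the same assumed layout. */
    static byte[] encode(int[] table) {
        ByteBuffer out = ByteBuffer.allocate(4 * (table.length + 1));
        out.putInt(table.length);
        for (int value : table) {
            out.putInt(value);
        }
        return out.array();
    }

    /** table[0] is the range count, followed by (from, to) pairs. */
    static int[] removeRange(int[] table, int from, int to) {
        int count = table[0];
        int[] result = new int[table.length];
        int kept = 0;
        for (int i = 0; i < count; i++) {
            int lo = table[1 + 2 * i];
            int hi = table[2 + 2 * i];
            if (lo == from && hi == to) {
                continue; // drop the unwanted range
            }
            result[1 + 2 * kept] = lo;
            result[2 + 2 * kept] = hi;
            kept++;
        }
        result[0] = kept; // decrement the count to match
        return Arrays.copyOf(result, 1 + 2 * kept);
    }

    public static void main(String[] args) {
        // Illustrative nine-range table; verify against the real data before relying on it.
        int[] blank = {
            9,
            0x0009, 0x0009, 0x0020, 0x0020, 0x00a0, 0x00a0,
            0x1680, 0x1680, 0x180e, 0x180e, 0x2000, 0x200a,
            0x202f, 0x202f, 0x205f, 0x205f, 0x3000, 0x3000
        };
        byte[] before = encode(blank);
        byte[] after = encode(removeRange(decode(before), 0x180e, 0x180e));
        // Prints the leading int of each file: 00000013 -> 00000011
        System.out.printf("%08x -> %08x%n",
                ByteBuffer.wrap(before).getInt(), ByteBuffer.wrap(after).getInt());
    }
}
```

Under this assumption, hand-editing the file means touching three places: the leading length, the count, and the eight bytes of the range itself.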

I'm also unsure how to test this in JRuby. JCodings is a dependency of a dependency, isn't it? How can I get JRuby to use a local snapshot of JCodings with this change?

headius commented 8 years ago

We generate those files directly from MRI source using the included Ruby scripts. If you want to test JRuby with them, just update the poms to use the snapshot version, run mvn install from jcodings, and go for it.

chrisseaton commented 8 years ago

The problem is that when MRI updated to Unicode 7.0, they updated the generated header file name2ctype.h but not the source file name2ctype.src. The two no longer match, and the generated .h is the correct one. We read from the out-of-date .src.

https://github.com/ruby/ruby/blob/004bb16ccc01608849c8d200aed990b6d6e2b98e/enc/unicode/name2ctype.h#L668-L678

https://github.com/ruby/ruby/blob/004bb16ccc01608849c8d200aed990b6d6e2b98e/enc/unicode/name2ctype.src#L555-L567

https://github.com/ruby/ruby/commit/64c81e40d4706d3685d540febb06e28609641aa4

It looks like the upstream source for these files (Onigmo) doesn't use the .src file any more and has deleted it.

https://github.com/k-takata/Onigmo/tree/master/enc/unicode

Maybe MRI needs to do that as well, and JCodings needs to parse the .h instead of the .src.
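As a rough illustration of what parsing the .h could involve, here is a minimal sketch. It assumes each table in the header is a C array initializer of the form NAME[] = { count, from, to, ... }, and the sample header text is written to that assumed shape rather than copied from name2ctype.h. HeaderRangeParser and parseTable are hypothetical names, and this is not how the JCodings generator scripts are implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderRangeParser {
    /** Extracts the ints of the array initializer called `name` from header text. */
    static int[] parseTable(String header, String name) {
        // Match "NAME[] = { ... }" and capture everything between the braces.
        Pattern array = Pattern.compile(
                "\\b" + Pattern.quote(name) + "\\[\\]\\s*=\\s*\\{(.*?)\\}", Pattern.DOTALL);
        Matcher body = array.matcher(header);
        if (!body.find()) {
            throw new IllegalArgumentException("no table named " + name);
        }
        // Collect hex (0x...) and decimal literals in order of appearance.
        Matcher number = Pattern.compile("0[xX][0-9a-fA-F]+|\\d+").matcher(body.group(1));
        List<Integer> values = new ArrayList<>();
        while (number.find()) {
            values.add(Integer.decode(number.group()));
        }
        // Sanity check: the first value should be the number of (from, to) pairs.
        if (values.isEmpty() || values.get(0) * 2 != values.size() - 1) {
            throw new IllegalStateException("count does not match ranges in " + name);
        }
        int[] table = new int[values.size()];
        for (int i = 0; i < table.length; i++) {
            table[i] = values.get(i);
        }
        return table;
    }

    public static void main(String[] args) {
        // Assumed header shape, for illustration only.
        String header =
                "static const OnigCodePoint CR_Blank[] = {\n"
              + "\t8,\n"
              + "\t0x0009, 0x0009,\n\t0x0020, 0x0020,\n\t0x00a0, 0x00a0,\n"
              + "\t0x1680, 0x1680,\n\t0x2000, 0x200a,\n\t0x202f, 0x202f,\n"
              + "\t0x205f, 0x205f,\n\t0x3000, 0x3000,\n"
              + "}; /* CR_Blank */\n";
        int[] blank = parseTable(header, "CR_Blank");
        System.out.println(blank[0] + " ranges, " + blank.length + " ints");
    }
}
```

The count check is the useful part of the design: it would catch a header whose declared count and range list disagree.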

duerst commented 8 years ago

Related discussion on the CRuby side: https://bugs.ruby-lang.org/issues/11706.

lopex commented 6 years ago

@chrisseaton: jcodings now parses name2ctype.h