Closed ahorek closed 5 years ago
reduced case:
"\u{1F48C}" =~ /\=\?/i
This is related to https://github.com/jruby/joni/issues/17. Onigmo appears to compare the first two bytes of "\u{1F48C}" to "=?" from the regexp's exact info field (used by the fast skip algorithms). For that it uses the mbclen(enc, p, end) function, aka onigenc_mbclen_approximate, which never returns negative values and acts as a safeguard for broken characters.
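To make the "never returns negative values" point concrete, here is a rough sketch in Ruby of what an approximate length routine could look like for UTF-8. The method names are hypothetical and this is neither the Onigmo nor the jcodings implementation. In particular, clamping the result to the remaining bytes is an assumption of this sketch, not something confirmed for onigenc_mbclen_approximate.

```ruby
# Hypothetical helpers for illustration only; not the joni/jcodings/Onigmo API.

# Strict variant: sequence length implied by a UTF-8 lead byte,
# or -1 when the byte cannot start a character.
def strict_utf8_length(lead)
  case lead
  when 0x00..0x7F then 1
  when 0xC2..0xDF then 2
  when 0xE0..0xEF then 3
  when 0xF0..0xF4 then 4
  else -1 # continuation byte or invalid lead byte
  end
end

# Approximate variant: never negative, so a caller can always advance.
# Assumes pos < stop. Clamping to the remaining bytes is this sketch's choice.
def approximate_utf8_length(bytes, pos, stop)
  len = strict_utf8_length(bytes[pos])
  return 1 if len < 1            # broken character: step over a single byte
  [len, stop - pos].min          # truncated character: stay inside the buffer
end

letter = "\u{1F48C}".bytes       # four bytes: F0 9F 92 8C
approximate_utf8_length(letter, 0, letter.size) # whole character
approximate_utf8_length(letter, 0, 2)           # only the first two bytes visible
approximate_utf8_length(letter, 1, letter.size) # starting on a continuation byte
```

A strict routine would report an error (a negative value here) in the last two calls, and a caller that does not expect that can end up with a bad offset.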
The issue was introduced with https://github.com/jruby/joni/commit/012bb20e520eb607ab2c7d6e271cdb140e353b88, which turned on Search.BM_IC, the fast skip Boyer-Moore / Sunday case insensitive search routine. The problem doesn't seem to be in the routine itself, but in how the case insensitive comparison is handled. Until we find a solution we can fall back to Search.SLOW_IC.
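For readers unfamiliar with the Sunday (quick search) skip, here is a minimal byte-oriented sketch in Ruby. It is not joni's Search.BM_IC code: the method names are made up, and it folds ASCII case only, which sidesteps the multibyte case folding where the actual problem seems to be.

```ruby
# Hypothetical sketch of a case insensitive Sunday search over raw bytes.

# Fold ASCII upper case to lower case; all other bytes pass through unchanged.
def fold_ascii(byte)
  byte.between?(0x41, 0x5A) ? byte + 0x20 : byte
end

# Returns the byte offset of the first match, or nil if there is none.
def sunday_search_ic(text, pattern)
  haystack = text.bytes.map { |b| fold_ascii(b) }
  needle = pattern.bytes.map { |b| fold_ascii(b) }
  m = needle.size
  return 0 if m.zero?

  # Shift table: for each byte in the pattern, the distance from its last
  # occurrence to one past the end of the pattern. Unknown bytes shift m + 1.
  shift = Hash.new(m + 1)
  needle.each_with_index { |b, i| shift[b] = m - i }

  pos = 0
  while pos + m <= haystack.size
    return pos if haystack[pos, m] == needle
    following = pos + m            # the byte just past the current window
    break if following >= haystack.size
    pos += shift[haystack[following]]
  end
  nil
end

sunday_search_ic("Subject: =?UTF-8?", "=?")  # offset of the "=?" marker
sunday_search_ic("\u{1F48C}", "=?")          # no match expected
```

Because the skip is computed from single bytes, the window can land in the middle of a multibyte character, so whatever comparison runs there has to cope with partial characters.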
A temporary fix is in https://github.com/jruby/joni/commit/118dbdeecb42d736ed3dbbcccce13f2fb98753b7, which will not degrade performance relative to previous versions. Keeping the issue open until we decide on adding unsafe and approximate length routines to org.jcodings.Encoding.
@ahorek joni is released and the JRuby snapshots are updated, thanks for the report.
@lopex Is there a further fix needed here?
The ultimate fix would be to implement approximate length for our encodings. For now, as a workaround, Sunday search is turned off for case insensitive forward searches.
Closing, created a new issue that explains it here https://github.com/jruby/jcodings/issues/26
Hi @lopex, the recent mail build started to fail: https://travis-ci.org/mikel/mail/jobs/435704866 https://github.com/mikel/mail
I'm not sure if the problem is in joni or jcodings. If you have time, please take a look, thanks.