edouard closed this issue 2 years ago
Answering my own questions here...
Interestingly enough, taking that string above and replacing WINDY with TWITTER doesn’t work 🤔:
string = 'TWITTERのアカウントを作成する'
iterator = TwitterCldr::Segmentation::BreakIterator.new(:ja)
iterator.each_word(string) {|word| puts word }
#=> /Users/edouard/.rvm/gems/ruby-3.1.2@webtranslateit.com/gems/twitter_cldr-6.11.3/lib/twitter_cldr/segmentation/cj_break_engine.rb:110:in `<': comparison of Integer with nil failed (ArgumentError)
It seems to be due to the length of the Latin word:
string = 'TWITTのアカウントを作成する'
iterator = TwitterCldr::Segmentation::BreakIterator.new(:ja)
iterator.each_word(string) {|word| puts word }
#=> TWITT
#   の
#   アカウントを作成する
So the error appears to depend on the length of the Latin word: the five-letter TWITT is segmented correctly, while the seven-letter TWITTER raises.
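To narrow down where the failure starts, something like the sketch below could help. `failing_prefix_lengths` is a hypothetical helper, not part of TwitterCldr: it tries every prefix of the Latin word in front of the Japanese text and reports which prefix lengths raise `ArgumentError`. The segmentation call itself is passed in as a block, so the helper makes no assumptions about the iterator beyond what is shown above.

```ruby
# Hypothetical helper (not part of TwitterCldr): for each prefix of `word`,
# builds `prefix + suffix`, yields it to the given block, and collects the
# prefix lengths for which the block raises ArgumentError.
def failing_prefix_lengths(word, suffix)
  (1..word.length).select do |length|
    begin
      yield(word[0, length] + suffix)
      false # the block returned normally for this prefix length
    rescue ArgumentError
      true  # the block raised, so this prefix length is reported
    end
  end
end

# Possible usage against the iterator from this report (requires the gem):
#
#   require 'twitter_cldr'
#   iterator = TwitterCldr::Segmentation::BreakIterator.new(:ja)
#   failing_prefix_lengths('TWITTER', 'のアカウントを作成する') do |candidate|
#     iterator.each_word(candidate) { |word| word }
#   end
```

Based on the two runs above, the result on the affected version should include 7 and exclude 5; the other lengths were not tested here.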
Hey @edouard, thanks for reporting this. Please see #261 for fix details. The fix has been published in v6.11.4.
Cool! Thanks for fixing it so quickly! 👍🏽
Describe the bug
We’re using TwitterCldr::Segmentation::BreakIterator’s each_word method to count words in multiple languages. We just got an exception for a string in Japanese that contains both Japanese and Latin characters. This is common when using Western brand names, for instance.
To Reproduce
Steps to reproduce the behavior:
Also, this string works:
Interestingly enough, taking that string above and replacing WINDY with TWITTER doesn’t work 🤔:
Expected behavior
The BreakIterator shouldn't raise an exception.
Environment
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-darwin21]