The tokenizer throws an UnsupportedOperationException with the following input:
해쵸쵸쵸쵸쵸쵸쵸쵸춏
It also seems to throw the exception with more than 8 of the '쵸' character in the middle, but doesn't fail with fewer than 8. Here's a more complete stack trace:
java.lang.UnsupportedOperationException: empty.minBy
at scala.collection.TraversableOnce$class.minBy(TraversableOnce.scala:252)
at scala.collection.AbstractTraversable.minBy(Traversable.scala:104)
at com.twitter.penguin.korean.tokenizer.KoreanTokenizer$.com$twitter$penguin$korean$tokenizer$KoreanTokenizer$$parseKoreanChunk(KoreanTokenizer.scala:197)
at com.twitter.penguin.korean.tokenizer.KoreanTokenizer$$anonfun$tokenize$1.apply(KoreanTokenizer.scala:99)
at com.twitter.penguin.korean.tokenizer.KoreanTokenizer$$anonfun$tokenize$1.apply(KoreanTokenizer.scala:96)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:252)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:252)
at scala.collection.immutable.List.flatMap(List.scala:344)
at com.twitter.penguin.korean.tokenizer.KoreanTokenizer$.tokenize(KoreanTokenizer.scala:96)
at com.twitter.penguin.korean.TwitterKoreanProcessor$.tokenize(TwitterKoreanProcessor.scala:49)
at com.twitter.penguin.korean.TwitterKoreanProcessor.tokenize(TwitterKoreanProcessor.scala)
at com.twitter.penguin.korean.TwitterKoreanProcessorJava.tokenize(TwitterKoreanProcessorJava.java:56)
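
The `empty.minBy` message at the top of the trace looks like what Scala raises when `minBy` is called on an empty collection, so my guess is that `parseKoreanChunk` ends up with no candidates to choose from for this input. Here is a minimal sketch of that behaviour using only the standard library. The `candidates` list and the `EmptyMinByDemo` object are hypothetical stand-ins, not the tokenizer's actual code:

```scala
import scala.util.Try

object EmptyMinByDemo {
  // Hypothetical stand-in for a list of candidate parses; not the tokenizer's real type.
  val candidates: List[Int] = List.empty

  // minBy has no element to return for an empty collection, so it throws.
  // Wrapping the call in Try captures the exception instead of crashing.
  val unsafe: Try[Int] = Try(candidates.minBy(identity))

  // One possible defensive pattern: check for emptiness first and fall back to None.
  val safe: Option[Int] =
    if (candidates.isEmpty) None else Some(candidates.minBy(identity))

  def main(args: Array[String]): Unit = {
    println(unsafe)
    println(safe)
  }
}
```

If that is what is happening, a guard like `safe` above (or some fallback parse for the chunk) might avoid the crash, though I haven't checked how the tokenizer builds its candidates.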
Thanks!