kmpoon / hlta

Provides functions for hierarchical latent tree analysis on text data for hierarchical topic detection
GNU General Public License v3.0
81 stars 23 forks source link

fix possible encoding problem (force utf8) #2

Closed sxjscience closed 6 years ago

sxjscience commented 7 years ago

The reading function will report the following error in Windows.

Exception in thread "main" java.nio.charset.UnmappableCharacterException: Input length = 1
        at java.nio.charset.CoderResult.throwException(Unknown Source)
        at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
        at sun.nio.cs.StreamDecoder.read(Unknown Source)
        at java.io.InputStreamReader.read(Unknown Source)
        at java.io.BufferedReader.fill(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:72)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
        at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
        at scala.collection.AbstractIterator.to(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:294)
        at scala.collection.AbstractIterator.toList(Iterator.scala:1336)```

This PR should solve this problem by explicitly reading as UTF8 encoding.

sxjscience commented 7 years ago

Met by @ShengruiLYU when doing the UROP project

sxjscience commented 7 years ago

@clairepchen @chzhour

jef33 commented 6 years ago

This is fixed in 2.0