jhclark / bigfatlm

Hadoop MapReduce training of modified Kneser-Ney smoothed language models
GNU Lesser General Public License v3.0

Can't Get Past Generating VocabIDs #1

Open chanelm opened 13 years ago

chanelm commented 13 years ago

I was testing out this project, and whenever I get to generating the vocab IDs (right after the first Hadoop M/R job), it always throws this error:

BigFatLM.Sentences 100000
BigFatLM.Tokens 1003830
BigFatLM.Types 63106
Finished: BigFatLM -- Make Vocabulary ID's
Merging unigram count files: BigFatLM -- Make Vocabulary ID's
Copying HDFS file hdfs://rhl095.in.escapemg.com:54310/user/search/tmp/BigFatLM5204421244738246067 to /tmp/BigFatLM956014149429852912.vocab1
Oct 11, 2011 9:22:50 PM bigfat.hadoop.SortUtils sortInPlace
INFO: Running external sort: sort -n -r -k2 -t -o /tmp/BigFatLM956014149429852912.vocab1 /tmp/BigFatLM956014149429852912.vocab1
Assigning vocab IDs for: BigFatLM -- Make Vocabulary ID's
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1937)
    at bigfat.step1.VocabIteration.run(VocabIteration.java:79)
    at bigfat.BigFatLM.run(BigFatLM.java:115)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at bigfat.BigFatLM.main(BigFatLM.java:140)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
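A note for anyone reading the trace: an index of -1 inside `String.substring` is commonly what Java reports when the result of `indexOf` is passed straight to `substring` and the delimiter was not found. Whether that is what happens at `VocabIteration.java:79` is only a guess, since the source is not shown here. The sketch below is not BigFatLM code. The class `VocabLineSketch`, its methods, and the assumed `word<TAB>count` line layout are all hypothetical, and it only illustrates the failure mode and one way to guard against it.

```java
import java.util.Optional;

// Hypothetical illustration only; none of these names come from BigFatLM.
public class VocabLineSketch {

    /** One parsed vocabulary line, assuming a "word<TAB>count" layout. */
    static final class Entry {
        final String word;
        final long count;

        Entry(String word, long count) {
            this.word = word;
            this.count = count;
        }
    }

    /** Unguarded parse: throws StringIndexOutOfBoundsException if the line has no tab. */
    static String wordUnguarded(String line) {
        int tab = line.indexOf('\t');     // -1 when no tab is present
        return line.substring(0, tab);    // substring(0, -1) throws
    }

    /** Guarded parse: returns empty instead of throwing on a malformed line. */
    static Optional<Entry> parseGuarded(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab < 0) {
            return Optional.empty();      // no delimiter found
        }
        try {
            long count = Long.parseLong(line.substring(tab + 1).trim());
            return Optional.of(new Entry(line.substring(0, tab), count));
        } catch (NumberFormatException e) {
            return Optional.empty();      // count field is not a number
        }
    }

    public static void main(String[] args) {
        String[] lines = { "the\t42", "no-tab-here" };
        for (String line : lines) {
            Optional<Entry> entry = parseGuarded(line);
            if (entry.isPresent()) {
                System.out.println(entry.get().word + " -> " + entry.get().count);
            } else {
                System.out.println("skipping malformed line: " + line);
            }
        }
    }
}
```

If something like this is the cause, inspecting the first few lines of the sorted `.vocab1` file for entries that lack the expected delimiter would be a reasonable first check.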

jhclark commented 13 years ago

Could you say a bit more about this error? A few questions: