joshua-decoder / thrax

Hadoop-based tool for extraction of large scale synchronous grammars for paraphrasing and machine translation
joshua-decoder.org

Bug: bad word ID #10

Open mjpost opened 8 years ago

mjpost commented 8 years ago

I just got this error:

Error: java.lang.RuntimeException: Word id 2147483647 out of range 0 286057
    at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Partition.getPartition(WordLexicalProbabilityCalculator.java:133)
    at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Partition.getPartition(WordLexicalProbabilityCalculator.java:121)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:82)
    at edu.jhu.thrax.hadoop.features.WordLexicalProbabilityCalculator$Map.map(WordLexicalProbabilityCalculator.java:28)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
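
For anyone triaging this: the offending word id, 2147483647, is exactly Integer.MAX_VALUE, which may mean a sentinel or default value is reaching the partitioner rather than a real vocabulary id. The sketch below is not the Thrax code. It is a hypothetical range partitioner (the class name, the method `partitionFor`, the inclusive upper bound, and the partition count of 4 are all assumptions) that shows how a check of this kind would produce a message of the same shape.

```java
public class WordIdPartitionSketch {

    // Hypothetical helper, not Thrax's implementation: maps a word id in
    // [0, maxId] (inclusive bound is an assumption) to one of numPartitions buckets.
    static int partitionFor(int wordId, int maxId, int numPartitions) {
        if (wordId < 0 || wordId > maxId) {
            // Same message shape as the error reported above.
            throw new RuntimeException(
                "Word id " + wordId + " out of range 0 " + maxId);
        }
        // Use long arithmetic so the multiplication cannot overflow an int.
        return (int) ((long) wordId * numPartitions / ((long) maxId + 1));
    }

    public static void main(String[] args) {
        int maxId = 286057;      // upper bound taken from the error message
        int numPartitions = 4;   // assumed value, for illustration only

        // Ids inside the range map to a valid bucket.
        System.out.println(partitionFor(0, maxId, numPartitions));
        System.out.println(partitionFor(maxId, maxId, numPartitions));

        // The id from the report equals Integer.MAX_VALUE and fails the check.
        try {
            partitionFor(Integer.MAX_VALUE, maxId, numPartitions);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that reading is right, the place to look would be wherever word ids are assigned before `Map.map` writes them out, rather than the partitioner itself.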

lewismc commented 8 years ago

Hi @mjpost, I am seeing the exact same error using Thrax in the Joshua master branch. You can see my full thread on the dev@joshua mailing list from today.