joshua-decoder / thrax

Hadoop-based tool for extracting large-scale synchronous grammars for paraphrasing and machine translation
joshua-decoder.org

Performance Improvements #4

Closed: jganitkevitch closed this 11 years ago

jganitkevitch commented 11 years ago

Changed vocabulary collection to be distributed, eliminating the single-reducer bottleneck.
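To illustrate why distributing vocabulary collection removes the need for a single reducer, here is a minimal plain-Java sketch that does not use Hadoop. It is not taken from the Thrax code: the class `ShardedVocabulary`, the methods `shardOf` and `build`, and the interleaved id scheme are all hypothetical. The assumption is that words are hash-partitioned across shards (standing in for reducers), and that each shard deduplicates and numbers only its own words, so no shard ever has to see the whole vocabulary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of Thrax.
final class ShardedVocabulary {

    /** Routes a word to a shard, much as a hash partitioner routes keys to reducers. */
    static int shardOf(String word, int numShards) {
        return Math.floorMod(word.hashCode(), numShards);
    }

    /** Builds a word -> id map in which each shard numbers only the words it owns. */
    static Map<String, Integer> build(List<String> tokens, int numShards) {
        List<Set<String>> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            shards.add(new LinkedHashSet<>());
        }
        // "Map" phase: send every token to the shard chosen by its hash.
        for (String token : tokens) {
            shards.get(shardOf(token, numShards)).add(token);
        }
        // "Reduce" phase: each shard assigns ids independently of the others.
        Map<String, Integer> ids = new HashMap<>();
        for (int s = 0; s < numShards; s++) {
            int local = 0;
            for (String word : shards.get(s)) {
                // Interleaved ids: shard s owns s, s + numShards, s + 2 * numShards, ...
                ids.put(word, s + numShards * local);
                local++;
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "the", "dog", "cat");
        System.out.println(build(tokens, 3));
    }
}
```

Because each shard draws ids from its own residue class modulo `numShards`, the ids are globally unique without any coordination step. The trade-off in this sketch is that the id space may contain gaps when shards hold different numbers of words.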

Changed feature keys to be stored as integer IDs looked up through the vocabulary. This should save additional space on large grammars and when extracting many features.
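The idea of keying features by integer ID can be sketched as follows. This is a simplified, single-process illustration and not the actual Thrax implementation: `FeatureVocabulary`, `id`, `name`, and `encode` are hypothetical names, and the feature names in `main` are made up for the example. Each distinct feature name is stored once in the vocabulary, and every rule's feature map then carries a small integer key instead of a repeated string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of Thrax.
final class FeatureVocabulary {
    private final Map<String, Integer> idsByName = new HashMap<>();
    private final List<String> namesById = new ArrayList<>();

    /** Returns the id for a feature name, assigning the next free id on first use. */
    int id(String name) {
        Integer existing = idsByName.get(name);
        if (existing != null) {
            return existing;
        }
        int id = namesById.size();
        idsByName.put(name, id);
        namesById.add(name);
        return id;
    }

    /** Recovers the feature name for an id, e.g. when writing the final grammar. */
    String name(int id) {
        return namesById.get(id);
    }

    /** Re-keys a name -> value feature map by integer id. */
    Map<Integer, Double> encode(Map<String, Double> features) {
        Map<Integer, Double> encoded = new HashMap<>();
        for (Map.Entry<String, Double> entry : features.entrySet()) {
            encoded.put(id(entry.getKey()), entry.getValue());
        }
        return encoded;
    }

    public static void main(String[] args) {
        FeatureVocabulary vocab = new FeatureVocabulary();
        Map<String, Double> features = new HashMap<>();
        features.put("ExampleFeatureA", 1.25); // made-up feature names
        features.put("ExampleFeatureB", 0.5);
        System.out.println(vocab.encode(features));
    }
}
```

The saving grows with the number of rules and features: a feature name that would otherwise be serialized with every rule is written once, and each occurrence costs only an integer.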