joshua-decoder / joshua

Joshua Statistical Machine Translation Toolkit
http://joshua-decoder.org/

Vocabulary optimization for multithreading speedup #213

Closed fhieber closed 9 years ago

fhieber commented 9 years ago

This change fixes a rare multithreading race condition that occurred when an empty vocabulary (no grammars loaded, and a lot of input data passed to the decoder) was filled by multiple threads at once. In addition, the vocabulary now stores the list of nonTerminalIndices, which avoids iterating over all known words for every input sentence. This provides a significant speedup in multithreaded decoding. MurmurHash functionality was removed to simplify the code.
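As a rough illustration of the two ideas above (guarded insertion into a shared vocabulary, and a cached list of nonterminal indices), here is a minimal sketch. It is not the code in this pull request: the class name `VocabSketch`, its methods, the positive id scheme, and the convention that nonterminals are written in square brackets are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; not Joshua's actual Vocabulary class.
class VocabSketch {
    private final Map<String, Integer> stringToId = new HashMap<>();
    private final List<String> idToString = new ArrayList<>();
    // Cached ids of nonterminals, maintained on insertion so that callers
    // never have to scan every known word to find them.
    private final List<Integer> nonTerminalIndices = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Assumed convention for this example: nonterminals look like "[X]".
    static boolean isNonTerminal(String token) {
        return token.length() > 2 && token.startsWith("[") && token.endsWith("]");
    }

    // Returns the id of a token, adding it to the vocabulary if it is new.
    int id(String token) {
        lock.readLock().lock();
        try {
            Integer existing = stringToId.get(token);
            if (existing != null) return existing;
        } finally {
            lock.readLock().unlock();
        }
        lock.writeLock().lock();
        try {
            // Re-check: another thread may have added the token in between.
            Integer existing = stringToId.get(token);
            if (existing != null) return existing;
            int id = idToString.size();
            idToString.add(token);
            stringToId.put(token, id);
            if (isNonTerminal(token)) nonTerminalIndices.add(id);
            return id;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Returns a snapshot copy of the cached nonterminal ids.
    List<Integer> getNonTerminalIndices() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(nonTerminalIndices);
        } finally {
            lock.readLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return idToString.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Demo helper: several threads insert the same tokens into an empty vocabulary.
    static VocabSketch fillConcurrently(List<String> tokens, int threads) {
        VocabSketch vocab = new VocabSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (String tok : tokens) vocab.id(tok);
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return vocab;
    }
}
```

The re-check under the write lock is what keeps two threads from assigning different ids to the same new token, and the read lock keeps lookups of already-known words cheap, which is the common case once the vocabulary has been filled.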

mjpost commented 9 years ago

Awesome, I am testing this now.

mjpost commented 9 years ago

I tested on a Europarl phrase-based grammar on a 6-CPU machine (12 with hyperthreading), with very nice results: [image: vocab_fix]

I'll now just run it past the regression tests and then merge it in.