lukeorland closed 9 years ago
Once/if this is proven to work, I'll strip out all the commented lines that were the plans for this change, and push up another commit.
I'm probably not using the most idiomatic Perl; feel free to make suggestions.
Okay, I'm testing this now.
Do you guys use mgiza or fast_align? mgiza is multi-threaded, and fast_align is, err, fast. I heard it's going to be even faster soon with multithreading support.
@hieuhoang, we have options for GIZA++ or the Berkeley aligner. We don't use mgiza, but instead split the corpus into blocks and align them independently. This isn't as good as mgiza (which I think just parallelizes the E step?), and I should test the comparison but haven't yet. On the other hand, it works for any aligner we might want to use.
I've found better performance with Berkeley, particularly for noisy text and low-resource languages. I've been meaning to import fast_align but haven't got around to it. It would be good to compare it to Berkeley.
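For readers following along, the block-splitting approach described above can be sketched roughly as follows. This is a minimal Python illustration, not the Perl code in this pull request: `split_into_blocks`, `align_in_blocks`, and `diagonal_aligner` are hypothetical names, and the toy aligner just links token i to token i so the example runs without GIZA++ or the Berkeley aligner installed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

SentencePair = Tuple[str, str]
# Hypothetical aligner interface: takes a block of sentence pairs and
# returns one alignment string per pair.
Aligner = Callable[[Sequence[SentencePair]], List[str]]


def split_into_blocks(pairs: Sequence[SentencePair],
                      block_size: int) -> List[Sequence[SentencePair]]:
    """Cut the corpus into consecutive blocks of at most block_size pairs."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return [pairs[i:i + block_size] for i in range(0, len(pairs), block_size)]


def diagonal_aligner(block: Sequence[SentencePair]) -> List[str]:
    """Toy stand-in for a real aligner: links token i to token i."""
    lines = []
    for source, target in block:
        n = min(len(source.split()), len(target.split()))
        lines.append(" ".join(f"{i}-{i}" for i in range(n)))
    return lines


def align_in_blocks(pairs: Sequence[SentencePair], aligner: Aligner,
                    block_size: int = 2, workers: int = 4) -> List[str]:
    """Align each block independently, then concatenate in corpus order."""
    blocks = split_into_blocks(pairs, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input blocks.
        results = list(pool.map(aligner, blocks))
    return [line for block in results for line in block]


if __name__ == "__main__":
    corpus = [("a b", "x y"), ("c", "z"), ("d e f", "u v w")]
    for line in align_in_blocks(corpus, diagonal_aligner):
        print(line)
```

Because the aligner is just a callable applied to each block, any aligner can be swapped in, which matches the "works for any aligner" point. Each block is aligned in isolation, so whatever the aligner learns comes only from that block.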
Tested this and it works great, thanks, @lukeorland.
(quoting @mjpost)
https://trello.com/c/Cv4UQjLM/64-parallelize-alignment