marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

Error when generating corpus align file for guided alignment training #339

Closed wangxw1023 closed 4 years ago

wangxw1023 commented 4 years ago

Bug description

I have a zh-en corpus of 110,938,204 lines. I want to train a transformer model with guided alignment, using these options: --guided-alignment $TMXMALL/corpus.bpe.align --guided-alignment-weight 1

So as a first step I need to produce a corpus.bpe.align file.

I trained an amun model on 10,000,000 zh-en sentence pairs, and then used this model to generate the corpus.bpe.align file for the full 110,938,204-line zh-en corpus.

However, I get the following error:

[2020-07-15 14:31:08] Error: Labels not matching logits shape (2560000000 != -1734967296, shape=1x800x64x50000 size=-1734967296)??
[2020-07-15 14:31:08] Error: Aborted from marian::Expr marian::Logits::applyLossFunction(const Words&, const std::function<IntrusivePtr<marian::Chainable<IntrusivePtr > >(IntrusivePtr<marian::Chainable<IntrusivePtr > >, IntrusivePtr<marian::Chainable<IntrusivePtr > >)>&) const in /media/tmxmall/a36811aa-0e87-4ba1-b14f-370134452449/wangxiuwan/marian-dev/src/layers/generic.cpp:31
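
The two numbers in the error message are consistent with a signed 32-bit overflow: the product of the reported shape 1x800x64x50000 is 2560000000, which is larger than 2^31 - 1, and wrapping it into 32 bits gives exactly -1734967296. The short Python check below only reproduces that arithmetic. It assumes the size is held in a signed 32-bit integer somewhere, which is a guess based on the numbers and not something confirmed from the Marian source, and `wrap_int32` is a hypothetical helper written for this illustration.

```python
import math

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def wrap_int32(n: int) -> int:
    """Reinterpret n as a signed 32-bit two's-complement value (hypothetical helper)."""
    return (n + 2**31) % 2**32 - 2**31


# Shape exactly as printed in the error message.
shape = (1, 800, 64, 50000)

true_size = math.prod(shape)      # element count without any overflow
wrapped = wrap_int32(true_size)   # what a signed 32-bit counter would show

print("true size:   ", true_size)
print("wrapped size:", wrapped)
print("exceeds int32:", true_size > INT32_MAX)
```

If this reading is right, the mismatch depends on the dimensions of a single batch (the product of the shape) and not directly on the number of lines in the corpus.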

How to reproduce

The following is the command I used to generate the corpus align file: gen-all-corpus-align-run-me.txt

Context

I found that this issue looks similar to my error: https://github.com/marian-nmt/marian-dev/issues/515. So I am wondering whether the cause is that my corpus has too many lines.

If so, how should I use guided-alignment training?

I would really appreciate any advice you could give me. Thank you very much.

wangxw1023 commented 4 years ago

I have submitted this issue to marian-dev.