Bug description
I have a zh-en corpus of 110,938,204 lines, and I want to train a transformer model with guided alignment, using the following options:
--guided-alignment $TMXMALL/corpus.bpe.align --guided-alignment-weight 1
So the first step is to obtain a corpus.bpe.align file.
I trained an amun model on 10,000,000 lines of the zh-en corpus and then used that model to generate the corpus.bpe.align file for the full 110,938,204-line corpus.
However, I get the following error:
[2020-07-15 14:31:08] Error: Labels not matching logits shape (2560000000 != -1734967296, shape=1x800x64x50000 size=-1734967296)??
[2020-07-15 14:31:08] Error: Aborted from marian::Expr marian::Logits::applyLossFunction(const Words&, const std::function<IntrusivePtr<marian::Chainable<IntrusivePtr > >(IntrusivePtr<marian::Chainable<IntrusivePtr > >, IntrusivePtr<marian::Chainable<IntrusivePtr > >)>&) const in /media/tmxmall/a36811aa-0e87-4ba1-b14f-370134452449/wangxiuwan/marian-dev/src/layers/generic.cpp:31
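As a sanity check on the numbers in the message (this is my own arithmetic, not something Marian prints): the product of the reported shape 1x800x64x50000 is 2,560,000,000, which is larger than the signed 32-bit maximum of 2,147,483,647. Wrapped to 32 bits it becomes -1,734,967,296, the same negative size shown in the log. The sketch below assumes the size is held in a signed 32-bit integer somewhere, which I have not verified in the Marian source; `wrap_int32` is a helper I wrote for illustration.

```python
from functools import reduce
from operator import mul


def wrap_int32(n: int) -> int:
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    return (n + 2**31) % 2**32 - 2**31


# Shape copied from the error message.
shape = (1, 800, 64, 50000)

# Element count without any overflow.
true_size = reduce(mul, shape, 1)

# Element count as it would appear if stored in a signed 32-bit integer.
wrapped_size = wrap_int32(true_size)

print(true_size)     # 2560000000
print(wrapped_size)  # -1734967296
```

If this reading is right, the failure would depend on the size of a single batch tensor rather than on the number of lines in the corpus, but I am not sure.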
How to reproduce
The following is the command I used to generate the corpus alignment file:
gen-all-corpus-align-run-me.txt
Context
Marian version: Paste the output of --version here
CMake command: Type the cmake command you used and attach the output of --build-info all
Log file: Attach your training/decoding logs
The following is my log from generating the corpus alignment file:
generate-corpus-align-log.txt
I found this issue:
https://github.com/marian-nmt/marian-dev/issues/515
It looks similar to my error, so I wonder whether the cause is that my corpus has too many lines.
If so, how should I use guided-alignment training?
I would really appreciate any advice. Thank you very much.