Difference in building the index.
I am not sure what 'train.tgt.txt' in 'build_index.sh' refers to. I simply split train.txt by '\t' to generate this file.
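For reference, here is a minimal sketch of that splitting step, assuming each line of train.txt is a tab-separated "source\ttarget" pair (the name train.src.txt is only illustrative):

```python
# Split a tab-separated parallel file into separate source/target files.
with open('train.txt', encoding='utf-8') as fin, \
     open('train.src.txt', 'w', encoding='utf-8') as fsrc, \
     open('train.tgt.txt', 'w', encoding='utf-8') as ftgt:
    for line in fin:
        parts = line.rstrip('\n').split('\t', 1)
        if len(parts) != 2:
            continue  # skip lines without a source/target separator
        src, tgt = parts
        fsrc.write(src + '\n')
        ftgt.write(tgt + '\n')
```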
Because the GPU in our lab is a 2080 Ti, the batchSize in build_index.sh can only reach 2048, which is 1/4 of the value you suggest. I'm not sure whether this is the cause of the problem.
I logged the sentences that satisfy the condition self.mem_pool[pred] == inp['tgt_raw_sents'][bid] and found that the sentence
There was no need for external expertise . &#124;
occurs in train.tgt.txt several times, so this may be what causes len(tmp_list) < topk.
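A quick way to check this hypothesis is to count duplicate lines in train.tgt.txt; this is a standalone sketch, not part of the repo:

```python
from collections import Counter

# Count how often each target sentence appears in train.tgt.txt;
# any count > 1 is a duplicate that can shrink the set of distinct
# retrieved memories below topk.
with open('train.tgt.txt', encoding='utf-8') as f:
    counts = Counter(line.rstrip('\n') for line in f)

dups = {sent: n for sent, n in counts.items() if n > 1}
print(f'{len(dups)} sentences are duplicated')
for sent, n in sorted(dups.items(), key=lambda x: -x[1])[:10]:
    print(n, sent)
```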
Still sincerely waiting for some guidance.
As a short-term workaround, there seems to be a not-so-elegant way: set allow_hit to True everywhere, for example in src_repr, src_mask, mem_repr, mem_mask, copy_seq, mem_bias = self.encode_step(data, work=True, update_mem_bias=update_mem_bias).
Hi @20184365
The problem is caused by duplicate lines in your train.tgt.txt.
The solution is: sort -u train.tgt.txt -o train.tgt.txt
Let me know if the problem persists.
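If sort is not available on your system, a Python sketch that mimics sort -u could look like the following; like sort -u itself, it reorders the lines, which the rebuilt index evidently does not depend on:

```python
# Deduplicate and sort train.tgt.txt in place, mimicking `sort -u`.
with open('train.tgt.txt', encoding='utf-8') as f:
    lines = sorted(set(f.read().splitlines()))

with open('train.tgt.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(lines) + '\n')
```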
This method works. Thank you ❤️
If anyone runs into the same problem, be sure to rebuild the index after running sort -u train.tgt.txt -o train.tgt.txt.
Hello. When I try to train model #4 with sh scripts/esen/train.multihead.dynamic.sh, an error occurs in fileretriever.py, as shown in the image below. Could you help me check this out? Thanks a lot.