mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention
https://arxiv.org/abs/2004.11886

Error while evaluating model #25

Closed: kishorepv closed this issue 3 years ago

kishorepv commented 3 years ago

Hi,

I tried evaluating the model using the provided checkpoint and get the following error:

root@jetson:/nlp/lite-transformer/lite-transformer# configs/wmt14.en-fr/test.sh /data/nlp/embed200/ 0 valid
Traceback (most recent call last):
  File "generate.py", line 192, in <module>
    cli_main()
  File "generate.py", line 188, in cli_main
    main(args)
  File "generate.py", line 32, in main
    task = tasks.setup_task(args)
  File "/nlp/lite-transformer/lite-transformer/fairseq/tasks/__init__.py", line 17, in setup_task
    return TASK_REGISTRY[args.task].setup_task(args, **kwargs)
  File "/nlp/lite-transformer/lite-transformer/fairseq/tasks/translation.py", line 166, in setup_task
    args.source_lang, args.target_lang = data_utils.infer_language_pair(paths[0])
  File "/nlp/lite-transformer/lite-transformer/fairseq/data/data_utils.py", line 24, in infer_language_pair
    for filename in os.listdir(path):
FileNotFoundError: [Errno 2] No such file or directory: 'data/binary/wmt14_en_fr'

Namespace(ignore_case=False, order=4, ref='/data/nlp/embed200//exp/valid_gen.out.ref', sacrebleu=False, sentence_bleu=False, sys='/data/nlp/embed200//exp/valid_gen.out.sys')
Traceback (most recent call last):
  File "score.py", line 88, in <module>
    main()
  File "score.py", line 84, in main
    score(f)
  File "score.py", line 78, in score
    print(scorer.result_string(args.order))
  File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 127, in result_string
    return fmt.format(order, self.score(order=order), *bleup,
  File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 103, in score
    return self.brevity() * math.exp(psum / order) * 100
  File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 117, in brevity
    r = self.stat.reflen / self.stat.predlen
ZeroDivisionError: division by zero
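The second traceback is a downstream symptom rather than a separate bug: because generate.py crashed before writing any hypotheses, the system-output file is empty, predlen stays 0, and the brevity computation in fairseq/bleu.py divides by zero. A minimal sketch of that computation (simplified; the explicit empty-output guard is added here only for illustration and is not the repository's actual code):

# Simplified sketch of the BLEU brevity penalty computed in fairseq/bleu.py.
# reflen and predlen are the total reference and hypothesis token counts; the
# predlen == 0 guard is an illustration of why the error appears.
import math

def brevity(reflen: int, predlen: int) -> float:
    if predlen == 0:
        # Empty system output (generation failed upstream): BLEU is undefined,
        # which surfaces in the shipped code as ZeroDivisionError.
        raise ValueError("empty system output; check that generation succeeded")
    r = reflen / predlen
    # Standard brevity penalty: 1 if the output is long enough, else exp(1 - r).
    return min(1.0, math.exp(1 - r))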

kishorepv commented 3 years ago

Do we need the contents of the data/binary/wmt14_en_fr directory for evaluation?

Michaelvll commented 3 years ago

Hi Kishore, thank you for asking! We already include the preprocessed binary files of the test dataset in the provided checkpoint tar. You can test the checkpoint on the test set by moving the test* and dict* files into data/binary/wmt14_en_fr (mkdir it if it does not exist) and calling test.sh. If you would like to evaluate the checkpoint on the validation set, please run configs/wmt14.en-fr/prepare.sh to get the preprocessed validation dataset.
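Concretely, the steps might look like this (assuming the checkpoint tar was unpacked to /data/nlp/embed200/ as in the command above; adjust the path to your own layout):

# Assumed layout: checkpoint tar unpacked to /data/nlp/embed200/.
mkdir -p data/binary/wmt14_en_fr
mv /data/nlp/embed200/test* /data/nlp/embed200/dict* data/binary/wmt14_en_fr/
configs/wmt14.en-fr/test.sh /data/nlp/embed200/ 0 test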

Michaelvll commented 3 years ago

I am closing this issue. If you have any follow-up questions, please feel free to re-open it.

tomshalini commented 3 years ago

I am getting the same issue while testing the model, even though the required test* and dict* files are already in place. [screenshot]

Could you (@Michaelvll) please help me test the trained checkpoint by resolving the error mentioned in the original issue by @kishorepv?

Michaelvll commented 3 years ago

Hi @tomshalini, could you please provide the command you used for testing?

tomshalini commented 3 years ago

> Hi @tomshalini, could you please provide the command you used for testing?

Hello @Michaelvll, I am using the command below for testing:

configs/wmt14.en-fr/test.sh '/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496/checkpoint_best.pt' 0 test

Michaelvll commented 3 years ago

> Hello @Michaelvll, I am using the command below for testing:
>
> configs/wmt14.en-fr/test.sh '/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496/checkpoint_best.pt' 0 test

Could you try configs/wmt14.en-fr/test.sh '/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496/' 0 test instead? test.sh appends checkpoint_best.pt automatically, so the first argument should be the checkpoint directory rather than the .pt file.
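A hypothetical sketch of the relevant argument handling (the variable names are assumptions for illustration, not the script's actual contents):

# Hypothetical sketch of how test.sh assembles its arguments.
RUN_DIR=$1                         # e.g. .../multibranch_v2/embed496/
GPU_ID=$2                          # CUDA device index
SUBSET=$3                          # test or valid
CKPT=$RUN_DIR/checkpoint_best.pt   # appended automatically, so pass the directory

With the .pt file passed as the first argument, the script would look for checkpoint_best.pt inside a path that is itself a file, which fails.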

tomshalini commented 3 years ago

Thank you @Michaelvll for your help. Now I am getting the error below, even though I am running on 2 GPUs.

Traceback (most recent call last):
  File "generate.py", line 192, in <module>
    cli_main()
  File "generate.py", line 188, in cli_main
    main(args)
  File "generate.py", line 106, in main
    hypos = task.inference_step(generator, models, sample, prefix_tokens)
  File "/home/shalinis/lite-transformer/fairseq/tasks/fairseq_task.py", line 246, in inference_step
    return generator.generate(models, sample, prefix_tokens=prefix_tokens)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 146, in generate
    encoder_outs = model.forward_encoder(encoder_input)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 539, in forward_encoder
    return [model.encoder(**encoder_input) for model in self.models]
  File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 539, in <listcomp>
    return [model.encoder(**encoder_input) for model in self.models]
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/models/transformer_multibranch_v2.py", line 314, in forward
    x = layer(x, encoder_padding_mask)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/models/transformer_multibranch_v2.py", line 693, in forward
    x, _ = self.self_attn(query=x, key=x, value=x, key_padding_mask=encoder_padding_mask)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/modules/multibranch.py", line 37, in forward
    x = branch(q.contiguous(), incremental_state=incremental_state)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shalinis/lite-transformer/fairseq/modules/dynamicconv_layer/dynamicconv_layer.py", line 131, in forward
    output = self.linear2(output)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 91, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/functional.py", line 1676, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: no kernel image is available for execution on the device

Namespace(ignore_case=False, order=4, ref='/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496//exp/test_gen.out.ref', sacrebleu=False, sentence_bleu=False, sys='/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496//exp/test_gen.out.sys')
Traceback (most recent call last):
  File "score.py", line 88, in <module>
    main()
  File "score.py", line 84, in main
    score(f)
  File "score.py", line 78, in score
    print(scorer.result_string(args.order))
  File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 127, in result_string
    return fmt.format(order, self.score(order=order), *bleup,
  File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 103, in score
    return self.brevity() * math.exp(psum / order) * 100
  File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 117, in brevity
    r = self.stat.reflen / self.stat.predlen
ZeroDivisionError: division by zero

tomshalini commented 3 years ago

@Michaelvll, could you please help me resolve the above issue?