wasiahmad / PLBART

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].
https://arxiv.org/abs/2103.06333
MIT License

AssertionError while evaluating 'translation' #12

Closed saichandrapandraju closed 3 years ago

saichandrapandraju commented 3 years ago

Hi @wasiahmad ,

I am trying the 'translation' capabilities of PLBART and started fine-tuning as described. But I'm getting the error below during evaluation -

File "calc_code_bleu.py", line 34, in <module>
    assert len(hypothesis) == len(pre_references[i])
AssertionError

Here is a more detailed traceback -

2021-07-15 13:57:58 | INFO | fairseq_cli.train | early stop since valid performance hasn't improved for last 10 runs
2021-07-15 13:57:58 | INFO | fairseq_cli.train | begin save checkpoint
2021-07-15 13:58:19 | INFO | fairseq.checkpoint_utils | saved checkpoint /content/PLBART/scripts/code_to_code/translation/java_cs/checkpoint_last.pt (epoch 22 @ 14168 updates, score 80.08) (writing took 20.417050701000335 seconds)
2021-07-15 13:58:19 | INFO | fairseq_cli.train | end of epoch 22 (average epoch stats below)
2021-07-15 13:58:19 | INFO | train | {"epoch": 22, "train_loss": "2.08", "train_nll_loss": "0.177", "train_ppl": "1.13", "train_wps": "1562.9", "train_ups": "1.76", "train_wpb": "890.1", "train_bsz": "16", "train_num_updates": "14168", "train_lr": "4.93409e-05", "train_gnorm": "0.534", "train_train_wall": "255", "train_wall": "8414"}
2021-07-15 13:58:19 | INFO | fairseq_cli.train | done training in 8412.8 seconds
  0% 0/250 [00:00<?, ?it/s]/usr/local/lib/python3.7/dist-packages/fairseq_cli/generate.py:172: UserWarning: --sacrebleu is deprecated. Please use --scoring sacrebleu instead.
  scorer = scoring.build_scorer(args, tgt_dict)
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-generate", line 8, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.7/dist-packages/fairseq_cli/generate.py", line 379, in cli_main
    main(args)
  File "/usr/local/lib/python3.7/dist-packages/fairseq_cli/generate.py", line 41, in main
    return _main(args, sys.stdout)
  File "/usr/local/lib/python3.7/dist-packages/fairseq_cli/generate.py", line 172, in _main
    scorer = scoring.build_scorer(args, tgt_dict)
  File "/usr/local/lib/python3.7/dist-packages/fairseq/scoring/__init__.py", line 54, in build_scorer
    return _build_scorer(args)
  File "/usr/local/lib/python3.7/dist-packages/fairseq/registry.py", line 54, in build_x
    return builder(args, *extra_args, **extra_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/fairseq/scoring/bleu.py", line 40, in __init__
    character_tokenization=self.args.sacrebleu_char_level,
AttributeError: 'Namespace' object has no attribute 'sacrebleu_char_level'
Traceback (most recent call last):
  File "/content/PLBART/evaluation/evaluator.py", line 36, in <module>
    main()
  File "/content/PLBART/evaluation/evaluator.py", line 20, in main
    assert len(refs) == len(pres)
AssertionError
Traceback (most recent call last):
  File "calc_code_bleu.py", line 34, in <module>
    assert len(hypothesis) == len(pre_references[i])
AssertionError
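The two downstream AssertionErrors are likely secondary: since fairseq-generate crashed with the AttributeError before writing any output, the hypothesis file ends up empty, so the hypothesis and reference counts no longer match when CodeBLEU is computed. A minimal sketch of the check that fails (variable names are illustrative, not the exact ones in calc_code_bleu.py):

```python
# Sketch of the length check that raises AssertionError in calc_code_bleu.py:
# it requires one hypothesis line per reference line.
def counts_match(hyp_lines, ref_lines):
    """Return True only when every reference has a corresponding hypothesis."""
    return len(hyp_lines) == len(ref_lines)

hypotheses = []  # empty: fairseq-generate crashed before writing any output
references = ["int Add(int a, int b) { return a + b; }"]

# This mismatch is what triggers `assert len(hypothesis) == len(pre_references[i])`.
assert not counts_match(hypotheses, references)
```

So the AttributeError in fairseq's scorer is the root cause to fix first; the evaluation scripts then fail only because their input is empty.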

Could you please suggest how to proceed?

wasiahmad commented 3 years ago

The error is due to

AttributeError: 'Namespace' object has no attribute 'sacrebleu_char_level'

I am using the following version of sacrebleu.

Name: sacrebleu
Version: 1.2.11
Summary: Hassle-free computation of shareable, comparable, and reproducible BLEU scores
Home-page: https://github.com/awslabs/sockeye
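If a newer sacrebleu/fairseq combination raises the `sacrebleu_char_level` AttributeError, one possible workaround (an assumption on my part, not an official fix) is to pin sacrebleu to the version reported above:

```shell
# Pin sacrebleu to the version the author reports using.
pip install sacrebleu==1.2.11
```

Separately, the UserWarning in the log suggests replacing the deprecated `--sacrebleu` flag with `--scoring sacrebleu` in the fairseq-generate command.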
saichandrapandraju commented 3 years ago

Hi @wasiahmad ,

I found the workaround below -

I have a couple of queries though -

wasiahmad commented 3 years ago

This is really helpful.

Answers to questions

saichandrapandraju commented 3 years ago

Thanks @wasiahmad ,

As I'm working only on the translation part, I thought there was no need for source/sentence_prediction.py, so I got away without it (at least for now) :-)

wasiahmad commented 3 years ago

We are working on a release that will include all the fine-tuned checkpoints. Several bugs have also been fixed. We will also provide scripts to set up a conda environment so that the experiments can be run without any issues.

saichandrapandraju commented 3 years ago

That would be great!!!