salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

Raising error when reproducing the multi-task training in README #77

Closed by SichongHao 1 year ago

SichongHao commented 1 year ago

Hi, when I try to reproduce the multi-task training from the README:

python run_exp.py --model_tag codet5_small --task multi_task --sub_task none

The experiment fails with the following message when it reaches 64% of training:

12/09/2022 19:15:25 - INFO - __main__ -   ***** Eval results [summarize_ruby] *****
12/09/2022 19:15:25 - INFO - __main__ -     bleu = 15.21
12/09/2022 19:15:25 - INFO - __main__ -     em = 0.1586
12/09/2022 19:15:25 - INFO - __main__ -   [summarize_ruby best-ppl] bleu-4: 15.21, em: 0.1586, codebleu: 0.0000

Traceback (most recent call last):
  File "/home/ubuntu/CodeT5//run_multi_gen.py", line 535, in <module>
    main()
  File "/home/ubuntu/CodeT5//run_multi_gen.py", line 513, in main
    model.load_state_dict(torch.load(file))
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/torch/serialization.py", line 771, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/torch/serialization.py", line 270, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/ubuntu/miniconda3/lib/python3.9/site-packages/torch/serialization.py", line 251, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'saved_models/multi_task/codet5_small_all_lr5_s600000/checkpoint-last/summarize_ruby/pytorch_model.bin'
[382815] Train loss 4437.602:  64%|██████▍   | 382815/600000 [43:17:11<24:33:29,  2.46it/s]

What is going wrong, and how can I fix it?

yuewang-cuhk commented 1 year ago

Hi, the error message indicates that the checkpoint to be loaded for testing does not exist. Please check the 'saved_models/multi_task' folder to verify that the checkpoints were saved properly.
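One way to run that check without waiting for the training script to reach the loading step is a small standalone helper. This is only a sketch: the function name `missing_checkpoints` is hypothetical and not part of the CodeT5 repository, and the directory layout (`<output_dir>/<tag>/<task>/pytorch_model.bin`) is inferred solely from the path in the traceback above, so other tasks or checkpoint tags may be stored differently.

```python
from pathlib import Path


def missing_checkpoints(output_dir, tasks, tag="checkpoint-last"):
    """Return the expected checkpoint files that are not present on disk.

    Assumed layout (taken from the traceback, not from the repo docs):
        <output_dir>/<tag>/<task>/pytorch_model.bin
    """
    root = Path(output_dir) / tag
    expected = [root / task / "pytorch_model.bin" for task in tasks]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Directory and task name copied from the error message in this issue.
    output_dir = "saved_models/multi_task/codet5_small_all_lr5_s600000"
    for path in missing_checkpoints(output_dir, ["summarize_ruby"]):
        print(f"missing: {path}")
```

If the helper reports a missing file, the checkpoint was never written (or was written somewhere else), which matches the FileNotFoundError raised by `torch.load`.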