salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

Fine-tuning codet5_large on concode gives an error #84

Closed: aldopareja closed this issue 1 year ago

aldopareja commented 1 year ago

Training on codet5_base works; how can I fix this error when fine-tuning codet5_large?

CUDA_VISIBLE_DEVICES=0 python /home/aldo/CodeT5/run_gen.py \
  --do_train --do_eval --do_eval_bleu --do_test \
  --task concode --sub_task none --model_type codet5 --data_num 100 \
  --num_train_epochs 1 --warmup_steps 10 --learning_rate 10e-5 --patience 3 \
  --tokenizer_name=Salesforce/codet5-large --model_name_or_path=Salesforce/codet5-large \
  --data_dir /home/aldo/CodeT5/data \
  --cache_path saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/cache_data \
  --output_dir saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1 \
  --summary_dir tensorboard --save_last_checkpoints --always_save_model \
  --res_dir saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/prediction \
  --res_fn results/concode_codet5_base.txt \
  --train_batch_size 8 --eval_batch_size 8 --max_source_length 320 --max_target_length 150 \
  2>&1 | tee saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/train.log
02/16/2023 21:59:27 - INFO - __main__ -   Namespace(adam_epsilon=1e-08, add_lang_ids=False, add_task_prefix=False, always_save_model=True, beam_size=10, cache_path='saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/cache_data', config_name='', data_dir='/home/aldo/CodeT5/data', data_num=100, dev_filename=None, do_eval=True, do_eval_bleu=True, do_lower_case=False, do_test=True, do_train=True, eval_batch_size=8, eval_steps=-1, eval_task='', gradient_accumulation_steps=1, lang='java', learning_rate=0.0001, load_model_path=None, local_rank=-1, log_steps=-1, max_grad_norm=1.0, max_source_length=320, max_steps=-1, max_target_length=150, model_name_or_path='Salesforce/codet5-large', model_type='codet5', no_cuda=False, num_train_epochs=1, output_dir='saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1', patience=3, res_dir='saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/prediction', res_fn='results/concode_codet5_base.txt', save_last_checkpoints=True, save_steps=-1, seed=1234, start_epoch=0, sub_task='none', summary_dir='tensorboard', task='concode', test_filename=None, tokenizer_name='Salesforce/codet5-large', train_batch_size=8, train_filename=None, train_steps=-1, warmup_steps=10, weight_decay=0.0)
02/16/2023 21:59:27 - WARNING - configs -   Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, cpu count: 96
02/16/2023 21:59:40 - INFO - models -   Finish loading model [738M] from Salesforce/codet5-large
02/16/2023 21:59:45 - INFO - utils -   Read 100 examples, avg src len: 68, avg trg len: 26, max src len: 249, max trg len: 82
02/16/2023 21:59:45 - INFO - utils -   [TOKENIZE] avg src len: 202, avg trg len: 33, max src len: 741, max trg len: 104
02/16/2023 21:59:45 - INFO - utils -   Create cache data into saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/cache_data/train_100.pt
100%|##########| 100/100 [00:02<00:00, 47.08it/s]
02/16/2023 21:59:47 - INFO - __main__ -   ***** Running training *****
02/16/2023 21:59:47 - INFO - __main__ -     Num examples = 100
02/16/2023 21:59:47 - INFO - __main__ -     Batch size = 8
02/16/2023 21:59:47 - INFO - __main__ -     Batch num = 13
[0] Train loss 11.713: 100%|##########| 13/13 [00:12<00:00,  1.05it/s]
02/16/2023 21:59:59 - INFO - utils -   Read 100 examples, avg src len: 69, avg trg len: 31, max src len: 193, max trg len: 100
02/16/2023 21:59:59 - INFO - utils -   Create cache data into saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/cache_data/dev_100.pt
100%|##########| 100/100 [00:02<00:00, 48.75it/s]
02/16/2023 22:00:01 - INFO - __main__ -     ***** Running ppl evaluation *****
02/16/2023 22:00:01 - INFO - __main__ -     Num examples = 100
02/16/2023 22:00:01 - INFO - __main__ -     Batch size = 8
Eval ppl: 100%|##########| 13/13 [00:03<00:00,  3.57it/s]
02/16/2023 22:00:05 - INFO - __main__ -     epoch = 0
02/16/2023 22:00:05 - INFO - __main__ -     eval_ppl = 26.63935
02/16/2023 22:00:05 - INFO - __main__ -     global_step = 13
02/16/2023 22:00:05 - INFO - __main__ -     ********************
02/16/2023 22:00:13 - INFO - __main__ -   Save the last model into saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/checkpoint-last/pytorch_model.bin
02/16/2023 22:00:13 - INFO - __main__ -     Best ppl:26.63935
02/16/2023 22:00:13 - INFO - __main__ -     ********************
02/16/2023 22:00:20 - INFO - __main__ -   Save the best ppl model into saved_large/concode/codet5_large_100_lr10_bs8_src320_trg150_pat3_e1/checkpoint-best-ppl/pytorch_model.bin
02/16/2023 22:00:20 - INFO - __main__ -   ***** CUDA.empty_cache() *****
02/16/2023 22:00:20 - INFO - utils -   Read 100 examples, avg src len: 69, avg trg len: 31, max src len: 193, max trg len: 100
02/16/2023 22:00:20 - INFO - utils -   Sample 5k data for computing bleu from /home/aldo/CodeT5/data/concode/dev.json
100%|##########| 100/100 [00:02<00:00, 48.45it/s]
02/16/2023 22:00:22 - INFO - __main__ -     ***** Running bleu evaluation on dev data*****
02/16/2023 22:00:22 - INFO - __main__ -     Num examples = 100
02/16/2023 22:00:22 - INFO - __main__ -     Batch size = 8
Eval bleu for dev set: 100%|##########| 13/13 [02:01<00:00,  9.36s/it]
Traceback (most recent call last):
  File "/home/aldo/CodeT5/run_gen.py", line 388, in <module>
    main()
  File "/home/aldo/CodeT5/run_gen.py", line 315, in main
    result = eval_bleu_epoch(args, eval_data, eval_examples, model, tokenizer, 'dev', 'e%d' % cur_epoch)
  File "/home/aldo/CodeT5/run_gen.py", line 153, in eval_bleu_epoch
    codebleu = calc_code_bleu.get_codebleu(gold_fn, output_fn, args.lang)
  File "/home/aldo/CodeT5/evaluator/CodeBLEU/calc_code_bleu.py", line 21, in get_codebleu
    assert len(hypothesis) == len(pre_references[i])
AssertionError
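
For reference, the assertion that fails in evaluator/CodeBLEU/calc_code_bleu.py requires the prediction file and the gold file to contain the same number of lines, one per example. Below is a minimal diagnostic sketch along those lines; the file paths are placeholders for whatever gold_fn and output_fn run_gen.py writes under --res_dir, and the "embedded newline" remark is only one possible cause, not a confirmed diagnosis.

# Hypothetical diagnostic (not part of the repo): compare line counts of the
# gold and prediction files that run_gen.py hands to calc_code_bleu.get_codebleu().
# The paths below are placeholders for the real files under args.res_dir.
def check_line_counts(gold_fn: str, output_fn: str) -> None:
    with open(gold_fn, encoding="utf-8") as f:
        references = [line.rstrip("\n") for line in f]
    with open(output_fn, encoding="utf-8") as f:
        hypotheses = [line.rstrip("\n") for line in f]

    print(f"gold: {len(references)} lines, predictions: {len(hypotheses)} lines")
    if len(references) != len(hypotheses):
        # This mismatch is exactly what trips the assert in calc_code_bleu.py.
        # One possible cause is a generated prediction that contains a newline,
        # splitting a single example across multiple lines of the output file.
        print("Line counts differ; inspect the prediction file for empty or multi-line outputs.")


if __name__ == "__main__":
    check_line_counts("prediction/dev.gold", "prediction/dev.output")  # placeholder paths

If the counts do differ, that points at the generated predictions rather than the data loading, since the same data and command work with codet5_base.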