xlang-ai / UnifiedSKG

[EMNLP 2022] Unifying and multi-tasking structured knowledge grounding with language models
https://arxiv.org/abs/2201.05966
Apache License 2.0

Cannot reproduce the results on Spider with T5-base #41

Closed: xufangzhi closed this 1 year ago

xufangzhi commented 1 year ago

Hi, thanks for sharing the great work. However, I have some problems reproducing the T5-base results on the Spider dataset with prefix tuning. I use the following command with 8 GPUs:

```
python -m torch.distributed.launch --nproc_per_node 8 --master_port 1234 \
    train.py \
    --seed 2 \
    --cfg Salesforce/T5_base_prefix_spider_with_cell_value.cfg \
    --run_name T5_base_prefix_spider \
    --logging_strategy steps \
    --logging_first_step true \
    --logging_steps 4 \
    --evaluation_strategy steps \
    --eval_steps 500 \
    --metric_for_best_model avr \
    --greater_is_better true \
    --save_strategy steps \
    --save_steps 500 \
    --save_total_limit 1 \
    --gradient_accumulation_steps 8 \
    --num_train_epochs 400 \
    --save_total_limit 1 \
    --adafactor true \
    --learning_rate 5e-5 \
    --load_best_model_at_end \
    --do_train \
    --do_eval \
    --do_predict \
    --predict_with_generate \
    --output_dir output/T5_base_prefix_spider \
    --overwrite_output_dir \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --generation_num_beams 1 \
    --generation_max_length 128 \
    --input_max_length 1024 \
    --ddp_find_unused_parameters true
```

We get 52.32 for T5-base on Spider and 59.57 for T5-large, which differ from the results reported in the paper. Can you help me with this problem? Thank you.

Timothyxxx commented 1 year ago

Hi, thanks for your question and sorry for the delay!

Prefix-tuning (and possibly other parameter-efficient methods) needs more training steps to reach results comparable to full fine-tuning, as shown in Table 18.

[screenshot of Table 18 from the paper]

The performance curve also rises more slowly. Could I ask how many steps you trained for?

xufangzhi commented 1 year ago

Hi, thanks for the reply. We also tried training for more steps (about 100,000) following Table 18. The exact-match accuracy of T5-base with prefix tuning on Spider reaches 55.13, but it still differs from the reported results.

ChenWu98 commented 1 year ago

Hi, thanks for the update! Based on the command you provided, the effective batch size is 64, while we used 128 in most cases. Can you try a larger effective batch size or train for an even longer time?
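
For reference, a minimal sketch of the batch-size arithmetic implied above; the helper function is hypothetical, and the values come from the command earlier in the thread, not from the paper's exact configuration:

```python
# Hypothetical helper: effective batch size for a multi-GPU run with gradient accumulation.
def effective_batch_size(num_gpus: int, per_device_batch_size: int, grad_accum_steps: int) -> int:
    return num_gpus * per_device_batch_size * grad_accum_steps

# Original command: 8 GPUs, per-device batch size 1, gradient accumulation 8.
print(effective_batch_size(8, 1, 8))   # 64
# Doubling --gradient_accumulation_steps to 16 would give the 128 used in most experiments.
print(effective_batch_size(8, 1, 16))  # 128
```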