Yes, if you use our provided commands. To disable DDP, you may try something like:

    python train.py --seed 2 --cfg Salesforce/T5_base_finetune_wikitq.cfg --run_name T5_base_finetune_wikitq \
        --logging_strategy steps --logging_first_step true --logging_steps 4 \
        --evaluation_strategy steps --eval_steps 500 --metric_for_best_model avr --greater_is_better true \
        --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end \
        --gradient_accumulation_steps 8 --num_train_epochs 400 --adafactor true --learning_rate 5e-5 \
        --do_train --do_eval --do_predict --predict_with_generate \
        --output_dir output/T5_base_finetune_wikitq --overwrite_output_dir \
        --per_device_train_batch_size 4 --per_device_eval_batch_size 16 \
        --generation_num_beams 4 --generation_max_length 128 --input_max_length 1024
Note, though, that our code has not been tested in this setting.
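For a quick sanity check, here is a minimal sketch (not part of the repo) of how you can tell which mode you are in: `torchrun` and `python -m torch.distributed.launch` export rank environment variables for each worker, while a plain `python train.py` invocation leaves them unset.

    import os
    import torch

    def launched_with_ddp() -> bool:
        # torchrun / torch.distributed.launch set these for every worker;
        # a plain `python train.py` run leaves them unset.
        return "RANK" in os.environ or "LOCAL_RANK" in os.environ

    if __name__ == "__main__":
        print(f"distributed launcher detected: {launched_with_ddp()}")
        print(f"visible GPUs: {torch.cuda.device_count()}")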
BTW, have you ever tried UnifiedSKG + PICARD?
Nope. Designing a unified grammar and building PICARD on top of it would be an interesting direction.
Sorry, one more little question, orz :) If I do not use DeepSpeed and train T5-3B on Spider with bs=2 and DDP, I get an OOM error. Is that normal?
I remember that T5-3B requires DeepSpeed even on a 32G V100.
How about a 40G A100?
Surely ok.
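For intuition, here is a rough back-of-envelope estimate (my own numbers, not from the repo) of why T5-3B OOMs without DeepSpeed: under plain fp32 DDP, every GPU holds a full copy of the weights, gradients, and optimizer state, which nearly exhausts 32 GB before any activations are allocated; ZeRO-style sharding and offload are what remove that per-GPU cost.

    # Rough per-GPU memory for T5-3B under plain fp32 DDP.
    # Assumed numbers: ~3e9 parameters, 4 bytes per fp32 value, and a
    # generous ~0.5x allowance for Adafactor's factored optimizer state.
    params = 3e9
    bytes_fp32 = 4

    weights = params * bytes_fp32          # ~11 GiB
    grads = params * bytes_fp32            # ~11 GiB (each DDP replica keeps a full copy)
    optimizer = params * bytes_fp32 * 0.5  # ~6 GiB, rough Adafactor estimate

    total_gib = (weights + grads + optimizer) / 1024**3
    # Prints roughly 28 GiB, leaving almost nothing of a 32 GB V100 for
    # activations at bs=2 with 1024-token inputs, hence the OOM; a 40 GB
    # A100 leaves enough headroom to squeeze by.
    print(f"~{total_gib:.0f} GiB before activations")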
Feel free to contact us if you have further questions!
Does train.py use DDP training by default, even if n_gpus = 1?