Closed: hsb1995 closed this issue 3 months ago
I set the dataset issue from my earlier question aside and went straight to running the steps below. But I only have two 3090s, and even after changing all the parameters the GPU memory still overflows. Is there anything you can suggest on your end?
deepspeed --num_gpus 2 --master_port=9901 train_bash.py \
    --deepspeed "$SCRIPT_DIR/ds_config.json" \
    --stage kd \
    --kd_alpha 1.0 \
    --kd_beta 1 \
    --kd_loss_scale 0.01 \
    --cutoff_len 1024 \
    --model_name_or_path $STUDENT_MODEL_START_PATH \
    --teacher_model_name_or_path $TEACHER_MODEL_PATH \
    --do_train \
    --dataset $DATASET \
    --dataset_dir $SCRIPT_DIR/../data \
    --template vanilla \
    --finetuning_type full \
    --output_dir $STUDENT_MODEL_PATH \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --lr_scheduler_type cosine --warmup_steps 500 \
    --adam_beta1 0.9 --adam_beta2 0.98 --weight_decay 0.01 \
    --logging_steps 1 \
    --save_steps 5000 \
    --learning_rate 1e-4 \
    --num_train_epochs 50.0 \
    --plot_loss \
    --fp16
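In case it matters, the direction I am experimenting with is enabling ZeRO stage 3 with CPU offloading in ds_config.json, since full-parameter training of a student plus a frozen teacher is tight on two 24 GB cards. A minimal sketch of what I mean (all values are my own guesses, not the repo's shipped config; the "auto" entries rely on the HF Trainer / DeepSpeed integration filling them in from the command-line arguments):

{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}

Offloading the optimizer states and parameters to CPU trades speed for GPU memory, which is usually the only way to fit full fine-tuning at this scale on 3090s.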
Hi! Thanks for your attention. I'm sorry you are running into a GPU resource bottleneck in your environment. :-( As for the data: the files in my online disk were generated as part of this work and have already been released there, so you can use them directly to reproduce our experiments. You do not need to generate them again. I hope this helps. Best wishes! :-)
Your work is great, and I have been following your progress on model compression. While reproducing your work, I ran into the following data issue:
How is this data meant to be used? I downloaded all_gen_132k.json as you suggested, but you also mentioned generating data. I tried running generate_data again and it did not run through, so I would like your advice on how to use this data.
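For reference, my assumption (based on the LLaMA-Factory-style flags in train_bash.py, such as --dataset and --dataset_dir) is that the downloaded file has to be registered in dataset_info.json under the data directory before it can be selected with --dataset. Roughly something like this, where the entry name and the column names are placeholders that would need to match the actual fields in all_gen_132k.json:

{
  "all_gen_132k": {
    "file_name": "all_gen_132k.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}

With such an entry, the run would use --dataset all_gen_132k; please correct me if the repository expects a different registration step.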