Hi, I was using the script for supervised training. My dataset is fairly large, and several checkpoints were saved during training. However, the training did not finish, and I was wondering how I can use the checkpoints to continue it.
I tried passing the checkpoint as the model name (`--model_name_or_path`), but training seems to start from the beginning. Here is the script I used:
```bash
python3 run_train.py \
  --output_dir=$OUTPUT_DIR \
  --model_name_or_path=checkpoint-12000 \
  --extraction 'softmax' \
  --do_train \
  --train_so \
  --train_data_file=$TRAIN_FILE \
  --train_gold_file=$TRAIN_GOLD_FILE \
  --per_gpu_train_batch_size 2 \
  --gradient_accumulation_steps 4 \
  --num_train_epochs 5 \
  --learning_rate 1e-4 \
  --save_steps 2000
```

Hi, for now the code only supports reloading the model weights, not the optimizer states. You may have to tune the learning rate and max_steps yourself when continuing training.
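The reply suggests adjusting the learning rate and max_steps by hand. One way to pick those values is sketched below, under assumptions: if run_train.py follows the older Hugging Face example scripts, it probably uses linear warmup followed by linear decay to zero (the shape of `get_linear_schedule_with_warmup`), and the `12000` in `checkpoint-12000` probably counts optimizer steps, not dataloader batches. Both are assumptions to check against the script. `BATCHES_PER_EPOCH` is a hypothetical placeholder for `len(train_dataloader)`, and the helper functions are illustrative, not part of run_train.py.

```python
"""Estimate where a linear learning-rate schedule stood at a saved checkpoint.

ASSUMPTIONS (verify against run_train.py): linear warmup then linear decay
to zero, one optimizer step per `grad_accum` batches, and checkpoint names
that count optimizer steps.
"""


def total_optimizer_steps(batches_per_epoch: int, grad_accum: int, epochs: int) -> int:
    # One optimizer step per `grad_accum` dataloader batches.
    return batches_per_epoch // grad_accum * epochs


def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    # Warmup phase: ramp linearly from 0 up to base_lr.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps (never negative).
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def resume_settings(checkpoint_step, base_lr, batches_per_epoch, grad_accum, epochs, warmup_steps=0):
    # Returns (learning rate at the checkpoint, optimizer steps still to run).
    total = total_optimizer_steps(batches_per_epoch, grad_accum, epochs)
    remaining = max(0, total - checkpoint_step)
    resume_lr = linear_lr(checkpoint_step, base_lr, total, warmup_steps)
    return resume_lr, remaining


if __name__ == "__main__":
    # Hyperparameters from the command in the question; BATCHES_PER_EPOCH is
    # HYPOTHETICAL and must be replaced with len(train_dataloader) for your data.
    BATCHES_PER_EPOCH = 40_000
    lr, steps = resume_settings(
        checkpoint_step=12_000,  # from checkpoint-12000
        base_lr=1e-4,            # --learning_rate
        batches_per_epoch=BATCHES_PER_EPOCH,
        grad_accum=4,            # --gradient_accumulation_steps
        epochs=5,                # --num_train_epochs
    )
    print(f"resume with learning_rate={lr:.2e}, max_steps={steps}")
```

With the placeholder value of 40,000 batches per epoch, this prints `resume with learning_rate=7.60e-05, max_steps=38000`. If the checkpoint is past warmup and the resumed run uses no warmup, a fresh linear decay from that learning rate to zero over the remaining steps follows the same learning-rate curve the original run would have used. The resumed run still will not match the original exactly, because the optimizer state (for an Adam-style optimizer, its moment estimates) starts from scratch.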