ioannist opened this issue 4 years ago

I tried training the paraphraser with gpt2 (small), as the large model would not fit on my 1080 Ti. Everything went fine until the last iteration, where I got the error below. The final checkpoint seems to have been saved successfully. However, Python tries to read the file `style_paraphrase/saved_models/test_paraphrase/config.json`, which was never created and does not exist. All `config.json` files are inside their respective checkpoint folders.
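For reference, the failing lookup can be illustrated with a minimal, hypothetical reproduction (assuming `transformers==3.4.0`; the path is the checkpoint parent folder from my setup):

```python
from transformers import GPT2Config

# The parent folder holds only checkpoint-* subdirectories, so there is
# no config.json at this level and from_pretrained raises an error.
config = GPT2Config.from_pretrained(
    "style_paraphrase/saved_models/test_paraphrase"
)
```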
Same problem with `run_finetune_shakespeare_0.sh`, btw, when training with gpt2 (small).
Thanks for reporting this! I will look more closely later today or tomorrow, but which version of the HuggingFace `transformers` library are you using?
It should be `transformers==3.4.0`, as in the requirements file. I installed everything in a fresh conda env with `python==3.7`. Btw, I am looking forward to the directions for training the inverse model on custom data!
I just tried running it with GPT2-small, and I can see the `config.json` files. Could you share the set of files you see in your checkpoint folder?
I had the same issue when training my models. It seems there is a problem with the path in this line: when re-loading the model, `args.output_dir` is used instead of the `output_dir` that is defined a few lines above, so the load points to the parent folder of all the checkpoints instead of the folder with the last checkpoint. I haven't tested whether this fixes the problem, but I will try it on my next run on the cluster.
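A minimal sketch of the suspected bug and fix; `parent_output_dir` and `global_step` below are hypothetical stand-ins for the script's state, not names taken from the repo:

```python
import os
from transformers import GPT2LMHeadModel

# hypothetical stand-ins for the script's state at the end of training
parent_output_dir = "style_paraphrase/saved_models/test_paraphrase"
global_step = 2000
output_dir = os.path.join(parent_output_dir, "checkpoint-{}".format(global_step))

# buggy: looks for config.json in the parent of all checkpoint folders
# model = GPT2LMHeadModel.from_pretrained(parent_output_dir)

# fixed: loads from the checkpoint folder that was actually written
model = GPT2LMHeadModel.from_pretrained(output_dir)
```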
Just to follow up: changing the line mentioned above did fix the error. Just make sure that `--do_eval` is set and that you are not using `do_delete_old`. That way, the best checkpoint, i.e. the one with the lowest validation perplexity, will be copied to the output dir (the parent folder of all the checkpoints) after training is finished.
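In other words, the end-of-training step behaves roughly like the sketch below (a hypothetical illustration; the helper name and the perplexity bookkeeping are my assumptions, not the script's actual code):

```python
import os
import shutil

def copy_best_checkpoint(output_dir, perplexities):
    """Copy the lowest-perplexity checkpoint's files up to output_dir.

    perplexities maps checkpoint folder name -> validation perplexity.
    """
    best = min(perplexities, key=perplexities.get)
    best_dir = os.path.join(output_dir, best)
    for fname in os.listdir(best_dir):
        shutil.copy(os.path.join(best_dir, fname), output_dir)

copy_best_checkpoint(
    "style_paraphrase/saved_models/test_paraphrase",
    {"checkpoint-1000": 12.3, "checkpoint-2000": 11.7},
)
```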
@martiansideofthemoon Just curious, how could I also load `gpt2-small` as you did? It seems that this is not offered on the HuggingFace model hub.
@guanqun-yang you can just use `gpt2` offered on HuggingFace (https://huggingface.co/gpt2).
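For example, loading it with `transformers` (the hub's `gpt2` checkpoint is the small, 124M-parameter model):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# "gpt2" on the HuggingFace hub is the small GPT-2 model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
```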