Closed: b7leung closed this issue 3 years ago.
Hi, it is possible to add some parameters to use fp16 instead of fp32, which saved me enough memory to train the inverse paraphraser model on a 16 GB P100 on Colab Pro. Try adding --fp16 and --fp16_opt_level "O3" to the command above.
You will need to install NVIDIA Apex for AMP support, which I found was best retrieved with git clone https://github.com/NVIDIA/apex. Check the README and the docs at https://nvidia.github.io/apex/amp.html; everything is already implemented in Kalpesh's code.
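For concreteness, the setup looked roughly like this for me. The pip flags below are the ones the Apex README recommended at the time of writing, and the training script name and other arguments are placeholders, not the repo's exact interface, so substitute whatever command you are already running:

```shell
# Install Apex with its CUDA/C++ extensions (check the Apex README for
# the currently recommended install command before copying this):
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..

# Then append the mixed-precision flags to your existing training command.
# "train_script.py" and the elided arguments are placeholders:
python train_script.py \
    ...your-existing-arguments... \
    --fp16 \
    --fp16_opt_level "O3"
```

Note that "O3" is full fp16 and can be numerically unstable for some models; if training diverges, "O1" or "O2" are more conservative opt levels worth trying.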
Good luck, it would be cool to compare notes as I am also currently training my inverse paraphraser models.
When I run paraphrase_many.py I get a CUDA out-of-memory error; I am not sure which parameter I should adjust.
@JonOnEarth did you try reducing the batch size with the --batch_size parameter? I haven't tried this myself, but it seems like a good starting point since the default is 64.
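Something like the following, i.e. halving the batch size until generation fits in memory (the other arguments are placeholders for whatever you already pass, not the script's exact interface):

```shell
# Smaller batches are slower but produce the same outputs at inference;
# drop from the default of 64 until the OOM goes away.
python paraphrase_many.py \
    ...your-existing-arguments... \
    --batch_size 8
```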
If batch size 1 doesn't fit, you should try a smaller model like gpt2-medium (it's not too much worse). Gradient checkpointing is also an option, but it will need more work. We trained all our models on a 24 GB GPU.
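To illustrate the gradient checkpointing idea (this is a generic PyTorch sketch, not the repo's code): torch.utils.checkpoint recomputes intermediate activations during the backward pass instead of storing them all, trading extra compute for a lower peak memory footprint:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A stand-in for a deep stack of transformer blocks.
layers = nn.Sequential(
    *[nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(8)]
)

x = torch.randn(4, 64, requires_grad=True)

# Split the stack into 2 segments; only the segment boundaries keep
# their activations, and everything in between is recomputed during
# the backward pass.
out = checkpoint_sequential(layers, 2, x, use_reentrant=False)
out.sum().backward()  # gradients flow as usual, with lower peak memory
```

With a real GPT-2 from HuggingFace Transformers, the equivalent one-liner is (if your installed version supports it) model.gradient_checkpointing_enable() before training.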
I'm trying to train an inverse paraphraser on my own custom dataset (I already followed these data preprocessing steps). My command is below; distributed training has been turned off. Even with a batch size of only 1, I still run out of memory on a GTX 1080 Ti (~11 GB). Is this expected, and are 2+ GPUs simply required? Or did I get something wrong? Is there anything else I can do to make training work on 1 GPU?