Closed: abhisha1991 closed this issue 1 year ago
Hi @abhisha1991, unfortunately I've not encountered the error before, so I'm not 100% sure the following will work, but it is still worth a try: remove

`-m torch.distributed.launch --nproc_per_node=1`

from the bash script. That way there will only be a single PyTorch process running the code, and `args.local_rank` will be automatically set to -1. If this gives you any error, let me know. Solution 2 is probably much less work, so I suggest trying that first.
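The `local_rank` behavior mentioned above can be sketched with a minimal argparse example (a hypothetical training script, not the repo's actual code): `torch.distributed.launch` is what passes `--local_rank` to each worker process, so removing the launcher leaves the argument at its default of -1, which such scripts conventionally interpret as single-process, non-distributed training.

```python
import argparse

# Hypothetical sketch of a PyTorch-style training script's argument
# handling; -1 conventionally means "not running under the launcher".
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1,
                    help="set automatically by torch.distributed.launch")

# Plain `python train.py`: no --local_rank flag, so the default applies.
args = parser.parse_args([])
print(args.local_rank)  # -1 -> single-process (non-distributed) path

# Under the launcher, each worker would instead receive e.g. --local_rank=0.
args = parser.parse_args(["--local_rank", "0"])
print(args.local_rank)  # 0 -> distributed path
```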
Hello @martiansideofthemoon, I hope you are doing great. I am facing another problem, related to run_finetune_paraphrase.sh. I am trying to run it in Google Colab; execution takes a few seconds and then reports that the CUDA GPU is out of memory. I have also experimented with changing the batch size in the .sh file, but it didn't work. I would appreciate your help with that.
Hi @TufailAhmadSiddiq, what's the smallest batch size you tried? Reducing the batch size is fine, since you can use gradient accumulation to keep a larger effective batch size.
Thanks for the reply. The minimum batch size I have used is 2, but I am still facing the same problem.
Is this with GPT2-large? As long as batch size 1 fits, it should be OK. You can also change GPT2-large to GPT2-medium; it doesn't drop performance much. Another option is gradient checkpointing.
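The gradient accumulation idea can be illustrated with a small framework-free sketch (toy numbers, not the repo's training loop): summing the averaged gradients of several micro-batches equals the gradient of the combined batch, so batch size 2 with several accumulation steps matches one larger batch. In PyTorch this corresponds to dividing the loss by the number of accumulation steps, calling `loss.backward()` for each micro-batch, and calling `optimizer.step()` only once per accumulated batch.

```python
# Toy example: gradient of mean squared error for the model y = w * x.
def grad(w, batch):
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

w = 0.0
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]          # micro-batch size 2

full_grad = grad(w, data)                      # one batch of 4
accumulated = sum(grad(w, mb) / len(micro_batches) for mb in micro_batches)

print(full_grad, accumulated)  # -30.0 -30.0: identical parameter updates
```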
Can you please point out where I should make these changes?
Change this line to gpt2-medium: https://github.com/martiansideofthemoon/style-transfer-paraphrase/blob/master/style_paraphrase/examples/run_finetune_paraphrase.sh#L23, and reduce the batch size here: https://github.com/martiansideofthemoon/style-transfer-paraphrase/blob/master/style_paraphrase/examples/run_finetune_paraphrase.sh#L32
Thanks for the guidance. I will try it and check whether it works.
Hello! Hope you are doing well. I am trying to fine-tune your model on my custom dataset. When I run `!style_paraphrase/examples/run_finetune_paraphrase.sh`, I get the following error:

I followed the first two steps of "Custom Datasets" in this repository. At the third step, while converting the BPE codes to fairseq binaries, a "Permission denied" error occurs.
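As an aside on "Permission denied" errors in general (an assumption here, not confirmed from these logs): one common cause is a script file lacking the execute bit, which `chmod +x script.sh`, or invoking the script via `bash script.sh`, resolves. A small Python reproduction with a throwaway script:

```python
import os
import stat
import subprocess
import tempfile

# Create a throwaway shell script without the execute bit set.
script = os.path.join(tempfile.mkdtemp(), "demo.sh")
with open(script, "w") as f:
    f.write("#!/bin/bash\necho script ran\n")

# Direct invocation fails with PermissionError until the bit is set.
try:
    subprocess.run([script], check=True)
except PermissionError:
    print("Permission denied, as in the error above")

os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)  # chmod u+x
out = subprocess.run([script], capture_output=True, text=True)
print(out.stdout.strip())  # script ran
```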
@HassanBinAli I think you are missing the dataset files in the repo. Please download the `train.pickle` file from here and place it at `datasets/paranmt_filtered/train.pickle`.
Thank you. That resolved the error.
Hey Kalpesh and team,
Thanks very much for releasing your work; it is great to see a simple architecture like this applied to something novel. We are trying to get up and running with the base setup: we have downloaded all the data and the corresponding models to the right folders. However, upon running the fine-tuning training, we get the attached error.
Our setup is a cloud VM with 1 GPU (NVIDIA Tesla T4), Ubuntu 18.04, 7.5 GB RAM, PyTorch 1.10, CUDA 11.5. We have confirmed that PyTorch and CUDA are installed and available on the machine (see attachments).
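For reference, a sanity check along those lines can be scripted; this is a generic snippet (not from the repo) that reports PyTorch/CUDA status without crashing even when PyTorch is absent:

```python
# Generic environment probe; safe to run even without PyTorch installed.
try:
    import torch
    print("torch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))
except ImportError:
    print("PyTorch is not installed in this environment")
```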
We would be incredibly grateful if you could release a Docker image with pre-installed dependencies, or tell us the exact failure mode we are hitting below. We are unable to proceed past this error. We are also unable to locate the error logs (in ~/style-transfer-paraphrase/style_paraphrase/logs) and thus cannot work out what is wrong with our setup.