Closed valkiii closed 4 years ago
Hi @valkiii,
Unfortunately, I haven't run the code locally, because I don't have a GPU suitable for ML tasks.
Anyway, here are my 2 cents: I think your GPU doesn't have enough memory to handle GPT-2. The model is huge, and the Colab GPU sometimes hits a similar problem too (e.g. the pistoBot GPT-2 training batch is sized to avoid OOM).
One thing you could try, to check whether this assumption is true: run nvidia-smi and see if your GPU memory is fully allocated just before you receive the error message.
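If you would rather log the numbers than watch the terminal, here is a minimal sketch that polls nvidia-smi from Python. It is not part of pistoBot: the function names are hypothetical, and it assumes nvidia-smi is on your PATH and accepts its standard --query-gpu CSV options. If nvidia-smi is missing or fails, it just returns an empty list.

```python
import shutil
import subprocess


def parse_memory_line(line):
    """Parse one 'used, total' CSV line (values in MiB) into two ints."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def memory_fraction(used, total):
    """Return the fraction of GPU memory in use (0.0 if total is 0)."""
    return used / total if total else 0.0


def query_gpu_memory():
    """Return a list of (used_mib, total_mib) tuples, one per GPU.

    Returns an empty list when nvidia-smi is not installed or fails.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    readings = []
    for line in result.stdout.splitlines():
        try:
            readings.append(parse_memory_line(line))
        except ValueError:
            # Skip lines that are empty or report values such as "[N/A]".
            continue
    return readings


if __name__ == "__main__":
    for index, (used, total) in enumerate(query_gpu_memory()):
        share = memory_fraction(used, total)
        print(f"GPU {index}: {used} / {total} MiB ({share:.0%} used)")
```

Run it in a second terminal while the training starts. If the used memory climbs close to the total right before the crash, the out-of-memory guess looks more plausible; if it stays low, the cause is probably something else.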
Hi, I am trying to rerun your bot code locally. I was able to parse my WhatsApp messages, about 120k in total. I replicated the structure of the Colab folder and cloned this repo.
When I run
!cp ./messaging-chat-parser/data/chat_parsed/all-messages.txt ./pistoBot/data/inputs/chat_parsed/all-messages-endoftext.txt
!cd pistoBot/colab/ && bash run_training.sh gpt2-scratch
I get the following output:
Installing common requirements...
[gpt2-scratch model chosen]
Installing requirements...
Training model...
2020-10-28 22:16:02.797126: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
run_training.sh: line 37: 21006 Segmentation fault (core dumped) python ./pistoBot/03_gpt2_scratch/gpt2_scratch.py -v
Any suggestion?