pistocop / pistoBot

Create an AI that chats like you
https://pistocop.github.io/pistoBot-website/
GNU General Public License v3.0

run_training.sh goes into Segmentation fault (core dumped) #2

Closed: valkiii closed this issue 4 years ago

valkiii commented 4 years ago

Hi, I am trying to run your bot code locally. I was able to parse my WhatsApp messages, about 120k in total. I replicated the structure of the Colab folder and cloned this repo.

When I run

!cp ./messaging-chat-parser/data/chat_parsed/all-messages.txt ./pistoBot/data/inputs/chat_parsed/all-messages-endoftext.txt
!cd pistoBot/colab/ && bash run_training.sh gpt2-scratch

I get the following output:

Installing common requirements...
[gpt2-scratch model chosen]
Installing requirements...
Training model...
2020-10-28 22:16:02.797126: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
run_training.sh: line 37: 21006 Segmentation fault (core dumped) python ./pistoBot/03_gpt2_scratch/gpt2_scratch.py -v

Any suggestions?

pistocop commented 4 years ago

Hi @valkiii,

Unfortunately, I haven't run the code locally, because I don't have a GPU suitable for ML tasks.

Anyway, here are my 2 cents: I think your GPU doesn't have enough memory to handle GPT-2. The model is very large, and the Colab GPU sometimes runs into a similar problem too (e.g. the pistoBot GPT-2 training batch size was chosen to avoid OOM).
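To put a rough number on the memory concern, here is a back-of-the-envelope sketch. It is only an illustration under stated assumptions: fp32 training with the Adam optimizer, and the 124M parameter count of the smallest published GPT-2. The actual pistoBot gpt2-scratch configuration may differ, and the function name is hypothetical.

```python
# Rough estimate of the GPU memory needed to train a model with Adam in fp32.
# All numbers are assumptions for illustration, not measurements of pistoBot.

BYTES_PER_FP32 = 4


def estimate_training_memory_gib(n_params: int) -> float:
    """Memory for weights, gradients and Adam's two moment buffers, in GiB.

    Activations, which grow with batch size and sequence length, are excluded,
    so the real footprint is higher than this figure.
    """
    copies = 4  # weights + gradients + Adam first moment + Adam second moment
    total_bytes = n_params * BYTES_PER_FP32 * copies
    return total_bytes / 2**30


if __name__ == "__main__":
    # 124M is the size of the smallest published GPT-2 (assumed here).
    gib = estimate_training_memory_gib(124_000_000)
    print(f"{gib:.2f} GiB before activations")
```

Comparing that figure, plus headroom for activations, against the total memory reported by nvidia-smi gives a quick sanity check on the OOM hypothesis.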

Here are some things you could try to check whether this assumption holds: