DarlineFiedler closed 4 years ago
Unfortunately I have not yet solved the memory problem. But if I change -world_size from 1 to 3, I get a completely different error:
AttributeError: module 'signal' has no attribute 'SIGUSR1'
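As a side note on that error: signal.SIGUSR1 is a POSIX-only signal, so this AttributeError typically means the script is running on Windows. A minimal sketch of a portable guard (a hypothetical workaround, not the repository's actual code):

```python
import signal

def has_sigusr1():
    """Return True if this platform exposes the POSIX SIGUSR1 signal.

    On Windows, signal.SIGUSR1 does not exist, which is what raises
    "AttributeError: module 'signal' has no attribute 'SIGUSR1'".
    """
    return getattr(signal, "SIGUSR1", None) is not None

if has_sigusr1():
    print("SIGUSR1 available (POSIX system)")
else:
    print("SIGUSR1 missing (likely Windows); skipping signal handler")
```

Using getattr with a default instead of accessing signal.SIGUSR1 directly avoids the crash on platforms that lack the signal.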
I switched to Google Colab. There I could run everything without out-of-memory errors.
If I run:
python train.py -mode train -encoder classifier -dropout 0.1 -bert_data_path ../bert_data/test_data/test_data/test -model_path ../models/bert_classifier -lr 2e-3 -visible_gpus 0 -gpu_ranks 0 -world_size 1 -report_every 50 -save_checkpoint_steps 1000 -batch_size 3000 -decay_method noam -train_steps 50000 -accum_count 2 -log_file ../logs/bert_classifier -use_interval true -warmup_steps 10000
I get this error:
RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 2.00 GiB total capacity; 1.19 GiB already allocated; 11.31 MiB free; 1.33 GiB reserved in total by PyTorch)
I know it has something to do with memory, but I don't know how to solve it. If I reduce the batch_size far enough to avoid the out-of-memory error, I get this error instead:
ValueError: max() arg is an empty sequence
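For context on this ValueError: Python's max() raises it whenever it is given an empty iterable, so a likely cause here is that the very small batch_size leaves a batch with no examples before max() is called on their lengths. A minimal sketch reproducing the error and the standard way to make the call safe (the variable names are illustrative, not taken from the repository):

```python
# Hypothetical: a batch whose example lengths ended up empty
# because batch_size was too small to admit any example.
lengths = []

try:
    max(lengths)
except ValueError as exc:
    print(exc)  # -> max() arg is an empty sequence

# Since Python 3.4, max() accepts a default to avoid the crash:
longest = max(lengths, default=0)
print(longest)
```

This does not fix the underlying data problem (an empty batch usually means the batching or filtering settings need adjusting), but it shows exactly which condition triggers the message.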
And I don't know how to solve this problem.