jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".
MIT License

Memory Problem? #25

Closed renqianluo closed 4 years ago

renqianluo commented 7 years ago

Hi, I cloned your code and ran training on the WMT English-German task, but it failed with "RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1502009910772/work/torch/lib/THC/generic/THCStorage.cu:66". I ran it on a Tesla K40, which has the same 12GB memory capacity as your Titan X, and with the default settings. So I don't know why this happens; do you have any idea? Thanks

alvations commented 7 years ago

I'm getting the same error on a GTX 1080 with 8GB per card.

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:66

The error happens after training the first epoch, when validation starts:

```
~/attention-is-all-you-need-pytorch$ python3 train.py -data data/multi30k.atok.low.pt -save_model trained -save_mode best -proj_share_weight 
Namespace(batch_size=64, cuda=True, d_inner_hid=1024, d_k=64, d_model=512, d_v=64, d_word_vec=512, data='data/multi30k.atok.low.pt', dropout=0.1, embs_share_weight=False, epoch=10, log=None, max_token_seq_len=52, n_head=8, n_layers=6, n_warmup_steps=4000, no_cuda=False, proj_share_weight=True, save_mode='best', save_model='trained', src_vocab_size=2909, tgt_vocab_size=3149)
[ Epoch 0 ]
  - (Training)   ppl:  102.10204, accuracy: 30.757 %, elapse: 2.077 min
  - (Validation) :   0%|                                                                         | 0/16 [00:00<?, ?it/s]THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
```
jadore801120 commented 7 years ago

Hi @renqianluo and @alvations, thanks for the report! I also notice that the GPU memory usage is sometimes unacceptably large. I think part of the problem may be caused by history retained from the unnecessary enc-dec layer interaction pointed out in #27. I will try to remove the incorrect layer interaction and observe the memory usage. Thanks!
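As a rough illustration of why that interaction is expensive (hypothetical, simplified single-head code, not this repository's actual modules): if the decoder cross-attends to every encoder layer's output, n_layers attention maps and context tensors are created and kept alive for backward, instead of just one for the final encoder output.

```python
import torch

n_layers, batch, seq, d_model = 6, 64, 52, 512
enc_outputs = [torch.randn(batch, seq, d_model, requires_grad=True)
               for _ in range(n_layers)]
dec_q = torch.randn(batch, seq, d_model, requires_grad=True)

# Memory-hungry variant: the decoder cross-attends to every encoder layer's
# output, so n_layers attention maps and contexts are kept alive for backward.
contexts = []
for enc in enc_outputs:
    attn = torch.softmax(dec_q @ enc.transpose(1, 2) / d_model ** 0.5, dim=-1)
    contexts.append(attn @ enc)

# Paper-correct variant: attend only to the final encoder output, so a single
# attention map is recorded.
enc_final = enc_outputs[-1]
attn = torch.softmax(dec_q @ enc_final.transpose(1, 2) / d_model ** 0.5, dim=-1)
context = attn @ enc_final
```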

jadore801120 commented 7 years ago

Hi @alvations, thanks for pointing out the failure during evaluation! I modified the Variable preparation part in DataLoader.py to avoid recording history during evaluation. I believe it will slightly mitigate the memory usage of evaluation.
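For anyone hitting the same validation-time OOM on an older checkout, the idea is to run evaluation without recording autograd history. A minimal sketch with assumed names (generic model and (source, target) validation loader, not this repo's exact signatures): in the PyTorch 0.x versions current at the time this was done by creating input Variables with volatile=True, while in later PyTorch the same effect comes from torch.no_grad().

```python
import torch

def evaluate(model, valid_loader, device):
    """Validation without recording autograd history (hypothetical helper)."""
    model.eval()
    total_loss, n_batches = 0.0, 0
    with torch.no_grad():                       # no graph is built; activations are freed right away
        for src, tgt in valid_loader:           # assumed (source, target) batches
            src, tgt = src.to(device), tgt.to(device)
            logits = model(src, tgt)            # assumed forward signature
            loss = torch.nn.functional.cross_entropy(
                logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))
            total_loss += loss.item()
            n_batches += 1
    return total_loss / max(n_batches, 1)
```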

Hi @renqianluo, as mentioned in #27, I implemented the model incorrectly by using all encoder layer outputs, and I think that may be part of the cause of the memory problem. With the modification 07e2cda7d6e3ed9225ae1063b90a077432c24995, the recording of history is heavily reduced and there is no need to keep the layer outputs as local variables. Under the default parameter settings, I ran the model with ~8000 MiB of GPU memory usage. Please retry with the newest commit.

Thanks!

Yu-Hsiang

renqianluo commented 6 years ago

@jadore801120 Thanks for your response; I will try the new code.

akshay1123 commented 6 years ago

I am still getting the out-of-memory error on the latest commit on a Tesla P100; the error occurs when validation starts after training in Epoch 1. @jadore801120 can you help me understand why this error is occurring even after the optimization? I'm pasting the traceback below.

JulianRMedina commented 6 years ago

A fix that I used was to decrease batch_size in the parser arguments thereby decreasing memory requirements. I have it working on a GTX 1070 with batch_size = 32.
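For example, reusing the invocation from the log above with a smaller batch (assuming the batch size is exposed as -batch_size, as suggested by the Namespace printed earlier):

```
python3 train.py -data data/multi30k.atok.low.pt -save_model trained -save_mode best -proj_share_weight -batch_size 32
```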

DaoD commented 6 years ago

@akshay1123 I'm not sure if you have solved your problem. One hint for you: use torch.cuda.empty_cache() after every training batch. This may help, but I'm not sure. You can monitor the memory allocated and cached periodically via torch.cuda.memory_allocated() and torch.cuda.memory_cached().
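A rough sketch of that suggestion inside a generic training loop (the loop, model signature, and loader are illustrative, not this repo's train.py). Note that empty_cache() only releases unused blocks held by PyTorch's caching allocator; it does not free memory occupied by live tensors, so it helps mainly with fragmentation and reporting rather than OOM caused by oversized activations.

```python
import torch

def train_one_epoch(model, optimizer, train_loader, device, log_every=100):
    model.train()
    for step, (src, tgt) in enumerate(train_loader):   # assumed (source, target) batches
        src, tgt = src.to(device), tgt.to(device)
        optimizer.zero_grad()
        logits = model(src, tgt)                       # assumed forward signature
        loss = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))
        loss.backward()
        optimizer.step()

        torch.cuda.empty_cache()                       # release unused cached blocks back to the device
        if step % log_every == 0:
            print(f"allocated={torch.cuda.memory_allocated() / 2**20:.0f} MiB, "
                  f"cached={torch.cuda.memory_cached() / 2**20:.0f} MiB")
```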

BinWone commented 5 years ago

> @akshay1123 I'm not sure if you have solved your problem. One hint for you: use torch.cuda.empty_cache() after every training batch. This may help, but I'm not sure. You can monitor the memory allocated and cached periodically via torch.cuda.memory_allocated() and torch.cuda.memory_cached().

Using torch.cuda.empty_cache() still throws the out-of-memory error, so this didn't solve the problem. I can only use batch_size = 64 on 4 K40m GPUs, and training is very slow. How can I use a bigger batch size? Any suggestions? Thanks!
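One common workaround, not part of this repository, for getting a larger effective batch without more memory is gradient accumulation: run several small mini-batches, accumulate their scaled gradients, and step the optimizer once. A sketch with a hypothetical model and loader:

```python
import torch

def train_with_accumulation(model, optimizer, train_loader, device, accum_steps=4):
    """Simulate a batch accum_steps times larger than what fits in memory (sketch)."""
    model.train()
    optimizer.zero_grad()
    for step, (src, tgt) in enumerate(train_loader):   # assumed (source, target) batches
        src, tgt = src.to(device), tgt.to(device)
        logits = model(src, tgt)                       # assumed forward signature
        loss = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))
        (loss / accum_steps).backward()                # scale so gradients average over the large batch
        if (step + 1) % accum_steps == 0:
            optimizer.step()                           # one update per accum_steps mini-batches
            optimizer.zero_grad()
```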

gauravsc commented 5 years ago

Why does this PyTorch implementation take a lot more memory than TensorFlow implementations of the Transformer?

buptwhr commented 5 years ago

> Using torch.cuda.empty_cache() still throws the out-of-memory error, so this didn't solve the problem. I can only use batch_size = 64 on 4 K40m GPUs, and training is very slow. How can I use a bigger batch size? Any suggestions? Thanks!

Have you solved this problem? It's strange: only when my model is very small is there no out-of-memory error, and OOM happened again when the first epoch was about 70% done. @akshay1123

DaoD commented 5 years ago

@buptwhr I'm not sure, since I haven't tried this project. Maybe you can use the project at the following link instead; I have tested it on my GPU and there is no memory error. Hope this helps. https://github.com/DaoD/annotated-transformer

jadore801120 commented 4 years ago

Dear all, please take a look at the newest code. I tried to reduce as many contiguous operations as possible by using broadcasting. Hope it helps. Thanks.
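As a hypothetical illustration of that kind of change (not the actual diff): tiling an attention mask per head with repeat() allocates n_head full copies of the mask, whereas keeping a size-1 head dimension and letting it broadcast makes no copy at all.

```python
import torch

n_head, batch, len_q, len_k = 8, 64, 52, 52
scores = torch.randn(batch, n_head, len_q, len_k)            # attention logits per head
mask = torch.rand(batch, 1, len_q, len_k) < 0.5              # bool pad mask with a size-1 head dim

# Copy-heavy variant: materialize one mask per head before applying it.
mask_repeated = mask.repeat(1, n_head, 1, 1)                 # allocates n_head full copies
masked_a = scores.masked_fill(mask_repeated, float('-inf'))

# Broadcasting variant: the size-1 head dimension is broadcast, so no copy is made.
masked_b = scores.masked_fill(mask, float('-inf'))

assert torch.equal(masked_a, masked_b)
```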