senarvi / theanolm

TheanoLM is a recurrent neural network language modeling tool implemented using Theano
Apache License 2.0

Running Out of Memory #12

Closed (sameerkhurana10 closed this issue 8 years ago)

sameerkhurana10 commented 8 years ago

Hi,

I was trying to train an LSTM language model using the command:

theanolm train model1.h5 cued_rnnlm/example.AMI.clean/data/rnnlm_data/valid.txt --training-set cued_rnnlm/example.AMI.clean/data/rnnlm_data/train.txt --batch-size 64

This is the output that I get on the terminal:

Using gpu device 0: GeForce GTX TITAN Black
Constructing vocabulary from training set.
Number of words in vocabulary: 144387
Number of word classes: 144387
Building neural network.
/data/sls/qcri-scratch/sameer/anaconda3/envs/test1/lib/python3.4/site-packages/theano/scan_module/scan.py:1017: Warning: In the strict mode, all neccessary shared variables must be passed as a part of non_sequences
  'must be passed as a part of non_sequences', Warning)
Building text scorer.
Building neural network trainer.
Training neural network.
Error allocating 3696307200 bytes of device memory (out of memory). Driver report 1857921024 bytes free and 6441730048 bytes total
Traceback (most recent call last):
  File "/data/sls/qcri-scratch/sameer/anaconda3/envs/test1/lib/python3.4/site-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
MemoryError: Error allocating 3696307200 bytes of device memory (out of memory).

Right now, I am training my language model on 4M tokens with a vocabulary of 100k. I don't think I would be able to scale up to 100M tokens and a 400k vocabulary. Am I doing something wrong here? How do I make the training memory efficient? The GPUs on my server have 6 GB of memory at most. Is there multi-GPU support? For now, I have reduced the batch size to 10. Hopefully it will work, but it will take a much longer time to train.

senarvi commented 8 years ago

The crucial factor is the vocabulary size; the amount of training data doesn't matter for memory usage.
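To give a rough idea, here is a back-of-the-envelope sketch of where the failing allocation comes from. The numbers are from your log; the 100 time steps per mini-batch is my assumption, but with it the size works out exactly:

# Sketch: the failing allocation matches one float32 output tensor over a mini-batch.
vocabulary_size = 144387      # from the log
batch_size = 64               # from your command
time_steps = 100              # assumed number of time steps per mini-batch
bytes_per_float32 = 4

output_tensor_bytes = batch_size * time_steps * vocabulary_size * bytes_per_float32
print(output_tensor_bytes)    # 3696307200, exactly the allocation that failed

So the tensor that holds the output probabilities alone needs about 3.7 GB, which is why reducing the vocabulary (or the batch size) matters so much.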

With 144k words in the vocabulary, the model becomes too large to fit in the GPU memory. Here are some things that you can try:

- Limit the vocabulary to the most frequent words and map the rest to an unknown-word token.
- Cluster the words into classes and train a class-based model, so the network size depends on the number of classes instead of the number of words.
- Use a subword vocabulary.

The latest version of Theano has support for multiple GPUs, but I haven't tried it out yet. It's just a matter of how you configure Theano.
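If you want to experiment with that, here is a minimal sketch of how the device contexts are declared for Theano's gpuarray backend (Theano 0.8 or later). This is only the configuration side; declaring the contexts doesn't by itself split the model across the GPUs:

# Sketch: declaring two GPU contexts for Theano's gpuarray backend.
# The flags must be set before theano is imported. This only makes the devices
# available; the model code still has to place its shared variables on them.
import os

os.environ["THEANO_FLAGS"] = "contexts=dev0->cuda0;dev1->cuda1,floatX=float32"

import theano  # imported after setting the flags on purpose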

About the batch size, I'm not sure it has much effect on training time, as long as the matrices are big enough to keep the GPU busy. A batch size of 10 means that 10 sequences are processed in parallel.

sameerkhurana10 commented 8 years ago

Okay, thanks. Using classes might be a better option, because Arabic generally has a large vocabulary.

sameerkhurana10 commented 8 years ago

Can I ask what the vocabulary size was for which you created 2000 classes? Just to get an idea.

senarvi commented 8 years ago

I have created 2000 classes from a Finnish vocabulary of 2.5M words and from an English vocabulary of 500k words. I haven't experimented with different numbers yet, so that's probably not optimal, but it's a reasonable place to start.
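To illustrate why classes make such a difference: the network only has to produce a distribution over classes instead of over the whole vocabulary, so the output layer shrinks accordingly. A rough sketch with your numbers (how the within-class probabilities are computed depends on the implementation, so treat this as an idealization):

# Sketch: class-based factorization P(w | h) = P(class(w) | h) * P(w | class(w)).
# The network output then covers the classes, not the individual words.
vocabulary_size = 144387   # vocabulary from the original post
num_classes = 2000

print(vocabulary_size)     # output layer width with a full softmax over words
print(num_classes)         # output layer width with a class-based output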

Arabic is also agglutinative to some extent, right? You can also try using a subword vocabulary. Have you considered using Morfessor?
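In case it helps, here is a minimal sketch of training a segmentation model with the Morfessor 2.0 Python API (file names are placeholders, and you should check the Morfessor documentation for the exact calls and options):

# Sketch: training an unsupervised Morfessor model and segmenting one word.
import morfessor

io = morfessor.MorfessorIO()
train_data = list(io.read_corpus_file("train.txt"))    # placeholder corpus file

model = morfessor.BaselineModel()
model.load_data(train_data)
model.train_batch()

io.write_binary_model_file("morfessor.bin", model)     # placeholder model file

segments, cost = model.viterbi_segment("uncharacteristically")
print(segments)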

sameerkhurana10 commented 8 years ago

I will consider that. Thanks.