senarvi / theanolm

TheanoLM is a recurrent neural network language modeling tool implemented using Theano

cuda out of memory exception #34

Closed amrmalkhatib closed 6 years ago

amrmalkhatib commented 6 years ago

I'm getting a strange error (CUDA out of memory exception). Even though batch_size = 16, the exception still occurs. Is there any solution or workaround for this error?

Here is the terminal output at log-level=debug:

Mapped name None to device cuda0: Tesla K80 (0000:00:04.0)
Context None device="Tesla K80" ID="0000:00:04.0"
Constructing vocabulary from training set.
Number of words in vocabulary: 924547
Number of words in shortlist: 924547
Number of word classes: 924547
2017-11-06 20:06:36,637 train: TRAINING OPTIONS
2017-11-06 20:06:36,637 train: max_epochs: 100
2017-11-06 20:06:36,637 train: patience: 4
2017-11-06 20:06:36,637 train: max_annealing_count: 0
2017-11-06 20:06:36,637 train: stopping_criterion: annealing-count
2017-11-06 20:06:36,637 train: batch_size: 16
2017-11-06 20:06:36,638 train: validation_frequency: 5
2017-11-06 20:06:36,638 train: min_epochs: 1
2017-11-06 20:06:36,638 train: sequence_length: 100
2017-11-06 20:06:36,638 train: OPTIMIZATION OPTIONS
2017-11-06 20:06:36,638 train: method: adagrad
2017-11-06 20:06:36,638 train: momentum: 0.9
2017-11-06 20:06:36,638 train: learning_rate: 0.1
2017-11-06 20:06:36,638 train: weights: [ 1.]
2017-11-06 20:06:36,638 train: epsilon: 1e-06
2017-11-06 20:06:36,638 train: noise_sharing: None
2017-11-06 20:06:36,638 train: num_noise_samples: 5
2017-11-06 20:06:36,638 train: max_gradient_norm: 5
2017-11-06 20:06:36,639 train: gradient_decay_rate: 0.9
2017-11-06 20:06:36,639 train: sqr_gradient_decay_rate: 0.999
Creating trainer.
Computing the number of mini-batches in training data.
2017-11-06 20:30:17,912 init: One epoch of training data contains 3715163 mini-batch updates.
2017-11-06 20:30:17,979 init: Class unigram log probabilities are in the range [-inf, -3.834355].
2017-11-06 20:30:17,980 init: Finding sentence start positions in /home/ahmed_alaa/training.txt.
2017-11-06 20:31:19,799 _reset: Generating a random order of input lines.
Building neural network.
2017-11-06 20:32:20,906 init: Creating layers.
2017-11-06 20:32:20,906 init: - NetworkInput name=class_input inputs=[] size=924547 activation=tanh devices=[]
2017-11-06 20:32:20,906 init: - ProjectionLayer name=projection_layer inputs=[class_input] size=100 activation=tanh devices=[None]
2017-11-06 20:32:27,632 add: layers/projection_layer/W size=92454700 type=float32 device=None
2017-11-06 20:32:27,633 init: - LSTMLayer name=hidden_layer_1 inputs=[projection_layer] size=300 activation=tanh devices=[None]
2017-11-06 20:32:27,642 add: layers/hidden_layer_1/layer_input/W size=120000 type=float32 device=None
2017-11-06 20:32:28,216 add: layers/hidden_layer_1/step_input/W size=360000 type=float32 device=None
2017-11-06 20:32:28,216 add: layers/hidden_layer_1/layer_input/b size=1200 type=float32 device=None
2017-11-06 20:32:28,218 init: - FullyConnectedLayer name=hidden_layer_2 inputs=[hidden_layer_1] size=300 activation=tanh devices=[None]
2017-11-06 20:32:28,292 add: layers/hidden_layer_2/input/W size=90000 type=float32 device=None
2017-11-06 20:32:28,292 add: layers/hidden_layer_2/input/b size=300 type=float32 device=None
2017-11-06 20:32:28,292 init: - SoftmaxLayer name=output_layer inputs=[hidden_layer_2] size=924547 activation=tanh devices=[None]
2017-11-06 20:32:40,247 add: layers/output_layer/input/W size=277364100 type=float32 device=None
2017-11-06 20:32:40,263 add: layers/output_layer/input/b size=924547 type=float32 device=None
2017-11-06 20:32:40,263 init: Total number of model parameters: 371314847
Building optimizer.
2017-11-06 20:32:48,439 add: layers/output_layer/input/b_sum_sqr_gradient size=924547 type=float32 device=None
2017-11-06 20:32:49,323 add: layers/output_layer/input/W_sum_sqr_gradient size=277364100 type=float32 device=None
2017-11-06 20:32:49,326 add: layers/hidden_layer_2/input/W_sum_sqr_gradient size=90000 type=float32 device=None
2017-11-06 20:32:49,326 add: layers/hidden_layer_1/layer_input/b_sum_sqr_gradient size=1200 type=float32 device=None
2017-11-06 20:32:49,326 add: layers/hidden_layer_2/input/b_sum_sqr_gradient size=300 type=float32 device=None
2017-11-06 20:32:49,327 add: layers/hidden_layer_1/layer_input/W_sum_sqr_gradient size=120000 type=float32 device=None
2017-11-06 20:32:49,621 add: layers/projection_layer/W_sum_sqr_gradient size=92454700 type=float32 device=None
2017-11-06 20:32:49,623 add: layers/hidden_layer_1/step_input/W_sum_sqr_gradient size=360000 type=float32 device=None

Building text scorer for cross-validation.
Validation text: /home/ahmed_alaa/testing.txt
Training neural network.
Theano returned a gpuarray error "b'cuMemAlloc: CUDA_ERROR_OUT_OF_MEMORY: out of memory'
Apply node that caused the error: GpuSoftmax(GpuReshape{2}.0)
Toposort index: 229
Inputs types: [GpuArrayType(float32, matrix)]
Inputs shapes: [(1472, 924547)]
Inputs strides: [(3698188, 4)]
Inputs values: ['not shown']
Outputs clients: [[GpuAdvancedSubtensor(GpuSoftmax.0, ARange{dtype='int64'}.0, HostFromGpu(gpuarray).0), HostFromGpu(gpuarray)(GpuSoftmax.0)]]

Backtrace when the node is created (use Theano flag traceback.limit=N to make it longer):
  File "/usr/local/bin/theanolm", line 4, in <module>
    __import__('pkg_resources').run_script('TheanoLM==1.3.0', 'theanolm')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1504, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 159, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 105, in main
    args.command_function(args)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/commands/train.py", line 395, in train
    profile=args.profile)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/network.py", line 249, in __init__
    layer.create_structure()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/softmaxlayer.py", line 80, in create_structure
    output_probs = tensor.nnet.softmax(preact)
  File "/usr/local/bin/theanolm", line 4, in <module>
    __import__('pkg_resources').run_script('TheanoLM==1.3.0', 'theanolm')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1504, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 159, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 105, in main
    args.command_function(args)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/commands/train.py", line 395, in train
    profile=args.profile)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/network.py", line 249, in __init__
    layer.create_structure()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/softmaxlayer.py", line 80, in create_structure
    output_probs = tensor.nnet.softmax(preact)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.".
Traceback will be written to debug log.
2017-11-06 20:33:12,079 main: b'cuMemAlloc: CUDA_ERROR_OUT_OF_MEMORY: out of memory'
Apply node that caused the error: GpuSoftmax(GpuReshape{2}.0)
Toposort index: 229
Inputs types: [GpuArrayType(float32, matrix)]
Inputs shapes: [(1472, 924547)]
Inputs strides: [(3698188, 4)]
Inputs values: ['not shown']
Outputs clients: [[GpuAdvancedSubtensor(GpuSoftmax.0, ARange{dtype='int64'}.0, HostFromGpu(gpuarray).0), HostFromGpu(gpuarray)(GpuSoftmax.0)]]

Backtrace when the node is created (use Theano flag traceback.limit=N to make it longer):
  File "/usr/local/bin/theanolm", line 4, in <module>
    __import__('pkg_resources').run_script('TheanoLM==1.3.0', 'theanolm')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1504, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 159, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 105, in main
    args.command_function(args)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/commands/train.py", line 395, in train
    profile=args.profile)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/network.py", line 249, in __init__
    layer.create_structure()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/softmaxlayer.py", line 80, in create_structure
    output_probs = tensor.nnet.softmax(preact)
  File "/usr/local/bin/theanolm", line 4, in <module>
    __import__('pkg_resources').run_script('TheanoLM==1.3.0', 'theanolm')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1504, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 159, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 105, in main
    args.command_function(args)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/commands/train.py", line 395, in train
    profile=args.profile)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/network.py", line 249, in __init__
    layer.create_structure()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/softmaxlayer.py", line 80, in create_structure
    output_probs = tensor.nnet.softmax(preact)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
  File "pygpu/gpuarray.pyx", line 693, in pygpu.gpuarray.pygpu_empty
  File "pygpu/gpuarray.pyx", line 301, in pygpu.gpuarray.array_empty
pygpu.gpuarray.GpuArrayException: b'cuMemAlloc: CUDA_ERROR_OUT_OF_MEMORY: out of memory'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 105, in main
    args.command_function(args)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/commands/train.py", line 457, in train
    trainer.train()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/training/trainer.py", line 221, in train
    self._optimizer.update_minibatch(word_ids, class_ids, file_ids, mask)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/training/basicoptimizer.py", line 200, in update_minibatch
    self.update_function(word_ids, class_ids, mask, weights, alpha)
  File "/usr/local/lib/python3.5/dist-packages/theano/compile/function_module.py", line 917, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python3.5/dist-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/lib/python3/dist-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.5/dist-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
  File "pygpu/gpuarray.pyx", line 693, in pygpu.gpuarray.pygpu_empty
  File "pygpu/gpuarray.pyx", line 301, in pygpu.gpuarray.array_empty
pygpu.gpuarray.GpuArrayException: b'cuMemAlloc: CUDA_ERROR_OUT_OF_MEMORY: out of memory'
Apply node that caused the error: GpuSoftmax(GpuReshape{2}.0)
Toposort index: 229
Inputs types: [GpuArrayType(float32, matrix)]
Inputs shapes: [(1472, 924547)]
Inputs strides: [(3698188, 4)]
Inputs values: ['not shown']
Outputs clients: [[GpuAdvancedSubtensor(GpuSoftmax.0, ARange{dtype='int64'}.0, HostFromGpu(gpuarray).0), HostFromGpu(gpuarray)(GpuSoftmax.0)]]

Backtrace when the node is created (use Theano flag traceback.limit=N to make it longer):
  File "/usr/local/bin/theanolm", line 4, in <module>
    __import__('pkg_resources').run_script('TheanoLM==1.3.0', 'theanolm')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1504, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 159, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 105, in main
    args.command_function(args)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/commands/train.py", line 395, in train
    profile=args.profile)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/network.py", line 249, in __init__
    layer.create_structure()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/softmaxlayer.py", line 80, in create_structure
    output_probs = tensor.nnet.softmax(preact)
  File "/usr/local/bin/theanolm", line 4, in <module>
    __import__('pkg_resources').run_script('TheanoLM==1.3.0', 'theanolm')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1504, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 159, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/EGG-INFO/scripts/theanolm", line 105, in main
    args.command_function(args)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/commands/train.py", line 395, in train
    profile=args.profile)
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/network.py", line 249, in __init__
    layer.create_structure()
  File "/usr/local/lib/python3.5/dist-packages/TheanoLM-1.3.0-py3.5.egg/theanolm/network/softmaxlayer.py", line 80, in create_structure
    output_probs = tensor.nnet.softmax(preact)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

senarvi commented 6 years ago

Batch size is only one of the things that affect memory consumption. Your maximum sequence length is 100, so your mini-batches can contain up to 100 × 16 words. You can probably limit training to shorter sequences without much effect on performance. Another thing that has a large effect on the memory requirements is the size of the model. You have a vocabulary of roughly 1 million words. If the layer before the output layer has 1000 units, the output layer weight matrix alone is 1 million × 1000 × 4 bytes = 4 GB. The same applies to the input layer. I recommend making either of those layers smaller. You can shrink the input and output layers by using word classes or a shortlist.
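
To make that concrete, here is a rough sketch using the numbers from your log (an estimate only: Theano also allocates gradients and intermediate buffers that are not counted here):

```python
# Back-of-the-envelope memory arithmetic for the model in the log above.
# Plain Python, no TheanoLM needed. The layer sizes and the failing tensor
# shape are copied from the log; everything else is a rough estimate.

BYTES_PER_FLOAT32 = 4

def gib(num_floats):
    """Convert a float32 element count to GiB."""
    return num_floats * BYTES_PER_FLOAT32 / 2.0 ** 30

vocab = 924547       # vocabulary = shortlist = number of classes in this run
projection = 100     # projection_layer size
hidden = 300         # hidden_layer_2 size, the layer feeding the softmax

# Parameters that scale with the vocabulary size:
input_weights = vocab * projection   # projection_layer/W: 92,454,700 floats
output_weights = hidden * vocab      # output_layer/input/W: 277,364,100 floats
print("input projection:  {:.2f} GiB".format(gib(input_weights)))   # ~0.34
print("output weights:    {:.2f} GiB".format(gib(output_weights)))  # ~1.03

# AdaGrad stores a squared-gradient accumulator per parameter (the
# *_sum_sqr_gradient tensors in the log), roughly doubling that memory.

# The allocation that actually failed: GpuSoftmax on shape (1472, 924547),
# i.e. batch_size 16 * 92 time steps, one probability per vocabulary word.
softmax_output = 1472 * vocab
print("softmax activation: {:.2f} GiB".format(gib(softmax_output)))  # ~5.07
```

The softmax output alone needs about 5 GiB for a single mini-batch, on top of the weights and the AdaGrad state, and a Tesla K80 exposes roughly 12 GB per GPU. The headroom disappears long before the batch size matters much, which is why shrinking the vocabulary-sized layers (word classes or a shortlist) is the effective fix.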