udibr / headlines

Automatically generate headlines to short articles
MIT License

Running OOM in cell 30 when using GPU #12

Closed xtr33me closed 8 years ago

xtr33me commented 8 years ago

First off, thanks for this share. This NLP stuff is really cool, but it can be overwhelming, and this project has helped me immensely in understanding it better. Unfortunately, it seems troubleshooting these models and implementations is almost an art in itself.

So I am currently using TensorFlow, and I wrote a scraper to pull data from BuzzFeed to act as my training set, since the Reuters data didn't seem to be enough given that GloVe covers a 40k-word vocabulary. When I was running against the CPU, everything worked fine, but in a period of about 7 hours I only made it through about 4 iterations out of 500. So I have attempted to switch over to the GPU. I am running on a Mac with an NVIDIA GT 650M and 1 GB of VRAM. I am aware this isn't the best hardware, but I believe it should still be doable, right?

So when I run the train.py file (converted from the ipynb), I get the OOM error below. I know you have been using Theano, so if you aren't sure, just disregard. However, if you know how I might be able to overcome this issue, I'd love to hear it. It seems to be erroring out around cell 30. What baffles me is that it reports total memory as 1023.69 MiB but free memory as only 49.59 MiB, and the free amount shrinks each time I run. I did see in the TensorFlow forums that TensorFlow allocates all of the GPU memory up front and that you can't really tell how much is free because it is managed internally, so maybe this is nothing. I was trying to figure out how to flush the GPU memory, but I haven't had any luck with that either; even after restarting the Mac, I still see about the same thing.
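One thing I was planning to try next is telling TensorFlow not to grab the whole GPU up front. A minimal sketch, assuming the Keras TensorFlow backend's `set_session` helper and TF's `GPUOptions` behave as documented (I haven't verified this on my setup):

```python
# Sketch: limit TensorFlow's up-front GPU allocation when using the Keras TF backend.
# Assumes keras.backend.tensorflow_backend.set_session and tf.ConfigProto GPU options.
import tensorflow as tf
import keras.backend.tensorflow_backend as KTF

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory only as needed
# config.gpu_options.per_process_gpu_memory_fraction = 0.9  # alternatively, cap the fraction of VRAM used
KTF.set_session(tf.Session(config=config))
```

This would need to run before the model is built so the session gets created with these options.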

I have tried adjusting the training sample size and some of the other variables to see if I could get it to run through even once, but it still ends up crashing with an OOM error. My next step is to get the TF summary writer working so I can get a bit more insight via TensorBoard. I do have a much better GPU on my PC, but I will need to set this all up in a Docker VM or something if I take that approach. If it works, though, I'll be happy.
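For the TensorBoard part, my rough plan is to attach Keras' TensorBoard callback to the fit call; a sketch, assuming that callback is available in this Keras/TF combination (the log directory is arbitrary):

```python
# Sketch: log training metrics for TensorBoard via the Keras callback.
# After training starts, run: tensorboard --logdir=./logs
from keras.callbacks import TensorBoard

tb = TensorBoard(log_dir='./logs', histogram_freq=1, write_graph=True)
# model.fit(X, Y, batch_size=batch_size, nb_epoch=1, callbacks=[tb])
```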

If you have any insight, please send it my way. Thanks again!!

```
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.dylib locally
1.0.7
number of examples 49372 49372
dimension of embedding space for words 100
vocabulary size 40000 the last 10 words can be used as place holders for unknown/oov words
total number of different words 74477 74477
number of words outside vocabulary which we can substitue using glove similarity 12523
number of words that will be regarded as unknonw(unk)/out-of-vocabulary(oov) 21954
46372 46372 3000 3000
H: oops building
D: there’s something different about this building can you guess what
H: mathematical formula for beer goggles
D: british scientists discover the exact equation so-called beer goggles
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:883] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GT 650M
major: 3 minor: 0
memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 49.59MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.

.....

I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x703a04a00 of size 1048576
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x703b04a00 of size 1048576
I tensorflow/core/common_runtime/bfc_allocator.cc:683] Free at 0x700b8dc00 of size 149504
I tensorflow/core/common_runtime/bfc_allocator.cc:683] Free at 0x700de4400 of size 204800
I tensorflow/core/common_runtime/bfc_allocator.cc:683] Free at 0x703c04a00 of size 1123840
I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 28 Chunks of size 256 totalling 7.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 24 Chunks of size 2048 totalling 48.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 204800 totalling 800.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 31 Chunks of size 1048576 totalling 31.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1139200 totalling 1.09MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 16000000 totalling 15.26MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 48.18MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit:        51998720
InUse:        50520576
MaxInUse:     50520576
NumAllocs:    129
MaxAllocSize: 16000000

W tensorflow/core/common_runtime/bfc_allocator.cc:270] **
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 144.04MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:907] Resource exhausted: OOM when allocating tensor with shape[944,40000]
Traceback (most recent call last):
  File "train.py", line 275, in <module>
    name = 'timedistributed_1')))
  File "/usr/local/lib/python2.7/site-packages/keras/models.py", line 307, in add
    output_tensor = layer(self.outputs[0])
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 484, in __call__
    self.build(input_shapes[0])
  File "/usr/local/lib/python2.7/site-packages/keras/layers/wrappers.py", line 102, in build
    self.layer.build(child_input_shape)
  File "/usr/local/lib/python2.7/site-packages/keras/layers/core.py", line 604, in build
    name='{}_W'.format(self.name))
  File "/usr/local/lib/python2.7/site-packages/keras/initializations.py", line 59, in glorot_uniform
    return uniform(shape, s, name=name)
  File "/usr/local/lib/python2.7/site-packages/keras/initializations.py", line 32, in uniform
    return K.random_uniform_variable(shape, -scale, scale, name=name)
  File "/usr/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 248, in random_uniform_variable
    return variable(value, dtype=dtype, name=name)
  File "/usr/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 132, in variable
    get_session().run(v.initializer)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 343, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 567, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 640, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 662, in _do_call
    e.code)
tensorflow.python.framework.errors.ResourceExhaustedError: OOM when allocating tensor with shape[944,40000]
[[Node: random_uniform_13/RandomUniform = RandomUniform[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/gpu:0"]]]
Caused by op u'random_uniform_13/RandomUniform', defined at:
  File "train.py", line 275, in <module>
    name = 'timedistributed_1')))
  File "/usr/local/lib/python2.7/site-packages/keras/models.py", line 307, in add
    output_tensor = layer(self.outputs[0])
  File "/usr/local/lib/python2.7/site-packages/keras/engine/topology.py", line 484, in __call__
    self.build(input_shapes[0])
  File "/usr/local/lib/python2.7/site-packages/keras/layers/wrappers.py", line 102, in build
    self.layer.build(child_input_shape)
  File "/usr/local/lib/python2.7/site-packages/keras/layers/core.py", line 604, in build
    name='{}_W'.format(self.name))
  File "/usr/local/lib/python2.7/site-packages/keras/initializations.py", line 59, in glorot_uniform
    return uniform(shape, s, name=name)
  File "/usr/local/lib/python2.7/site-packages/keras/initializations.py", line 32, in uniform
    return K.random_uniform_variable(shape, -scale, scale, name=name)
  File "/usr/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 247, in random_uniform_variable
    value = tf.random_uniform_initializer(low, high, dtype=tf_dtype)(shape)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/init_ops.py", line 98, in _initializer
    return random_ops.random_uniform(shape, minval, maxval, dtype, seed=seed)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/random_ops.py", line 182, in random_uniform
    seed2=seed2)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_random_ops.py", line 96, in _random_uniform
    seed=seed, seed2=seed2, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 694, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
    self._traceback = _extract_stack()
```

udibr commented 8 years ago

500 epochs is just an arbitrary number; I kill the process much earlier than that.

Running into GPU memory problems is very common. The main difference between GPU cards is their memory size, which is sometimes more important than speed. The Mac GPU is too small, and some of its memory is used by OS X to drive your screen. Try running on AWS with a g2.2xlarge machine.

You can reduce batch_size until the model fits in GPU memory. You can also try a smaller model (fewer RNN layers and nodes) and see what happens; the model parameters I used are also somewhat arbitrary.
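For example, something along these lines near the top of the notebook (treat the names and values as placeholders; the exact hyperparameter names in your copy may differ):

```python
# Sketch: shrink the model until it fits in 1 GB of VRAM. The names below
# (rnn_size, rnn_layers, batch_size) stand in for the hyperparameters defined
# near the top of the notebook; exact names and defaults may differ.
rnn_size = 256     # hidden units per RNN layer, e.g. down from 512
rnn_layers = 2     # number of stacked RNN layers, e.g. down from 3
batch_size = 16    # examples per training step, e.g. down from 64
```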

xtr33me commented 8 years ago

Thanks for the help... yet again! Do you have a PayPal account? If so, let me know the email... I'd like to buy a few drinks for ya to say thanks :)

udibr commented 8 years ago

🍺🍺 thanks