keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

ResourceExhaustedError: OOM when computing embeddings with Google's Transformer architecture #12051

Aravinviju closed 5 years ago

Aravinviju commented 5 years ago

I am getting a ResourceExhaustedError (OOM) when computing embeddings with Google's Transformer architecture, which embeds text into 512-dimensional vectors.

The data I'm trying to embed has 5000 records, which add up to 40 MB of data. GPU used: Tesla K80 in a GCP instance. CPUs: 4 (15mb RAM). TensorFlow: tensorflow-gpu (3.0.1).

Here is the code snippet:

    with tf.Session() as session:
        session.run([tf.global_variables_initializer(), tf.tables_initializer()])
        message_embeddings = session.run(embed(test_cleansed_data))

Here is the log:

    ResourceExhaustedError                    Traceback (most recent call last)
    ~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
       1326     try:
    -> 1327       return fn(*args)
       1328     except errors.OpError as e:

    ~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
       1311       return self._call_tf_sessionrun(
    -> 1312           options, feed_dict, fetch_list, target_list, run_metadata)
       1313

    ~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
       1419             self._session, options, feed_dict, fetch_list, target_list,
    -> 1420             status, run_metadata)
       1421

    ~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
        515             compat.as_text(c_api.TF_Message(self.status.status)),
    --> 516             c_api.TF_GetCode(self.status.status))
        517     # Delete the underlying status object from memory otherwise it stays alive

ResourceExhaustedError: OOM when allocating tensor with shape[4096000,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Softmax = Softmax[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Reshape)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_2/self_attention/multihead_attention/q/Tensordot/Shape/_453 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1054_...rdot/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

ResourceExhaustedError Traceback (most recent call last)

in 1 with tf.Session() as session: 2 session.run([tf.global_variables_initializer(), tf.tables_initializer()]) ----> 3 message_embeddings = session.run(embed(test_cleansed_data)) ~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata) 903 try: 904 result = self._run(None, fetches, feed_dict, options_ptr, --> 905 run_metadata_ptr) 906 if run_metadata: 907 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr) ~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata) 1138 if final_fetches or final_targets or (handle and feed_dict_tensor): 1139 results = self._do_run(handle, final_targets, final_fetches, -> 1140 feed_dict_tensor, options, run_metadata) 1141 else: 1142 results = [] ~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata) 1319 if handle is None: 1320 return self._do_call(_run_fn, feeds, fetches, targets, options, -> 1321 run_metadata) 1322 else: 1323 return self._do_call(_prun_fn, handle, feeds, fetches) ~/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args) 1338 except KeyError: 1339 pass -> 1340 raise type(e)(node_def, op, message) 1341 1342 def _extend_graph(self): ResourceExhaustedError: OOM when allocating tensor with shape[4096000,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Softmax = Softmax[T=DT_FLOAT, 
_device="/job:localhost/replica:0/task:0/device:GPU:0"](module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Reshape)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. [[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_2/self_attention/multihead_attention/q/Tensordot/Shape/_453 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1054_...rdot/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. Caused by op 'module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Softmax', defined at: File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/runpy.py", line 85, in _run_code exec(code, run_globals) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in app.launch_new_instance() File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance app.start() File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 505, in start self.io_loop.start() File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start self.asyncio_loop.run_forever() File 
"/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/asyncio/base_events.py", line 438, in run_forever self._run_once() File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/asyncio/base_events.py", line 1451, in _run_once handle._run() File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/asyncio/events.py", line 145, in _run self._callback(*self._args) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/ioloop.py", line 758, in _run_callback ret = callback() File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper return fn(*args, **kwargs) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/gen.py", line 1233, in inner self.run() File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/gen.py", line 1147, in run yielded = self.gen.send(value) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 357, in process_one yield gen.maybe_future(dispatch(*args)) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper yielded = next(result) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 267, in dispatch_shell yield gen.maybe_future(handler(stream, idents, msg)) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper yielded = next(result) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 534, in execute_request user_expressions, allow_stdin, File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper yielded = next(result) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 294, in do_execute res = 
shell.run_cell(code, store_history=store_history, silent=silent) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 536, in run_cell return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2819, in run_cell raw_cell, store_history, silent, shell_futures) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2845, in _run_cell return runner(coro) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/async_helpers.py", line 67, in _pseudo_sync_runner coro.send(None) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3020, in run_cell_async interactivity=interactivity, compiler=compiler, result=result) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3185, in run_ast_nodes if (yield from self.run_code(code, result)): File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "", line 3, in message_embeddings = session.run(embed(test_cleansed_data)) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow_hub/module.py", line 247, in __call__ name=name) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow_hub/native_module.py", line 514, in create_apply_graph import_scope=relative_scope_name) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1927, in import_meta_graph **kwargs) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 741, in 
import_scoped_meta_graph producer_op_list=producer_op_list) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func return func(*args, **kwargs) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 577, in import_graph_def op_def=op_def) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op op_def=op_def) File "/home/apoopat2/anaconda3/envs/cluster_gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__ self._traceback = self._graph._extract_stack() # pylint: disable=protected-access ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096000,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Softmax = Softmax[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_0/self_attention/multihead_attention/dot_product_attention/Reshape)]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. 
[[Node: module_apply_default_5/Encoder_en/Transformer/TransformerEncodeFast/encoder/layer_2/self_attention/multihead_attention/q/Tensordot/Shape/_453 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1054_...rdot/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

If someone can give a solution for this, that would be great!

TA, Arav

ParikhKadam commented 5 years ago

I have faced similar issues. The only thing you can do is create tensors that fit in your memory. This error can be solved by reducing the text vector dimension, but that will lower the accuracy of your model. You can instead try reducing the batch size, which will not affect the model much and should avoid this error.

For more help, you need to share your code/model architecture so that one can understand how much memory you are actually allocating to tensors and where we can reduce the memory usage.

Aravinviju commented 5 years ago

It is not actually a model built by me; it is a pre-existing text embedding model from Google based on the Transformer architecture (for more info: https://www.learnopencv.com/universal-sentence-encoder/?ck_subscriber_id=272164240). Basically, I use this embedding technique to get the vectors and then use them for clustering. Anyway, I'll try changing the batch size and get back.

Thanks, Arav

ParikhKadam commented 5 years ago

@Aravinviju I looked at the model. Basically, it is a model that generates word embeddings for a given input text. I once needed word and character embeddings as part of my own model, but I found that the task would require a very high-spec PC, which wasn't possible for me.

I also wanted to build such a model myself to learn how it works, but that wasn't possible either. So I used pretrained word and character embeddings and passed them to a bidirectional LSTM so that it learns contextual information for the problem at hand.

Gensim is the most popular library for this purpose, although I used pymagnitude in my model. Both are very easy to use and can help you out.

BTW, reducing the batch size will definitely work, but how far you need to reduce it depends on your specs. Use a binary-search-like method to find the range of batch sizes that work on your device.
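The binary-search idea above could be sketched roughly as follows. This is only an illustration: `largest_working_batch_size`, `fits`, and `fake_fits` are hypothetical names, and the sketch assumes that if a given batch size fits in memory, every smaller one fits as well.

```python
def largest_working_batch_size(fits, low=1, high=5000):
    """Return the largest batch size in [low, high] for which fits() is True.

    Assumes monotonic behaviour: if a batch size fits in memory, every
    smaller batch size fits too. Returns 0 if even `low` does not fit.
    """
    best = 0
    while low <= high:
        mid = (low + high) // 2
        if fits(mid):
            best = mid       # mid works, so try something larger
            low = mid + 1
        else:
            high = mid - 1   # mid fails, so try something smaller
    return best


# Stand-in predicate for illustration only: pretend that anything above
# 700 records runs out of memory. A real `fits` would try to embed
# `batch_size` records and return False on an out-of-memory error.
def fake_fits(batch_size):
    return batch_size <= 700


print(largest_working_batch_size(fake_fits))
```

In practice it is probably wise to run with a batch size somewhat below the value found this way, since memory use can vary with the length of the texts in each batch.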

Thank you. Keep me updated, and I will be here to help.

Aravinviju commented 5 years ago

@ParikhKadam

Thanks a lot for your reply!

Yes indeed, but this particular embedding is better than the basic ones. In the initial phase I tried pre-trained embeddings, both Gensim (for machine learning) and a word embedding model (https://www.cs.york.ac.uk/nlp/extvec/) from google (for a CNN classification model). The use case I'm currently working on requires understanding the meaning of irregular text, for which I needed sentence and paragraph embeddings. That is what DAN and TA from Google do (same link as provided before), which makes the model understand the text better and gives better results too.

Regarding your suggestion to reduce the batch size: I'm not actually training a model in this process, so I wasn't sure how to set a batch size for embedding. Still, I split my dataset, sent the data for embedding in batches, collected the embedding results as lists, and finally joined them into one large file.

So, your point of reducing the batch size worked!
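For anyone landing here later, the split-embed-join approach described above might look roughly like this. It is a minimal sketch, not the exact code used in this thread: `embed_in_batches`, `embed_fn`, and `toy_embed` are hypothetical names, and `toy_embed` is a stand-in that only mimics "one vector per text".

```python
def embed_in_batches(records, embed_fn, batch_size):
    """Embed `records` in chunks of `batch_size` and join the results in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        # Only one batch is sent to the embedding function at a time,
        # which bounds the size of the intermediate tensors.
        embeddings.extend(embed_fn(batch))
    return embeddings


# Hypothetical stand-in for the real embedding call: returns one
# 2-element vector per text (text length, number of spaces).
def toy_embed(texts):
    return [[float(len(t)), float(t.count(" "))] for t in texts]


texts = ["hello world", "out of memory", "ok"]
vectors = embed_in_batches(texts, toy_embed, batch_size=2)
print(len(vectors))
```

With the TF 1.x setup from the original post, `embed_fn` could be something like `lambda batch: session.run(embed(batch))`. Building the embedding op once on a string placeholder and feeding each batch through it would likely be better, since calling `embed(...)` repeatedly adds new ops to the graph each time.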

In terms of the device, as I stated in the details of the question, it's a GCP instance with a Tesla K80 GPU, for which even 1 GB of text data should be easy enough to process. I was processing just 100 MB of data at a time, but since the text is converted to 512-dimensional vectors and the batch size was also large, the GPU couldn't handle it.
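To put rough numbers on this, here is a back-of-the-envelope sketch using the shape from the error log. It assumes "type float" means 32-bit floats (4 bytes per element); `tensor_bytes` is a hypothetical helper, not a TensorFlow function.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# The tensor that failed to allocate, shape taken from the OOM message.
attention_bytes = tensor_bytes([4096000, 128])
# The final embeddings: 5000 records x 512 dimensions.
output_bytes = tensor_bytes([5000, 512])

print(attention_bytes / 2**30)  # size in GiB
print(output_bytes / 2**20)     # size in MiB
```

So the final embeddings are only about 10 MiB, while the single attention tensor named in the log is close to 2 GiB, and several intermediates of similar size may need to be alive at once. That would be consistent with the input text being small while the full batch still does not fit on the GPU.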

Thanks for your help, @ParikhKadam. I will get back if I need anything else. For now I'll close this issue!

Cheers, Arav

ParikhKadam commented 5 years ago

@Aravinviju You're welcome. Happy to help.

msymp commented 5 years ago

@ParikhKadam, thank you for resolving this query. This issue is closed.