Open KJGithub2021 opened 2 years ago
Just for information, I have already tried the following but all in vain:
config = tf.compat.v1.ConfigProto() config.gpu_options.allow_growth = True sess = tf.compat.v1.Session(config=config)
Unfortunately, changing Python in Colab is not supported. Can you try using the existing Python (3.7) and tensorflow (2.8.2)?
ok let me try and get back.
Thanks for the help. It worked and no errors so far. However the model training time on Collab GPU runtime is even more than my local system (CPU enabled) ....why is it so?
so after 10 mins of execution, apparently the GPU device went out of RAM and the code terminated....with no error message
I am trying to execute my deep learning code, having the following link to the gist file: https://colab.research.google.com/gist/KJGithub2021/705a27f8d42bf1f26b07b8e10eedacb0/imn.ipynb
-Python 3.6.0 -Tensorflow 2.6
The code executes without any error on my local machine with CPU only device, although it takes hours in training. However, When I select GPU as the runtime in Collaboratory and run it, it gives the following errors:
=========== 2022-09-21 12:42:37.434840: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 164634400 exceeds 10% of free system memory. 2022-09-21 12:42:37.507974: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 164634400 exceeds 10% of free system memory. 2022-09-21 12:42:37.789797: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 164634400 exceeds 10% of free system memory. 2022-09-21 12:42:37.868800: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 164634400 exceeds 10% of free system memory. 2022-09-21 12:42:40.532872: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2022-09-21 12:42:40.538008: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2022-09-21 12:42:40.541819: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2022-09-21 12:42:40.545662: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2022-09-21 12:42:40.549502: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2022-09-21 12:42:40.552936: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2022-09-21 12:42:40.556449: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2022-09-21 12:42:40.559686: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2022-09-21 12:42:40.563026: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2022-09-21 12:42:40.566285: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2022-09-21 12:42:40.569632: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2022-09-21 12:42:40.572676: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1375, in _do_call return fn(*args) File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1360, in _run_fn target_list, run_metadata) File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1453, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found. (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[{{node CNN_char_emb/conv1d}}]] [[prediction_layer/add_2/_1705]] (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[{{node CNN_char_emb/conv1d}}]] 0 successful operations. 0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "train.py", line 262, in
train_step(x_utterances, x_response, x_utterances_len, x_response_len, x_utters_num, x_target, x_target_weight, id_pairs, x_u_char, x_u_char_len, x_r_char, x_r_char_len)
File "train.py", line 204, in train_step
feed_dict)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 968, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1191, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1369, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1394, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node CNN_char_emb/conv1d (defined at /content/drive/MyDrive/IMN-master/Ubuntu_V2/model/model_IMN.py:98) ]]
[[prediction_layer/add_2/_1705]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node CNN_char_emb/conv1d (defined at /content/drive/MyDrive/IMN-master/Ubuntu_V2/model/model_IMN.py:98) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'CNN_char_emb/conv1d':
File "train.py", line 128, in
l2_reg_lambda=FLAGS.l2_reg_lambda)
File "/content/drive/MyDrive/IMN-master/Ubuntu_V2/model/model_IMN.py", line 200, in init
utterances_cnn_char_emb = cnn_layer(utterances_char_embedded, filter_sizes=[3, 4, 5], num_filters=50, scope="CNN_char_emb", scope_reuse=False) # [batch_sizemax_utter_nummax_utter_len, emb]
File "/content/drive/MyDrive/IMN-master/Ubuntu_V2/model/model_IMN.py", line 98, in cnn_layer
conv = tf.nn.conv1d(inputs, w, stride=1, padding="VALID") # [num_words, num_chars - filter_size, num_filters]
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
return target(*args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 2094, in conv1d_v2
dilations=dilations)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
return target(*args, *kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 617, in new_func
return func(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 617, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 2011, in conv1d
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 973, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3569, in _create_op_internal
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 2045, in init
self._traceback = tf_stack.extract_stack_for_node(self._c_op)
============ I'm currently using chrome to run my colab files (ipynb).
I am looking forward to an early response from Colab team!