Closed bigmw closed 6 months ago
Thanks for the report. I can confirm that TF2 is not working on some GPU types.
b/144378214
I've had the same issue since today. Updating to tf-nightly-gpu helps in my case.
thanks @jsnydes @shkarupa-alex
Someone suggests that this is due to the recent update from CUDA 10.0 to 10.1 at Google Colab. https://stackoverflow.com/questions/58926378/unable-to-run-even-basic-example-with-tf-2-0-and-gpu-support-in-colab
TF 2.0 is not yet ready for the update. Installing tf-nightly-gpu did help, as others and I have tested. But my results suggest that the GPU under tf-nightly-gpu (TF 2.1.0-dev20191119) is ~54% slower than under the original TF 2.0 (before this bug); details are in the original report thread: https://github.com/tensorflow/tensorflow/issues/34385 I think a potential temporary solution is for Google Colab to revert to CUDA 10.0 and hold the CUDA update (if this is the cause) until TF 2.0 is fully ready. Hope this makes sense.
Same issue on cupy. It can't load libXXX.10.0 libraries. FYI.
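For anyone who wants to check this on their own runtime, here is a minimal sketch that asks the dynamic loader whether a given shared library can be opened. The helper name `can_load` and the two sonames are assumptions for illustration (the CUDA runtime library for 10.0 and 10.1); substitute whichever library your error message names.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


# Assumed sonames, for illustration only.
for soname in ("libcudart.so.10.0", "libcudart.so.10.1"):
    print(soname, "loadable" if can_load(soname) else "not loadable")
```

If only the 10.1 name loads, a binary built against 10.0 would be expected to fail in the way described above.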
I have the same issue. Colab worked fine with TF 2.x-GPU, but as of yesterday it is not working anymore.
b/144844033
@fortharrow Can you share a self-contained notebook reproducing the issue you observe with cupy?
We aren't able to reproduce the failure described with the bundled version, 6.5.0.
I have the same issue. Colab stopped working with the GPU two days ago. I was running tensorflow 2.0.0-beta1.
If you are using tensorflow 2.0 on Colab, we recommend using the bundled version, which you can enable with:
%tensorflow_version 2.x
This will switch your tensorflow version to the current 2.X version built for Colab (see https://colab.research.google.com/notebooks/tensorflow_version.ipynb for details)
If you install external releases of tensorflow via pip install tensorflow-gpu==2.0 or similar, it will install a pre-built binary that may not be compatible with the GPUs and drivers available in Colab's runtime.
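To see which build a notebook actually picked up, something like the following may help. It is only a sketch: `describe_tensorflow` is a hypothetical helper name, and it takes the imported module as an argument so it can be exercised without a GPU. It relies on `tf.__version__` and `tf.test.is_built_with_cuda()`.

```python
def describe_tensorflow(tf):
    """Summarise the TensorFlow module passed in (hypothetical helper)."""
    return {
        "version": tf.__version__,
        # True when this binary was compiled with CUDA support.
        "built_with_cuda": tf.test.is_built_with_cuda(),
    }


# Usage in a Colab cell, after `%tensorflow_version 2.x`:
#   import tensorflow as tf
#   print(describe_tensorflow(tf))
```

A version string that differs from the bundled one suggests a pip-installed release is shadowing it.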
The situation is fixed now. Thank you.
I've switched to the bundled version of tensorflow 2.0 and it's working again now. Thank you.
@adam-adam Yes, it works now. thanks for the update.
@jakevdp Thanks for the comments. I did use "%tensorflow_version 2.x", as shown in the test code in my original report, and that was exactly the setting/version under which the GPU device was not found in Colab. That's why we needed to install the tf-nightly-gpu version. Hope this makes sense.
When I load TensorFlow 2.0 using the bundled package, I get the error below:
Traceback (most recent call last):
  File "keras_retinanet/bin/train.py", line 527, in <module>
    main()
  File "keras_retinanet/bin/train.py", line 522, in main
    initial_epoch=args.initial_epoch
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1732, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 100, in fit_generator
    callbacks.set_model(callback_model)
  File "/usr/local/lib/python3.6/dist-packages/keras/callbacks/callbacks.py", line 68, in set_model
    callback.set_model(model)
  File "/usr/local/lib/python3.6/dist-packages/keras/callbacks/tensorboard_v2.py", line 116, in set_model
    super(TensorBoard, self).set_model(model)
  File "/tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/callbacks.py", line 1532, in set_model
    self.log_dir, self.model._get_distribution_strategy())  # pylint: disable=protected-access
AttributeError: 'Model' object has no attribute '_get_distribution_strategy'
Is this issue solved now? Is 2.2.0-rc3 the right version?
My code worked well with the GPU in Colab yesterday, but this morning it became very slow. I suspect that the CPU is being used even though the hardware accelerator is explicitly set to GPU under “change runtime type”. The following test code results in “SystemError: GPU device not found”.
code chunk:
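The code chunk itself is not preserved in this copy of the thread. As a minimal sketch of a check that raises the same error, assuming it resembled the usual `tf.test.gpu_device_name()` test (the helper name `require_gpu` is hypothetical):

```python
def require_gpu(device_name):
    """Raise SystemError unless device_name is the first GPU device."""
    if device_name != "/device:GPU:0":
        raise SystemError("GPU device not found")
    return device_name


# Usage in a Colab cell (assumes TensorFlow is importable):
#   import tensorflow as tf
#   print("Found GPU at: {}".format(require_gpu(tf.test.gpu_device_name())))
```

`tf.test.gpu_device_name()` returns an empty string when no GPU is visible to TensorFlow, which is the case that triggers the error here.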
The same problem was replicated by other users when I reported the issue to the TF team: https://github.com/tensorflow/tensorflow/issues/34385. I posted it here because I think Colab can solve the problem by simply reverting the most recent update to TF 2.0 on their end. Thank you.