Closed. stekiri closed this issue 4 years ago.
I could replicate the issue with Tf 2.0 on colab. Please find the gist here. Thanks!
@stekiri Can you please try with TF 2.1 and tf-nightly? When I ran it in colab, I see a different warning, as follows. Thanks!
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
WARNING:tensorflow:NCCL is not supported when using virtual GPUs, fallingback to reduction to one device
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2')
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
With TF 2.1 the warning I mentioned does not appear for the above script; however, if I use four virtual GPUs (in the serving script) instead of three, the warning is displayed again.
You can reproduce it with the following virtual device config:
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
    gpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512),
     tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512),
     tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512),
     tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512)]
)
@stekiri I cannot reproduce that warning. When I modify the config as you mentioned above, I get a RuntimeError as follows. Please check the gist here. Thanks!
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
      9     tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512),
     10     tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512),
---> 11     tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512)]
     12 )

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/context.py in set_logical_device_configuration(self, dev, virtual_devices)
   1300     if self._context_handle is not None:
   1301       raise RuntimeError(
-> 1302           "Virtual devices cannot be modified after being initialized")
   1303
   1304     self._virtual_device_map[dev] = virtual_devices

RuntimeError: Virtual devices cannot be modified after being initialized
You would need to restart the runtime in between running the save and load scripts, as it's not possible to change virtual devices once they have been initialized.
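The constraint can be demonstrated directly. A minimal sketch, using the CPU device so it runs even without a GPU (the issue itself hits this on GPU; the CPU device is an assumption for portability):

```python
import tensorflow as tf

cpus = tf.config.experimental.list_physical_devices('CPU')

# Splitting a physical device into virtual devices must happen before the
# TensorFlow runtime is initialized.
tf.config.experimental.set_virtual_device_configuration(
    cpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration(),
     tf.config.experimental.VirtualDeviceConfiguration()])

tf.constant(1.0)  # any eager op initializes the runtime

try:
    # Reconfiguring after initialization raises the RuntimeError above.
    tf.config.experimental.set_virtual_device_configuration(
        cpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration()])
    raised = False
except RuntimeError:
    raised = True
print(raised)
```

This is why the creation and serving scripts must run in separate processes (or with a runtime restart in between).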
@stekiri I ran the first part of your code, restarted the runtime, and ran the second part. I cannot reproduce the issue with a recent tf-nightly. Please check the gist here.
Can you please check once and close the issue if this was resolved for you? Thanks!
It seems to be resolved. Thanks!
@GF-Huang Can you please open a new issue with a standalone code to reproduce the issue? thanks!
I am experiencing the same warning in TensorFlow 2.9.1. I still receive it after removing the arguments activation='relu' and dropout=0.2. Does this affect the model somehow, or can the message be ignored?
System information
Describe the current behavior
When the model is saved in the default tf format, warnings are logged when trying to serve the model.
Example warning logs:
When the model is saved in the hdf5 format, the warnings do not occur.
Describe the expected behavior
The save formats should be equivalent and behave in the same way.
Code to reproduce the issue
Execute the following scripts to create and serve the model. With
format_ext = ''
the model is saved in tf format; restart the Python console, then serve the model with the second script, which produces the aforementioned warnings. With
format_ext = '.h5'
the model is saved in hdf5 format and no warnings appear.
Model creation:
Model serving:
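(The original serving script is likewise not reproduced here. A stand-in sketch with the four-virtual-GPU configuration from the thread; the GPU split is skipped when no GPU is present so the sketch still runs, and a placeholder model is saved inline so it is self-contained — in the issue, the file comes from the creation script run in a separate process:)

```python
import os
import numpy as np
import tensorflow as tf

# Placeholder for the file produced by the creation script.
if not os.path.exists('saved_model.h5'):
    tmp = tf.keras.Sequential([tf.keras.Input(shape=(32,)),
                               tf.keras.layers.Dense(10)])
    tmp.save('saved_model.h5')

# Split the first physical GPU into four 512 MB virtual GPUs (the config
# from the thread). Must run before the runtime is initialized.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=512)
         for _ in range(4)])

# MirroredStrategy picks up the virtual GPUs; with >2 of them, loading a
# tf-format model triggers the NCCL warning. Falls back to CPU if no GPU.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.models.load_model('saved_model.h5')

preds = model.predict(np.random.rand(4, 32).astype('float32'), verbose=0)
print(preds.shape)
```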
Other info / logs The warnings occur only if more than two vGPUs are used.