Open Victor-99 opened 5 years ago
Same here. I hope this is why TF2 is not working; otherwise I have no idea why the script fails on my RTX 2060 Super while it runs fine on other machines.
The script:
import math
import pickle
import os

import numpy as np
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf


def load_data(path: str):
    data = []
    labels = []
    for batch_index in range(1, 6):
        with open(os.path.join(path, 'data_batch_{}'.format(batch_index)), 'rb') as data_file:
            data_dict = pickle.load(data_file, encoding='bytes')
            data.append(np.reshape(data_dict[b'data'], (len(data_dict[b'data']), 3, 32, 32)))
            labels.append(data_dict[b'labels'])
    return np.concatenate(data), np.concatenate(labels)


def main():
    data_path = 'data/cifar-10-batches-py'
    data, labels = load_data(data_path)
    data = np.transpose(data, (0, 2, 3, 1))

    max_epoch = 2
    batch_size = 32

    image_input = tf.keras.Input(shape=(32, 32, 3))
    data_op = tf.cast(image_input, tf.float32) / 255.0
    network_output = tf.keras.layers.Conv2D(8, 5, strides=(2, 2), activation=tf.nn.relu)(data_op)
    network_output = tf.keras.layers.Conv2D(16, 3, strides=(2, 2), activation=tf.nn.relu)(network_output)
    network_output = tf.keras.layers.Flatten()(network_output)
    network_output = tf.keras.layers.Dense(120, activation=tf.nn.relu)(network_output)
    network_output = tf.keras.layers.Dense(60, activation=tf.nn.relu)(network_output)
    network_output = tf.keras.layers.Dense(10)(network_output)

    model = tf.keras.Model(inputs=image_input, outputs=network_output)
    model.compile(
        optimizer=tf.optimizers.Adam(learning_rate=0.001),
        loss=tf.losses.SparseCategoricalCrossentropy())
    history = model.fit(data, labels, batch_size=batch_size, epochs=max_epoch)


if __name__ == '__main__':
    main()
The output:
Train on 50000 samples
Epoch 1/2
WARNING:tensorflow:Entity <function Function._initialize_uninitialized_variables.<locals>.initialize_variables at 0x7f28df337c10> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Bad argument number for keyword: 1, expecting 2
32/50000 [..............................] - ETA: 33:42
Traceback (most recent call last):
File "tf2_cifar.py", line 49, in <module>
main()
File "tf2_cifar.py", line 45, in main
history = model.fit(data, labels, batch_size=batch_size, epochs=max_epoch)
File "/usr/lib/python3.8/site-packages/tensorflow_core/python/keras/engine/training.py", line 709, in fit
return func.fit(
File "/usr/lib/python3.8/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 313, in fit
training_result = run_one_epoch(
File "/usr/lib/python3.8/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch
batch_outs = execution_function(iterator)
File "/usr/lib/python3.8/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function
distributed_function(input_fn))
File "/usr/lib/python3.8/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
result = self._call(*args, **kwds)
File "/usr/lib/python3.8/site-packages/tensorflow_core/python/eager/def_function.py", line 520, in _call
return self._stateless_fn(*args, **kwds)
File "/usr/lib/python3.8/site-packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/usr/lib/python3.8/site-packages/tensorflow_core/python/eager/function.py", line 1137, in _filtered_call
return self._call_flat(
File "/usr/lib/python3.8/site-packages/tensorflow_core/python/eager/function.py", line 1223, in _call_flat
flat_outputs = forward_function.call(
File "/usr/lib/python3.8/site-packages/tensorflow_core/python/eager/function.py", line 506, in call
outputs = execute.execute(
File "/usr/lib/python3.8/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/conv2d/Conv2D (defined at /usr/lib/python3.8/site-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_distributed_function_950]
Function call stack:
distributed_function
I reinstalled the driver, CUDA and cuDNN; nothing changed (TensorFlow 2.0.0).
Same here. Has anyone found a solution?
Sorry for the delayed response! These may be two bugs.
The first error that @Victor-99 reported is the same as #1 and can be fixed by running pip install --user gast==0.2.2. Newer versions of TF should fix it.
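Before downgrading, it can help to confirm which gast version is actually installed. The following is a minimal sketch, not an official TensorFlow tool: the helper names `installed_version` and `gast_matches` are hypothetical, and the expected version 0.2.2 is simply the one suggested above. It assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib import metadata

# Version suggested in this thread for TF 2.0.0 (taken from the comment above).
EXPECTED_GAST = "0.2.2"


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def gast_matches(version, expected=EXPECTED_GAST):
    """True only when a version was found and it equals the expected one."""
    return version is not None and version.strip() == expected


if __name__ == "__main__":
    found = installed_version("gast")
    print("gast installed:", found)
    print("matches", EXPECTED_GAST, ":", gast_matches(found))
```

If the check reports a different version, the pip command above is the fix proposed in this thread.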
I'm not sure if the error from @Corentin-pro is the same - I'm investigating a similar issue in https://github.com/tensorflow/tensorflow/issues/34433. If downgrading gast doesn't fix it, could you re-run the snippet with this added to the top: tf.autograph.set_verbosity(3, True). It should give us additional clues to resolve it.
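As a rough sketch of where the verbosity call goes, the snippet below wraps it in a hypothetical helper, `enable_autograph_debug`, that would run at the very top of the script. It also sets the `AUTOGRAPH_VERBOSITY` environment variable named in the warning text. The guard around the TensorFlow import is only an assumption made so the snippet also loads on a machine without TensorFlow.

```python
import importlib.util
import os


def enable_autograph_debug(level=3):
    """Turn on verbose AutoGraph logging before any model code runs."""
    # Environment variable mentioned in the AutoGraph warning message.
    os.environ["AUTOGRAPH_VERBOSITY"] = str(level)

    # Only touch TensorFlow if it is installed (assumption for portability).
    if importlib.util.find_spec("tensorflow") is None:
        return False

    import tensorflow as tf
    # Same call as requested above; the second argument also logs to stdout.
    tf.autograph.set_verbosity(level, True)
    return True


# Call this first, then build and fit the model as usual.
enable_autograph_debug(3)
```

The extra log output produced this way is what the maintainers ask to have attached to the report.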
Is this a bug in TensorFlow 2.0?
@Capriciousman in your case, there seem to be two issues:
The warning you see indicates a bug, but since you use only built-in Keras components, it should be safe to ignore. I did a quick investigation, but it doesn't seem to reproduce at head. What version of TF were you using?
Your training seems to generate NaNs. That is more likely due to an incorrect configuration in your model. For instance, the output layer has only one unit, but you need 10, because fashion_mnist is a multi-class dataset. In fact, when I try to run your code it gives me an error about that (Received a label value of 9 which is outside the valid range of [0, 1)). Setting units to 10 trains the model correctly. I suspect the error message that I'm seeing was only added in a more recent version, which would explain why you didn't get one.
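The label/unit mismatch described above can be caught before training with a simple range check. This is only an illustrative sketch in plain Python; `check_label_range` is a hypothetical helper, not part of Keras, and it assumes integer class labels as used with sparse categorical cross-entropy.

```python
def check_label_range(labels, num_units):
    """Raise ValueError if any integer label falls outside [0, num_units)."""
    lowest, highest = int(min(labels)), int(max(labels))
    if lowest < 0 or highest >= num_units:
        raise ValueError(
            "labels span [{}, {}] but the output layer has {} unit(s); "
            "valid labels are [0, {})".format(lowest, highest, num_units, num_units))
    return lowest, highest


# Labels 0..9, as in fashion_mnist, fit a 10-unit output layer.
print(check_label_range([0, 3, 9], num_units=10))

# The same labels do not fit a 1-unit output layer.
try:
    check_label_range([0, 3, 9], num_units=1)
except ValueError as err:
    print("mismatch:", err)
```

In the thread's example this would be called as `check_label_range(training_labels, 10)` before `model.fit`.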
@mdanatg I am using the latest version, TF 2.0, and ignoring the warning would not lead to a solution. Yes, it is generating NaNs; there is an issue in the optimizer configuration. I have corrected the number of classes. I tried the same code in Google Colab and it works fine there, so it seems to be an issue in my TF/Keras/optimizer setup.
#######code########
import tensorflow as tf
print(tf.__version__)

mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

import matplotlib.pyplot as plt
plt.imshow(training_images[0])
print(training_labels[0])
print(training_images[0])

training_images = training_images / 255.0
test_images = test_images / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
model.compile(optimizer = tf.compat.v1.keras.optimizers.Adam,  #Error is here
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=5)
model.evaluate(test_images, test_labels)
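One thing that stands out on the line marked "#Error is here" is that the Adam class itself is passed to compile, not an instance of it. Whether that explains the behavior on this particular machine is an assumption, not something confirmed in the thread. The sketch below uses a hypothetical helper, `as_optimizer_instance`, to show the difference between a class and an instance; the `DummyOptimizer` class only stands in for a real optimizer so the snippet runs without TensorFlow.

```python
import inspect


def as_optimizer_instance(optimizer, **kwargs):
    """Return an optimizer object, instantiating `optimizer` if a class was passed."""
    if inspect.isclass(optimizer):
        # A bare class such as `Adam` carries no state; build an instance from it.
        return optimizer(**kwargs)
    return optimizer


class DummyOptimizer:
    """Stand-in for a real optimizer class, used only for this illustration."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate


opt = as_optimizer_instance(DummyOptimizer, learning_rate=0.001)
print(type(opt).__name__, opt.learning_rate)

# With TensorFlow installed, the equivalent call in the snippet above would be:
#   model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
#                 loss='sparse_categorical_crossentropy',
#                 metrics=['accuracy'])
```

Passing the string 'adam' to compile is another common way to get a default-configured instance.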
Hi all, is there a solution for this bug? It still isn't fixed for me.
The attachments contain details of the warning that I encountered while working on a dataset. Kindly review it. If it is a bug, please fix it.
WARNING:tensorflow:Entity <function Function._initialize_uninitialized_variables.<locals>.initialize_variables at 0x000001CF6FFB6CA8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Num'
WARNING: Entity <function Function._initialize_uninitialized_variables.<locals>.initialize_variables at 0x000001CF6FFB6CA8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Num'