Open stephanwlee opened 5 years ago
I get the same error when I have multiple "@tf.function". I am working on a distributed learning project across multiple GPUs. I have one @tf.function for the train loop, and another for the test loop.
with strategy.scope():
    @tf.function
    def distributed_train_step(dataset_inputs):
        (...)
    @tf.function
    def distributed_test_step(dataset_inputs):
        (...)
stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = 'logs/func/%s' % stamp
writer = tf.summary.create_file_writer(logdir)
tf.summary.trace_on(graph=True)
.
.
.
.
with writer.as_default():
    tf.summary.trace_export(name="my_func_trace", step=0)
How are you invoking your tf.functions? Are they writing to the same writer? If so, this is working as intended. Two tf.functions have GraphDefs that may contain nodes with the same name but a different type or metadata.
@stephanwlee If we train on multiple GPUs with separate train_step and test_step tf.functions, how should we resolve this problem? I am facing the same problem, which produces the errors below.
Traceback (most recent call last):
File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graph_util.py", line 118, in combine_graph_defs
lambda n: n.name)
File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graph_util.py", line 85, in _safe_copy_proto_list_values
raise _SameKeyDiffContentError(key)
tensorboard.plugins.graph.graph_util._SameKeyDiffContentError: sparse_categorical_crossentropy/Shape
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graphs_plugin.py", line 225, in graph_route
result = self.graph_impl(run, tag, is_conceptual, limit_attr_size, large_attrs_key)
File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graphs_plugin.py", line 169, in graph_impl
graph_util.combine_graph_defs(graph, func_graph.pre_optimization_graph)
File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graph_util.py", line 124, in combine_graph_defs
'but contents are different: %s') % exc)
ValueError: Cannot combine GraphDefs because nodes share a name but contents are different: sparse_categorical_crossentropy/Shape
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/werkzeug/serving.py", line 304, in run_wsgi
execute(self.server.app)
File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/werkzeug/serving.py", line 292, in execute
application_iter = app(environ, start_response)
File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/backend/application.py", line 164, in wrapper
return wsgi_app(*args)
File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/backend/application.py", line 419, in __call__
return self.exact_routes[clean_path](environ, start_response)
File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/werkzeug/wrappers/base_request.py", line 237, in application
resp = f(*args[:-2] + (request,))
File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graphs_plugin.py", line 227, in graph_route
return http_util.Respond(request, e.message, 'text/plain', code=400)
AttributeError: 'ValueError' object has no attribute 'message'
E1204 12:24:01.568600 140642150078208 directory_watcher.py:242] File model_dir/logs-new/20191204-122321/events.out.tfevents.1575429801.pc-01.36116.63.v2 updated even though the current file is model_dir/logs-new/20191204-122321/events.out.tfevents.1575429824.pc-01.profile-empty
Any update?
The error message AttributeError: 'ValueError' object has no attribute 'message' isn't very helpful. There's a bug in the error output:
https://github.com/tensorflow/tensorboard/blob/1780833b30d953509200bf9560be2ba42fabe9ff/tensorboard/plugins/graph/graphs_plugin.py#L323
should be:
return http_util.Respond(request, str(e), 'text/plain', code=400)
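For context, here is a minimal sketch of why e.message fails. Python 3 exceptions do not define a message attribute, so str(e) is the portable way to get the text. The helper name describe_error is hypothetical and exists only for illustration; it is not part of TensorBoard.

```python
def describe_error(exc):
    """Return the text of an exception in a way that works on Python 3."""
    # Python 3 exceptions do not define `.message`; str(exc) yields the text.
    return str(exc)


try:
    raise ValueError("Cannot combine GraphDefs")
except ValueError as e:
    # On Python 3 this attribute does not exist, which is what triggers
    # the AttributeError seen in the traceback above.
    has_message = hasattr(e, "message")
    text = describe_error(e)
```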
However, that only gets us a step closer. Running the original code, the actual error message (that Tensorboard should, but doesn't propagate to the UI) is: Cannot combine GraphDefs because nodes share a name but contents are different: Const
As @stephanwlee mentioned, this is a GraphDef naming collision.
I think the simplest fix around this would be to call trace_on/trace_export separately around each graph call. So do something like this:
import tensorflow as tf
writer = tf.summary.create_file_writer('ex_logs')
@tf.function
def foo(x):
    return x ** 2
with writer.as_default():
    tf.summary.trace_on()
    foo(1)
    tf.summary.trace_export("foo1", step=0)
with writer.as_default():
    tf.summary.trace_on()
    foo(2)
    tf.summary.trace_export("foo2", step=0)
Note that trace_export will also stop tracing (https://www.tensorflow.org/api_docs/python/tf/summary/trace_on?version=stable)
This ensures that each trace is separately tagged. This is a debugging tool for visualizing the network graph, and it makes sense that you'd want to profile just a single call of the graph. Tracing is something I'd imagine you wouldn't want to leave on while training, as profiling is expensive anyways.
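The one-trace-per-call pattern can be wrapped in a small context manager so that the train and test functions each get their own tag. This is only a sketch: single_trace is a hypothetical helper, and it assumes the object passed as summary is the tf.summary module (it is passed in explicitly so the helper itself does not import TensorFlow). The variable batch in the usage comment is also a placeholder.

```python
import contextlib


@contextlib.contextmanager
def single_trace(summary, name, step=0):
    """Trace exactly one block of calls and export it under its own tag.

    `summary` is assumed to be the `tf.summary` module (an assumption of
    this sketch); only trace_on, trace_off and trace_export are used.
    """
    summary.trace_on(graph=True)
    try:
        yield
    except BaseException:
        # Discard the partial trace if the traced call failed.
        summary.trace_off()
        raise
    else:
        # trace_export writes the trace and also stops tracing.
        summary.trace_export(name=name, step=step)


# Hypothetical usage with the functions from this thread:
#
# with writer.as_default():
#     with single_trace(tf.summary, "train_step"):
#         distributed_train_step(batch)
#     with single_trace(tf.summary, "test_step"):
#         distributed_test_step(batch)
```

Each function then shows up under a separate tag in the Graphs dashboard, so TensorBoard never has to merge the two GraphDefs.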
This official tutorial in Colab returns an error when I choose the keras or batch_2 tag. The Download PNG button doesn't work either.
same problem
I would suggest exporting them as different traces with different names. That seems to work for me.
Instead of this:
with writer.as_default():
    tf.summary.trace_on()
    foo(1)
    foo(2)
    tf.summary.trace_export("foo")
Do this:
with writer.as_default():
    tf.summary.trace_on()
    foo(1)
    tf.summary.trace_export("foo1")
    tf.summary.trace_on()
    foo(2)
    tf.summary.trace_export("foo2")
I can hardly recognize the location of the error in my code.
any update about this issue? It has been more than one year since the issue was put forward 😢
I had the same issue. TensorBoard needs unique names to be given to the graph variables (I don't know why, and I hope this issue will be fixed). In your case this piece of code should fix it:
import tensorflow as tf
@tf.function
def foo(x):
    return x ** 2
writer = tf.summary.create_file_writer('logs\\')
with writer.as_default():
    tf.summary.trace_on()
    foo(tf.Variable(1, name='foo1'))  # define a unique name for the variable
    foo(tf.Variable(2, name='foo2'))
    tf.summary.trace_export("foo", step=0)
This issue also exists when overriding tf.Module. Then, self.name_scope (or tf.name_scope) can be used when defining the module variables (wrapping the other operations or not). Here is an example of a custom Dense layer:
import tensorflow as tf
import numpy as np
class Dense(tf.Module):
    # Fully-connected layer.
    def __init__(self, out_fmaps, name=None):
        super().__init__(name=name)
        self.is_built = False
        self.out_fmaps = out_fmaps
    def __call__(self, x):
        if not self.is_built:
            with self.name_scope:  # Creates the variable under name_scope
                he_init = np.sqrt(2/x.shape[-1])
                init_val = tf.random.normal([x.shape[-1], self.out_fmaps])*he_init
                self.w = tf.Variable(init_val, name='dense')
            self.is_built = True
        return tf.matmul(x, self.w)
In TensorFlow v2, the code below can cause a GraphDef reconciliation error.
@tf.function
def foo(x):
    return x ** 2
with writer.as_default():
    tf.summary.trace_on()
    foo(1)
    foo(2)
    tf.summary.trace_export("foo")
Depending on the argument, tf.function (really, AutoGraph) creates ops that are unique within a GraphDef but not globally unique. In the example above, two GraphDefs (one from foo(1) and another from foo(2)) will be written out, and they can collide badly in names and content. In such a case, instead of showing wrong graph content, TensorBoard throws an error.
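To make the collision concrete, here is a simplified pure-Python model of the check. It is not TensorBoard's actual implementation: nodes are represented as a hypothetical {name: description} dict instead of real GraphDef protos, combine_nodes is an illustrative name, and the node contents are made up.

```python
def combine_nodes(target, source):
    """Merge `source` nodes into a copy of `target`, refusing collisions.

    Both arguments map a node name to a description of that node.
    Identical duplicates are accepted; a shared name with different
    content raises, mirroring the error message quoted in this thread.
    """
    merged = dict(target)
    for name, content in source.items():
        if name in merged and merged[name] != content:
            raise ValueError(
                "Cannot combine GraphDefs because nodes share a name "
                "but contents are different: %s" % name
            )
        merged[name] = content
    return merged


# Two traces of the same function may reuse a node name such as "Const"
# for different content (values invented for illustration).
graph_a = {"x": "Placeholder", "Const": "Const(value=1)"}
graph_b = {"x": "Placeholder", "Const": "Const(value=2)"}

try:
    combine_nodes(graph_a, graph_b)
    collided = False
except ValueError as err:
    collided = True
    message = str(err)
```

The names are unique inside each dict, yet the merge still fails, which is the same situation as two GraphDefs written under one trace.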
You could remove the other tf.function decorators and run all steps inside a single tf.function to resolve this problem.
def foo(x):
    return x ** 2
@tf.function
def foooo(x1, x2):
    foo(x1)
    foo(x2)
with writer.as_default():
    tf.summary.trace_on()
    foooo(1, 2)
    tf.summary.trace_export("foo")
Getting this error while visualizing the .pb model. I have created "events.out.tfevents.1621934261.6e85c43ac415" file from following code.
model_filename = 'model.pb'
import tensorflow as tf
from tensorflow.python.platform import gfile
with tf.Session() as sess:
    with gfile.FastGFile(model_filename, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        g_in = tf.import_graph_def(graph_def)
LOGDIR = 'op'
train_writer = tf.summary.FileWriter(LOGDIR)
train_writer.add_graph(sess.graph)
Could someone please advise on how to solve the malformed Op graph error that I am getting?
While running a custom Keras model with the TensorBoard callback, the Conceptual graph is generated; however, the Op graph returns: Error: Malformed GraphDef. I tried some existing suggestions related to potential naming conflicts and using name_scope, but to no avail.
logdir = 'logs/func/' + datetime.now().strftime("%Y%m%d-%H%M%S")
class MCLayer(tf.keras.layers.Layer):
    def __init__(self, name=None):
        super(MCLayer, self).__init__(name=name)
        #with tf.name_scope('test1'):
        self.nT = tf.constant(400)
        self.n = tf.constant(100000)
        self.dt = tf.constant(1/365)
        self.drift = tf.constant(0.08)
        self.sigma = tf.constant(0.1)
    #@tf.function
    def call(self, inputs):
        #with tf.name_scope('test2'):
        dWt = tf.random.normal(mean=0, stddev=tf.math.sqrt(self.dt), shape=[self.nT, self.n])
        dYt = self.drift*self.dt + self.sigma*dWt
        C = tf.cumsum(dYt, axis=0)
        S = tf.exp(C)
        A = tf.reduce_mean(S, axis=0)
        P = tf.reduce_mean(tf.maximum(A - inputs, 0))
        return P
input_layer = tf.keras.layers.Input(shape=(1), name='input_layer')
output_layer = MCLayer(name='output_layer')(input_layer)
model = tf.keras.models.Model(input_layer, output_layer, name='SomeModel')
model.compile()
result = model.predict(tf.constant(1.0, shape=(1,)), callbacks=[tf.keras.callbacks.TensorBoard(log_dir=logdir)])
I got a Keras SavedModel from another data scientist and want to see the graph in TensorBoard. I faced the same Malformed GraphDef issue, using TF 2.8
import tensorflow as tf
import numpy as np
model = tf.keras.models.load_model(model_path)
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir='C:/log_tf2',
    update_freq=1,
    histogram_freq=1,
    write_graph=True,
    write_images=True
)
tensorboard_callback.set_model(model)
result = model.predict(
    {
        'a': tf.constant(np.random.rand(1))
    },
    callbacks=[tensorboard_callback],
    verbose=1)
A new "train" folder was created in log_dir, containing only one tiny events.out.tfevents.XXXXXXXX.v2 file (14 KB).
TF1 usually produced a log directory with a large log of the model (its size was comparable to that of a frozen graph).
Sorry to hear you're having trouble with this. However, we won't really be able to debug these without more detail about the actual graphdef that caused the issue, preferably as a graphdef pbtxt or an events.out.tfevents file. If the graph is sensitive and can't be shared we unfortunately won't be able to get much farther, but you can try looking at the Javascript Console to see if there is any more detail about the error message there.
As for the differences between TF1 and TF2 file sizes, that might be expected depending on the graph contents - again, it would be hard to say anything more without knowing the specific graph and the specific files in question.
I was facing this issue as well. I found that switching verbose from 2 to 1 in model.fit() solved the problem. This might help somebody here, too. Since I'm unsure whether this behaviour is intended, I created an issue for it (see here: https://github.com/tensorflow/tensorboard/issues/5745).
Switching the verbose from 2 to 1 resolved the problem. Thanks!
Thank you!! Based on your comment, I switched from verbose type 0 to 1 which resolved the issue as well.