Open Zrufy opened 3 years ago
What is the error? Which system are you using? Please share your code, your config file, and the command you are using. Which TF version is being used?
1) I don't know the error; that is what I was asking. 2) Windows. 3) Config attached. 4) I have already included the command I use. 5) 2.4.1
I am a little confused by the description. Can you clarify what you meant by "i get this error. Has anyone already encountered this error? Are there any solutions?" The reason for my confusion is that I don't see any error message pointed out in the description.
The error is in the title: "merge_call called while defining a new graph or a tf.function."
The complete message is:

"RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g. optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`."
Okay, got it. Can you share the code and logs where you are seeing this issue, or code to reproduce it?
Thanks
INFO:tensorflow:Error reported to Coordinator: in user code:
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:613 train_step_fn *
loss = eager_train_step(
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:310 eager_train_step *
optimizer.apply_gradients(zip(gradients, trainable_variables))
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\official\modeling\optimization\ema_optimizer.py:99 apply_gradients *
self.update_average(self.iterations)
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\official\modeling\optimization\ema_optimizer.py:124 update_average *
self._model_weights),))
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2941 merge_call **
return self._merge_call(merge_fn, args, kwargs)
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\mirrored_run.py:433 _merge_call
"`merge_call` called while defining a new graph or a tf.function."
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`
Traceback (most recent call last):
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\training\coordinator.py", line 297, in stop_on_exception
yield
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\mirrored_run.py", line 323, in run
self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 670, in wrapper
raise e.ag_error_metadata.to_exception(e)
RuntimeError: in user code:
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:613 train_step_fn *
loss = eager_train_step(
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:310 eager_train_step *
optimizer.apply_gradients(zip(gradients, trainable_variables))
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\official\modeling\optimization\ema_optimizer.py:99 apply_gradients *
self.update_average(self.iterations)
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\official\modeling\optimization\ema_optimizer.py:124 update_average *
self._model_weights),))
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2941 merge_call **
return self._merge_call(merge_fn, args, kwargs)
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\mirrored_run.py:433 _merge_call
"`merge_call` called while defining a new graph or a tf.function."
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`
I0504 13:43:43.337412 10136 coordinator.py:219] Error reported to Coordinator: in user code:
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:613 train_step_fn *
loss = eager_train_step(
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:310 eager_train_step *
optimizer.apply_gradients(zip(gradients, trainable_variables))
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\official\modeling\optimization\ema_optimizer.py:99 apply_gradients *
self.update_average(self.iterations)
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\official\modeling\optimization\ema_optimizer.py:124 update_average *
self._model_weights),))
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2941 merge_call **
return self._merge_call(merge_fn, args, kwargs)
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\mirrored_run.py:433 _merge_call
"`merge_call` called while defining a new graph or a tf.function."
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`
Traceback (most recent call last):
File "C:\Users\nvidiatesla\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\training\coordinator.py", line 297, in stop_on_exception
yield
File "C:\Users\nvidiatesla\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\mirrored_run.py", line 323, in run
self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
File "C:\Users\nvidiatesla\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 670, in wrapper
raise e.ag_error_metadata.to_exception(e)
RuntimeError: in user code:
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:613 train_step_fn *
loss = eager_train_step(
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:310 eager_train_step *
optimizer.apply_gradients(zip(gradients, trainable_variables))
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\official\modeling\optimization\ema_optimizer.py:99 apply_gradients *
self.update_average(self.iterations)
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\official\modeling\optimization\ema_optimizer.py:124 update_average *
self._model_weights),))
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2941 merge_call **
return self._merge_call(merge_fn, args, kwargs)
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\mirrored_run.py:433 _merge_call
"`merge_call` called while defining a new graph or a tf.function."
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`
Traceback (most recent call last):
File "model_main_tf2.py", line 113, in <module>
tf.compat.v1.app.run()
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "model_main_tf2.py", line 110, in main
record_summaries=FLAGS.record_summaries)
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py", line 664, in train_loop
loss = _dist_train_step(train_input_iter)
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\eager\def_function.py", line 871, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\eager\def_function.py", line 726, in _initialize
*args, **kwds))
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\eager\function.py", line 2969, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\eager\function.py", line 3361, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\eager\function.py", line 3206, in _create_graph_function
capture_by_value=self._capture_by_value),
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\framework\func_graph.py", line 990, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\eager\def_function.py", line 634, in wrapped_fn
out = weak_wrapped_fn().__wrapped__(*args, **kwds)
File "C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\framework\func_graph.py", line 977, in wrapper
raise e.ag_error_metadata.to_exception(e)
RuntimeError: in user code:
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:648 _dist_train_step *
_sample_and_train(strategy, train_step_fn, data_iterator)
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:630 _sample_and_train *
per_replica_losses = strategy.run(
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:613 train_step_fn *
loss = eager_train_step(
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\object_detection\model_lib_v2.py:310 eager_train_step *
optimizer.apply_gradients(zip(gradients, trainable_variables))
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\official\modeling\optimization\ema_optimizer.py:99 apply_gradients *
self.update_average(self.iterations)
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\official\modeling\optimization\ema_optimizer.py:124 update_average *
self._model_weights),))
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2941 merge_call **
return self._merge_call(merge_fn, args, kwargs)
C:\Users\user\anaconda3\envs\TSOBJ2\lib\site-packages\tensorflow\python\distribute\mirrored_run.py:433 _merge_call
"`merge_call` called while defining a new graph or a tf.function."
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`
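For reference, the restructuring that the error message recommends can be sketched as a minimal standalone example. This is not the Object Detection API code; the names `replica_step` and `dist_step` are illustrative. The point is structural: the synchronization point (`optimizer.apply_gradients`) lives in a plain Python function passed to `strategy.run`, and the only `@tf.function` wraps the entire distributed step.

```python
# Hypothetical minimal sketch of the pattern the error message suggests;
# not the Object Detection API's actual training loop.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # a single device is fine for the demo

with strategy.scope():
    v = tf.Variable(1.0)
    opt = tf.keras.optimizers.SGD(learning_rate=0.1)

def replica_step():
    # Plain function: no nested @tf.function around apply_gradients.
    with tf.GradientTape() as tape:
        loss = v * v
    grads = tape.gradient(loss, [v])
    opt.apply_gradients(zip(grads, [v]))  # synchronization point stays here
    return loss

@tf.function  # the single outer tf.function wrapping strategy.run
def dist_step():
    per_replica_loss = strategy.run(replica_step)
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_loss, axis=None)

loss = dist_step()  # loss is computed at v = 1.0, then v is updated by SGD
```

Nesting a second `@tf.function` around `replica_step` is what can trigger the `merge_call` RuntimeError seen above, because `apply_gradients` would then hit a synchronization point while a new graph is being defined.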
I read that error message, but my situation is not similar to the one it describes how to resolve.
Two things can be done (I tested them here and they seem to be working).

Change 1 (line 641): remove the @tf.function decorator, i.e. change

@tf.function
def _dist_train_step(data_iterator):

to

def _dist_train_step(data_iterator):

Change 2 (line 554): change

global_step = tf.Variable(
    0, trainable=False, dtype=tf.compat.v2.dtypes.int64, name='global_step',
    aggregation=tf.compat.v2.VariableAggregation.ONLY_FIRST_REPLICA)

to

global_step = tf.Variable(
    0, trainable=False, dtype=tf.compat.v2.dtypes.int64, name='global_step',
    aggregation=tf.compat.v2.VariableAggregation.ONLY_FIRST_REPLICA,
    synchronization=tf.VariableSynchronization.ON_READ)

Can you try these options and let me know?
I just tried the changes, but it keeps giving me the same error. I changed this in the config:

type: 'ssd_mobilenet_v1_keras'

for TensorFlow 2.x, but I still get the same error. I also tried other MobileNets and get the same error; with other networks the training goes well. This error occurs for ssd_mobilenet and mobilenetv2.
@Mrinal18 any news about this type of error?
Using a config from the samples/configs folder I get this error; taking the config from the configs/tf2 folder, there is no error. But I would like to understand whether it is possible to use that config and that type of model on 2.4.0.
I found that if I add use_moving_average: false to the optimizer section, the problem disappears, but I didn't dig in further.
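For anyone else trying this: `use_moving_average` is a field of the top-level `optimizer` message in the pipeline config, not of the inner optimizer block. A hedged sketch of where it goes (the inner settings shown as a placeholder, left as they already are in your config):

```
optimizer {
  momentum_optimizer {
    # ... keep the existing learning_rate / momentum settings here ...
  }
  use_moving_average: false
}
```

Setting this to false skips the EMA (moving-average) optimizer wrapper, which is the code path (`ema_optimizer.py` → `update_average` → `merge_call`) that appears in the tracebacks above.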
Hi @Zrufy @Mrinal18 @b04505009,
Is there any news on how to solve this? I am getting the same error message for ssd_mobilenet_v2_keras, but not for ssd_efficientnet-b1_bifpn_keras, for example.
Cheers, R.
Starting the training with the command

python model_main_tf2.py --logtostderr --model_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_focal_loss_pets.config

I get this error. Has anyone already encountered this error? Are there any solutions?