mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

INFO:tensorflow:Error reported to Coordinator: __call__() missing 1 required positional argument: 'step' #520

Open missximon opened 2 years ago

missximon commented 2 years ago

Hi, has anyone else run into this? When I train a model with mlcommons/training/image_classification, I always get the error "INFO:tensorflow:Error reported to Coordinator: __call__() missing 1 required positional argument: 'step'" and cannot fix it.

Instructions for updating:
experimental_compile is deprecated, use jit_compile instead
INFO:tensorflow:Error reported to Coordinator: __call__() missing 1 required positional argument: 'step'
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/mirrored_run.py", line 346, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/tmp/tmp0xz4xfof.py", line 144, in _apply_grads_and_clear_for_each_replica
    ag__.converted_call(ag__.ld(self).optimizer.apply_gradients, (ag__.converted_call(ag__.ld(zip), (ag__.ld(replica_accum_grads), ag__.ld(self).training_vars), None, fscope_3),), None, fscope_3)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 382, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 464, in _call_unconverted
    return f(*args)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 671, in apply_gradients
    apply_state = self._prepare(var_list)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 953, in _prepare
    self._prepare_local(var_device, var_dtype, apply_state)
  File "/home/siwei.zm/mlperf/training/image_classification/tensorflow2/lars_optimizer.py", line 114, in _prepare_local
    lr_t = self._get_hyper("learning_rate", var_dtype)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 810, in _get_hyper
    value = value()
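One possible reading of the last frame: `_get_hyper` calls the stored learning rate with no arguments (`value = value()`), while the learning-rate object's `__call__` expects a `step`. The sketch below reproduces that shape of failure in plain Python, without TensorFlow. Every name in it (`ScheduleBase`, `WarmupSchedule`, `get_hyper`) is a hypothetical stand-in, and it assumes the optimizer passes through schedule objects it recognises and calls any other callable with no arguments. It illustrates the error message only and is not a confirmed diagnosis of this issue.

```python
# Hypothetical stand-ins only; none of this is TensorFlow code.

class ScheduleBase:
    """Stand-in for the schedule base class an optimizer might check for."""


class WarmupSchedule:
    """Stand-in schedule that does NOT inherit from ScheduleBase (assumption)."""

    def __init__(self, base_lr):
        self.base_lr = base_lr

    def __call__(self, step):
        # Linear warm-up over the first 100 steps, then constant.
        return self.base_lr * min(1.0, step / 100)


def get_hyper(value):
    """Mimics the assumed lookup: recognised schedules pass through,
    any other callable is invoked with no arguments."""
    if isinstance(value, ScheduleBase):
        return value
    if callable(value):
        value = value()  # fails if the callable requires `step`
    return value


def reproduce():
    """Returns the TypeError message raised by the zero-argument call."""
    try:
        get_hyper(WarmupSchedule(0.1))
    except TypeError as err:
        return str(err)
    return None


message = reproduce()
print(message)
```

If this reading applies, it may be worth checking whether the learning-rate schedule passed to the optimizer derives from the same schedule base class that the installed `optimizer_v2.py` checks for.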

johntran-nv commented 1 year ago

@sgpyc can you advise?

tersiteab commented 1 year ago

Hi, I am getting a similar error. Has this been resolved? Thank you.