tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

Ghost bug on macOS Big Sur - TypeError: tf__gradient_update() missing n required positional arguments #47582

Closed by ghost 3 years ago

ghost commented 3 years ago


System information

== check python ===================================================
python version: 3.8.7
python branch: 
python build version: ('default', 'Dec 30 2020 10:14:55')
python compiler version: Clang 12.0.0 (clang-1200.0.32.28)
python implementation: CPython

== check os platform ===============================================

== are we in docker =============================================
No

== compiler =====================================================
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin20.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

== check pips ===================================================
numpy                      1.19.5
protobuf                   3.14.0
tensorflow                 2.4.1
tensorflow-addons          0.12.0
tensorflow-estimator       2.4.0
tensorflow-probability     0.12.1

== check for virtualenv =========================================
False

== tensorflow import ============================================
tf.version.VERSION = 2.4.1
tf.version.GIT_VERSION = v2.4.0-49-g85c8b2a817f
tf.version.COMPILER_VERSION = 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)

== env ==========================================================
LD_LIBRARY_PATH is unset
DYLD_LIBRARY_PATH is unset

== nvidia-smi ===================================================
./collect_tf.sh: line 145: nvidia-smi: command not found

== cuda libs  ===================================================

== tensorflow installed from info ==================
Name: tensorflow
Version: 2.4.1
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /usr/local/lib/python3.8/site-packages
Required-by: yolo-tf2

== python version  ==============================================
(major, minor, micro, releaselevel, serial)
(2, 7, 16, 'final', 0)

== bazel version  ===============================================

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with:

  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior

The error is intermittent. It usually appears when I run code that works perfectly, make a minor change, and then re-run. I get this error, or a variation of it, reporting missing positional arguments even though the arguments are passed and there is nothing wrong with the call.

Traceback (most recent call last):
  File "/Users/emadboctor/Desktop/code/drl-algos/acer.py", line 273, in <module>
    ]
  File "/Users/emadboctor/Desktop/code/drl-algos/base_agent.py", line 346, in fit
    self.train_step()
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 725, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3196, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3887, in bound_method_wrapper
    return wrapped_fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:

    TypeError: tf__gradient_update() missing 5 required positional arguments: 'states', 'rewards', 'actions', 'dones', and 'action_probs'

The error is usually gone on the following run.
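For reference, the wording of the message matches what plain Python emits when a function with five required parameters is called with none of them. The sketch below reproduces only that wording, outside TensorFlow; it does not reproduce the intermittent failure. The function body and the `call_without_arguments` helper are hypothetical, and only the parameter names are taken from the traceback.

```python
def gradient_update(states, rewards, actions, dones, action_probs):
    """Hypothetical stand-in; only the parameter names come from the traceback."""
    return len(states)


def call_without_arguments(fn):
    """Call fn with no arguments and return the TypeError message, if any."""
    try:
        fn()
    except TypeError as exc:
        return str(exc)
    return None


# The message lists every required parameter that was not supplied.
message = call_without_arguments(gradient_update)
print(message)
```

This suggests that, on the failing run, the traced function was invoked without its arguments somewhere in the call chain, although the traceback alone does not show where.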

Describe the expected behavior

Standalone code to reproduce the issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/Jupyter/any notebook.

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

jvishnuvardhan commented 3 years ago

@emadboctorx Can you please share a simple standalone code to reproduce the error? Thanks!

ghost commented 3 years ago

@jvishnuvardhan The error is not reproducible on demand. It sometimes happens between re-runs, and whenever I encounter it I just re-run with zero modifications and the code runs perfectly fine. So I don't think sharing the code is likely to reproduce the error.

rohan100jain commented 3 years ago

@emadboctorx In order to understand what might be going wrong, it's important that we can reproduce the error. Even a description of your workflow (start with code X, make modification Y, re-run) is good enough for us, but without a reproduction it's really hard to understand what is going on.
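One way to capture the "code X, modification Y, re-run" workflow for an intermittent failure is to log a fingerprint of the source files and the outcome of every run, so the failing run can later be matched to the exact edit that preceded it. The following is a minimal sketch using only the standard library; `source_fingerprint`, `run_and_record`, and the JSON-lines log layout are hypothetical names and choices, not part of TensorFlow or of the reporter's code.

```python
import hashlib
import json
import traceback
from datetime import datetime, timezone
from pathlib import Path


def source_fingerprint(paths):
    """Return a short SHA-256 digest per source file so edits between runs are visible."""
    return {
        str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest()[:12]
        for p in paths
    }


def run_and_record(fn, source_paths, log_path):
    """Run fn(), append one JSON line describing the outcome, and re-raise on failure."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "sources": source_fingerprint(source_paths),
        "ok": True,
        "traceback": None,
    }
    try:
        return fn()
    except Exception:
        # Keep the full traceback so it can be attached to the issue verbatim.
        record["ok"] = False
        record["traceback"] = traceback.format_exc()
        raise
    finally:
        # The record is written whether the run succeeded or failed.
        with open(log_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
```

Assuming the training entry point is a zero-argument callable, a call such as `run_and_record(agent.fit, ["acer.py", "base_agent.py"], "runs.jsonl")` would wrap it, where `agent` and the log file name are placeholders. Keeping the sources under version control as well would let the digests in a failing record be mapped back to a concrete diff.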

ghost commented 3 years ago

@rohan100jain I see your point, but the error only occurs sometimes, and once it has occurred it won't repeat unless I change some random part of the code. This is me totally speculating, but I think it has something to do with compilation under tf.function: the error happens when modifications are made to the code and it is then run and compiled for the first time. Once the error has appeared, it does not recur on the following run; otherwise it would be easy to fix. So I have to make random modifications, which sometimes reproduce the error and sometimes compile just fine, according to my theory. I will try to reproduce it on my own and see which parts of the code are involved, and I may edit the issue accordingly.

google-ml-butler[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
