Model training with ssd_resnet50_v1_fpn_640x640_coco17_tpu-8 results in a Tensor / numpy array conflict

olimaye commented 3 years ago

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py

2. Describe the bug

The script aborts with the error message:

NotImplementedError: Cannot convert a symbolic Tensor (cond_2/strided_slice:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

3. Steps to reproduce

Follow the tutorial at https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html

Then, at the step "Training the Model", the tutorial uses the above-mentioned script, and the error occurs.

4. Expected behavior

Informative output on the training progress, as described in the tutorial.

5. Additional context

See full dump attached.

6. System information

Windows 10
TensorFlow 2.5.0
Python 3.9.5
No GPU, no TPU, just main CPU.

dump.txt
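Reporting exact package versions (NumPy in particular) alongside the OS and Python details makes version conflicts like the one discussed later in this thread easier to spot. Below is a small sketch that uses only the standard library (`platform` and `importlib.metadata`, Python 3.8+). The helper name `environment_report` and the default package list are illustrative assumptions. They are not part of the tutorial or the models repo:

```python
import platform
from importlib import metadata


def environment_report(packages=("tensorflow", "tensorflow-gpu", "numpy")):
    """Collect OS, Python and package versions for a bug report."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Packages that are missing show up as `None`, so it is also clear which of `tensorflow` and `tensorflow-gpu` is actually installed.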

martindeveloper commented 2 years ago

Hello,

I had the same issue, but it seems I "fixed" it by simply upgrading TensorFlow to version 2.6.0.

Here is my environment.yml for conda; I hope it helps :relaxed:

name: tensorflow2
channels:
  - conda-forge
  - defaults
dependencies:
  - pip=21.3.1
  - python=3.9
  - tensorflow=2.6.0
  - pandas=1.3
  - pillow=8.4
  - pip:
    - tensorflow-gpu==2.6.0
    - Cython==0.29.24
  # - "git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI"
  # install the dependency above manually with pip install after the env is created
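The following sketch can be run inside an environment created from this file to confirm that it actually ended up with the pinned versions. The `PINS` table is copied by hand from the environment.yml above. Treating conda `=` pins as simple prefix matches (so that `pandas=1.3` accepts 1.3.x) is a simplifying assumption. The helper names are hypothetical:

```python
from importlib import metadata

# Pins copied from the environment.yml above (conda "=" and pip "==" pins).
PINS = {
    "tensorflow": "2.6.0",
    "tensorflow-gpu": "2.6.0",
    "pandas": "1.3",
    "Pillow": "8.4",
    "Cython": "0.29.24",
}


def matches_pin(installed, pin):
    """Treat a pin as a prefix: '1.3' accepts '1.3' and '1.3.4' but not '1.30'."""
    return installed == pin or installed.startswith(pin + ".")


def installed_version(name):
    """Return the installed distribution version, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins=PINS, lookup=installed_version):
    """Map each mismatching package to (pinned version, installed version or None)."""
    mismatches = {}
    for name, pin in pins.items():
        found = lookup(name)
        if found is None or not matches_pin(found, pin):
            mismatches[name] = (pin, found)
    return mismatches


if __name__ == "__main__":
    for name, (pin, found) in check_pins().items():
        print(f"{name}: pinned {pin}, installed {found}")
```

The `lookup` parameter exists so that the check can be exercised with a plain dictionary instead of a real environment.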

kumariko commented 2 years ago

@olimaye Could you please have a look at https://github.com/tensorflow/models/issues/10181#issuecomment-954892180 and let us know if it helps? Thanks!

olimaye commented 2 years ago

Thank you, @martindeveloper! Unfortunately, upgrading to TensorFlow 2.6.0 breaks other dependencies. The output of pip check is as follows:

tensorflow 2.6.0 has requirement flatbuffers~=1.12.0, but you have flatbuffers 20210226132247.
tensorflow 2.6.0 has requirement h5py~=3.1.0, but you have h5py 3.2.1.
tensorflow 2.6.0 has requirement numpy~=1.19.2, but you have numpy 1.20.2.
tensorflow 2.6.0 has requirement six~=1.15.0, but you have six 1.16.0.
tensorflow 2.6.0 has requirement typing-extensions~=3.7.4, but you have typing-extensions 3.10.0.0.
tensorflow-metadata 1.1.0 has requirement absl-py<0.13,>=0.9, but you have absl-py 0.13.0.
flake8 3.9.0 has requirement pycodestyle<2.8.0,>=2.7.0, but you have pycodestyle 2.6.0.
flake8 3.9.0 has requirement pyflakes<2.4.0,>=2.3.0, but you have pyflakes 2.2.0.
autopep8 1.5.6 has requirement pycodestyle>=2.7.0, but you have pycodestyle 2.6.0.
apache-beam 2.31.0 has requirement typing-extensions<3.8.0,>=3.7.0, but you have typing-extensions 3.10.0.0.
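Each line of this output follows the pattern `<package> <version> has requirement <spec>, but you have <dependency> <version>.` The sketch below parses such lines. For `~=` (compatible release) specs, it checks the installed version the way PEP 440 defines `~=`. For example, `~=1.19.2` means `>=1.19.2` and `==1.19.*`. Only plain numeric versions and single `~=` clauses are handled. Anything else is reported as not evaluated (`ok` is `None`). The line format is inferred from the output above, and the function names are hypothetical:

```python
import re

PIP_CHECK_LINE = re.compile(
    r"^(?P<package>\S+) (?P<package_version>\S+) has requirement "
    r"(?P<requirement>.+), but you have (?P<dependency>\S+) (?P<installed>\S+)\.$"
)
COMPATIBLE_SPEC = re.compile(r"^(?P<name>[A-Za-z0-9._-]+)~=(?P<base>\d+(?:\.\d+)+)$")


def parse_release(version):
    """Split a plain numeric version such as '1.19.2' into (1, 19, 2)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible(installed, base):
    """PEP 440 '~=' for numeric versions: >= base and same release prefix."""
    inst, ref = parse_release(installed), parse_release(base)
    width = max(len(inst), len(ref))
    # Pad with zeros so that e.g. 1.19 compares equal to 1.19.0.
    inst = inst + (0,) * (width - len(inst))
    ref = ref + (0,) * (width - len(ref))
    prefix_len = len(parse_release(base)) - 1  # ~=1.19.2 fixes the "1.19" prefix
    return inst[:prefix_len] == ref[:prefix_len] and inst >= ref


def check_line(line):
    """Parse one pip check line; 'ok' stays None unless the spec is a plain '~='."""
    match = PIP_CHECK_LINE.match(line.strip())
    if match is None:
        return None
    result = match.groupdict()
    result["ok"] = None
    spec = COMPATIBLE_SPEC.match(result["requirement"])
    if spec and all(part.isdigit() for part in result["installed"].split(".")):
        result["ok"] = satisfies_compatible(result["installed"], spec.group("base"))
    return result


print(check_line(
    "tensorflow 2.6.0 has requirement numpy~=1.19.2, but you have numpy 1.20.2."
)["ok"])  # prints False
```

If the ten lines above are fed to `check_line`, the five `tensorflow 2.6.0` lines (flatbuffers, h5py, numpy, six, typing-extensions) all come back with `ok=False`. The range-style specs such as `absl-py<0.13,>=0.9` are left as `None`.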

After I fixed these dependencies manually and ran the script again, NumPy raised a runtime error:

RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
ImportError: numpy.core._multiarray_umath failed to import
ImportError: numpy.core.umath failed to import
Traceback (most recent call last):
  File "[...]\training_demo\model_main_tf2.py", line 31, in <module>
    import tensorflow.compat.v2 as tf
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\__init__.py", line 46, in <module>
    from tensorflow.python import data
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\data\__init__.py", line 25, in <module>
    from tensorflow.python.data import experimental
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\data\experimental\__init__.py", line 99, in <module>
    from tensorflow.python.data.experimental import service
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\data\experimental\service\__init__.py", line 140, in <module>
    from tensorflow.python.data.experimental.ops.data_service_ops import distribute
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\data\experimental\ops\data_service_ops.py", line 25, in <module>
    from tensorflow.python.data.experimental.ops import compression_ops
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\data\experimental\ops\compression_ops.py", line 20, in <module>
    from tensorflow.python.data.util import structure
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\data\util\structure.py", line 26, in <module>
    from tensorflow.python.data.util import nest
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\data\util\nest.py", line 40, in <module>
    from tensorflow.python.framework import sparse_tensor as _sparse_tensor
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\framework\sparse_tensor.py", line 28, in <module>
    from tensorflow.python.framework import constant_op
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\framework\constant_op.py", line 29, in <module>
    from tensorflow.python.eager import execute
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\eager\execute.py", line 27, in <module>
    from tensorflow.python.framework import dtypes
  File "D:\Anaconda3\envs\tf\lib\site-packages\tensorflow\python\framework\dtypes.py", line 32, in <module>
    _np_bfloat16 = _pywrap_bfloat16.TF_bfloat16_type()
TypeError: Unable to convert function return value to a Python type! The signature was
        () -> handle
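The two hex values in the first lines of this output are NumPy C-API version numbers: 0xe is 14 and 0xd is 13. NumPy raises this error when a compiled extension module was built against a newer C-API than the installed NumPy provides. That would be consistent with numpy having been moved down from 1.20.2 to satisfy `numpy~=1.19.2`, although the log alone does not say which module triggered it. Below is a small sketch that decodes such a message. The function name is hypothetical:

```python
import re

API_MISMATCH = re.compile(
    r"module compiled against API version (0x[0-9a-fA-F]+) "
    r"but this version of numpy is (0x[0-9a-fA-F]+)"
)


def decode_api_mismatch(message):
    """Return (compiled_api, runtime_api, numpy_too_old) or None if no match."""
    match = API_MISMATCH.search(message)
    if match is None:
        return None
    compiled, runtime = (int(value, 16) for value in match.groups())
    # A higher compiled-against version means the installed NumPy is too old.
    return compiled, runtime, compiled > runtime


print(decode_api_mismatch(
    "RuntimeError: module compiled against API version 0xe "
    "but this version of numpy is 0xd"
))  # prints (14, 13, True)
```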

At this point I'm somewhat lost. Would it be a good idea to upgrade the whole environment?
