Error while trying to convert custom built yolov3 model to tflite model

sakthigeek commented 4 years ago

D:\TFlite_model_conversion\onnx_tflite_yolov3-master>python prep.py
2020-07-09 10:58:21.547627: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
2020-07-09 10:58:21.557802: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From prep.py:8: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

WARNING:tensorflow:From prep.py:9: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

WARNING:tensorflow:From prep.py:11: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-07-09 10:58:25.651074: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2020-07-09 10:58:25.664006: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-07-09 10:58:25.694386: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-3GEOLB6
2020-07-09 10:58:25.707493: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-3GEOLB6
2020-07-09 10:58:25.722747: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
WARNING:tensorflow:From prep.py:13: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From prep.py:42: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.

Traceback (most recent call last):
  File "C:\Users\LKB-L-097\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1607, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Depth of input (418) is not a multiple of input depth of filter (3) for 'convolution_new' (op: 'Conv2D') with input shapes: [1,418,3,418], [3,3,3,32].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "prep.py", line 45, in <module>
    op = sess.graph.create_op(op_type=n_org.type, inputs=op_inputs, name=n_org.name+'_new', attrs=atts)
  File "C:\Users\LKB-L-097\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\LKB-L-097\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "C:\Users\LKB-L-097\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "C:\Users\LKB-L-097\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1770, in __init__
    control_input_ops)
  File "C:\Users\LKB-L-097\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1610, in _create_c_op
    raise ValueError(str(e))
ValueError: Depth of input (418) is not a multiple of input depth of filter (3) for 'convolution_new' (op: 'Conv2D') with input shapes: [1,418,3,418], [3,3,3,32].

Model conversion to .pb was successful, but the next step (prep.py) fails with the error above. Has anyone else encountered this issue, and how can it be fixed?
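The message comes from TensorFlow's shape check for Conv2D: with the default NHWC data format, the last input dimension is read as the channel depth, and it must be a multiple of the filter's input-channel dimension (filters are laid out as [height, width, in_channels, out_channels]). The pure-Python sketch below only mimics that rule (it is not TensorFlow's code, and `check_conv2d_depth` is a hypothetical helper) using the shapes from the log, to show why [1,418,3,418] is rejected while a channels-last tensor with 3 channels would pass.

```python
def check_conv2d_depth(input_shape, filter_shape, data_format="NHWC"):
    """Mimic TF's Conv2D depth check (hypothetical helper, not TF code).

    filter_shape is [filter_height, filter_width, in_channels, out_channels].
    Returns the number of convolution groups (1 for an ordinary conv).
    """
    # The channel axis is last for NHWC and second for NCHW.
    channel_axis = 3 if data_format == "NHWC" else 1
    input_depth = input_shape[channel_axis]
    filter_depth = filter_shape[2]
    if input_depth % filter_depth != 0:
        raise ValueError(
            f"Depth of input ({input_depth}) is not a multiple of "
            f"input depth of filter ({filter_depth})"
        )
    return input_depth // filter_depth


filter_shape = [3, 3, 3, 32]  # from the log: 3x3 kernel, 3 in, 32 out channels

# The shape from the log read as NHWC: depth is 418, and 418 % 3 != 0.
try:
    check_conv2d_depth([1, 418, 3, 418], filter_shape)
except ValueError as err:
    print(err)

# A correctly laid-out NHWC image tensor has its 3 channels last and passes.
print(check_conv2d_depth([1, 418, 418, 3], filter_shape))
```

In other words, the failing tensor looks like an image whose channel axis ended up in the wrong position, rather than one with a wrong spatial size.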

zldrobit commented 4 years ago

It seems that the problem is that 418 is not a multiple of 32. Could you change your tensor dimension from 418 to 416?

shahidammer commented 4 years ago

Traceback (most recent call last):
  File "/home/asd/dev/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1607, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Depth of input (418) is not a multiple of input depth of filter (3) for 'convolution_new' (op: 'Conv2D') with input shapes: [1,418,3,418], [3,3,3,32].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "prep.py", line 45, in <module>
    op = sess.graph.create_op(op_type=n_org.type, inputs=op_inputs, name=n_org.name+'_new', attrs=atts) 
  File "/home/asd/dev/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/asd/dev/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/asd/dev/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/asd/dev/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1770, in __init__
    control_input_ops)
  File "/home/asd/dev/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1610, in _create_c_op
    raise ValueError(str(e))
ValueError: Depth of input (418) is not a multiple of input depth of filter (3) for 'convolution_new' (op: 'Conv2D') with input shapes: [1,418,3,418], [3,3,3,32].

Where do I change the input depth from 418 to 416? I am using the yolov3.weights file.

zldrobit commented 4 years ago

@shahidammer It seems the problem is not related to the input size. Which version of TensorFlow are you using? This repo has only been tested with TF 1.15.

shahidammer commented 4 years ago

@zldrobit yes.

absl-py==0.9.0
asn1crypto==0.24.0
astor==0.8.1
cffi==1.14.1
cryptography==2.1.4
cycler==0.10.0
flatbuffers==1.11
gast==0.2.2
google-pasta==0.1.8
grpcio==1.26.0
h5py==2.10.0
idna==2.6
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
keyring==10.6.0
keyrings.alt==3.0
kiwisolver==1.1.0
Markdown==3.1.1
matplotlib==3.1.2
numpy==1.17.4
onnx==1.6.0
onnx-tf==1.5.0
onnxruntime-gpu==1.0.0
opencv-python==4.1.2.30
opt-einsum==3.1.0
Pillow==6.2.1
protobuf==3.11.1
pycparser==2.20
pycrypto==2.6.1
pyparsing==2.4.5
python-dateutil==2.8.1
pyxdg==0.25
PyYAML==5.2
SecretStorage==2.3.1
six==1.11.0
tensorboard==1.15.0
tensorflow-estimator==1.15.1
tensorflow-gpu==1.15.0
termcolor==1.1.0
tflite==1.15.0.post1
torch==1.3.1
torchvision==0.4.2
tqdm==4.40.2
typing-extensions==3.7.4.1
Werkzeug==0.16.0
wrapt==1.11.2

Python 3.7.8

zldrobit commented 4 years ago

@shahidammer I cannot reproduce the error using the docker image zldrobit/onnx:10.0-cudnn7-devel, even after installing Python 3.7.5. Could you try testing the code in the docker image zldrobit/onnx:10.0-cudnn7-devel?

OfirBalassiano commented 3 years ago

Hey @shahidammer, I got this error too. Do you have any idea how to solve it?

I am using the docker image (pulled the image, ran it, and cloned the git repo into it) with custom YOLOv3 weights.

Thanks !

ShivaKothuru commented 3 years ago

@zldrobit @OfirBalassiano @shahidammer

I am facing the same issue. Do you have any idea to solve it?

zldrobit commented 3 years ago

@ShivaKothuru Please provide a minimal reproducible example, so we can check which lines of code introduce the error. Also, could you provide information about your environment, such as Python version, TF version, PyTorch version, etc.? My suggestion is to try the docker image zldrobit/onnx:10.0-cudnn7-devel to rule out environment issues.
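The environment details requested above can be gathered in one go with a small script. This is a minimal sketch using only the standard library (`importlib.metadata` on Python 3.8+, or the `importlib-metadata` backport on 3.7); the package list is an assumption based on the requirements posted earlier in this thread, and `environment_report` is a hypothetical helper name.

```python
import platform
import sys

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib-metadata backport
    from importlib_metadata import PackageNotFoundError, version

# pip distribution names relevant to this pipeline (assumed list); TensorFlow
# may be installed either as "tensorflow" or as "tensorflow-gpu".
PACKAGES = ["tensorflow", "tensorflow-gpu", "torch", "onnx", "onnx-tf", "onnxruntime-gpu"]


def environment_report(packages=PACKAGES):
    """Collect the Python version, OS platform and package versions into a dict."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output into the issue covers the Python, TF, PyTorch and onnx-tf versions at once.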

ShivaKothuru commented 3 years ago

@zldrobit Please find the details below:
Python version: 3.8
TF: tensorflow-gpu==1.15.0
PyTorch: torch==1.3.1

I also tried the docker image zldrobit/onnx:10.0-cudnn7-devel, and it still gives the same error. The error is raised in prep.py, line 45:
op = sess.graph.create_op(op_type=n_org.type, inputs=op_inputs, name=n_org.name+'_new', attrs=atts)
ValueError: Depth of input (418) is not a multiple of input depth of filter (3) for 'convolution_new' (op: 'Conv2D') with input shapes: [1,418,3,418], [3,3,3,32].

zldrobit commented 3 years ago

@ShivaKothuru It is hard to determine which part causes the error. Could you provide a minimal reproducible example including command line operations you executed?

x1aoo commented 3 years ago

Same problem

ShivaKothuru commented 3 years ago

@zldrobit Thank you for your response. Please find the implementation and the error in the linked gist.

I also get the same error when running it in docker. Kindly check.

zldrobit commented 3 years ago

@ShivaKothuru Thanks to your example, I found out that onnx-tensorflow is causing the issue. onnx-tensorflow determines the channel order based on whether a CUDA device is present: https://github.com/onnx/onnx-tensorflow/blob/e048c0a69b870d661143f561511329dae4acfcfa/onnx_tf/common/__init__.py#L147-L150. After attaching a CUDA device to the Colab notebook, the pipeline runs without errors: https://colab.research.google.com/drive/184AEez0RE_fQ7tQEY8fGwMroJgWDkUni?usp=sharing
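The decision referred to above can be pictured as follows. ONNX stores convolution inputs channel-first (NCHW); when no CUDA device is found, the backend computes in channel-last (NHWC) and inserts transposes around the convolution, since TensorFlow 1.x's CPU Conv2D kernel only supports NHWC. The sketch below is a simplified pure-Python re-creation of that idea, not onnx-tensorflow's actual code: the function names are hypothetical, and the `has_cuda` flag stands in for the library's real device probe.

```python
def get_data_format(x_rank, has_cuda):
    """Return (storage_format, compute_format) for an N-D conv input.

    Simplified re-creation: tensors are stored channel-first as in ONNX, and
    channel-first is kept for compute only when a CUDA device is available.
    """
    spatial = "DHW"[-(x_rank - 2):]  # rank 4 -> "HW", rank 5 -> "DHW"
    storage_format = "NC" + spatial
    compute_format = "NC" + spatial if has_cuda else "N" + spatial + "C"
    return storage_format, compute_format


def transpose_perm(src_format, dst_format):
    """Permutation (as for tf.transpose) that turns src_format axes into dst_format."""
    return [src_format.index(axis) for axis in dst_format]


for has_cuda in (True, False):
    storage, compute = get_data_format(4, has_cuda)
    print(has_cuda, storage, compute, transpose_perm(storage, compute))
```

Because the chosen layout depends on the machine that builds the graph, the same ONNX model can yield graphs with different axis orders on a GPU host and on a CPU-only host.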

zldrobit commented 3 years ago

@sakthigeek @shahidammer @OfirBalassiano @x1aoo Please try running python onnx2tf.py with CUDA devices available, for example by adding --gpus all to the docker run command. This should resolve the problem.
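A minimal sketch that assembles the docker run command with GPU access and prints it for copy-pasting. It assumes Docker 19.03+ with the NVIDIA Container Toolkit installed (needed for --gpus); the image tag comes from this thread, while the helper name, the host path and the in-container working directory /workspace are hypothetical.

```python
import shlex


def docker_run_cmd(image, host_dir, script="onnx2tf.py", use_gpus=True):
    """Build a `docker run` argv that runs `python <script>` inside the image.

    host_dir is bind-mounted at /workspace (a hypothetical path) so the cloned
    repo and its model files are visible inside the container.
    """
    cmd = ["docker", "run", "--rm"]
    if use_gpus:
        cmd += ["--gpus", "all"]  # expose all host GPUs to the container
    cmd += ["-v", f"{host_dir}:/workspace", "-w", "/workspace", image,
            "python", script]
    return cmd


if __name__ == "__main__":
    cmd = docker_run_cmd("zldrobit/onnx:10.0-cudnn7-devel", "/path/to/onnx_tflite_yolov3")
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can also be passed directly to subprocess.run if you prefer launching the container from Python.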

zldrobit commented 3 years ago

onnx-tensorflow 1.7.0-tf-1.x changes the logic for choosing devices:
https://github.com/onnx/onnx-tensorflow/blob/0e4f4836f5c3027918950034640c75d36464be86/onnx_tf/handlers/backend/conv_mixin.py#L97-L103
https://github.com/onnx/onnx-tensorflow/blob/0e4f4836f5c3027918950034640c75d36464be86/onnx_tf/handlers/backend/conv_mixin.py#L236-L242

With onnx-tensorflow 1.7.0-tf-1.x, one can set device=CUDA in https://github.com/zldrobit/onnx_tflite_yolov3/blob/dacbe3327dc8cd1c1c6d3c674df3234612d2a1fa/onnx2tf.py#L8 to avoid this issue.
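A minimal sketch of that change, assuming the conversion follows the usual onnx-tensorflow pattern of onnx.load, then onnx_tf.backend.prepare, then export_graph (the function name and file names are placeholders, not the repo's actual code). The only functional difference is passing device="CUDA" to prepare, whose default is "CPU"; the imports are deferred into the function so the snippet can be loaded even where onnx and onnx-tf are not installed.

```python
def convert_onnx_to_pb(onnx_path, pb_path, device="CUDA"):
    """Convert an ONNX model to a TensorFlow .pb with onnx-tensorflow.

    device="CUDA" is the setting suggested for onnx-tensorflow 1.7.0-tf-1.x;
    onnx_tf's prepare() defaults to device="CPU".
    """
    # Imported lazily so this module loads even where onnx/onnx-tf are absent.
    import onnx
    from onnx_tf.backend import prepare

    onnx_model = onnx.load(onnx_path)
    tf_rep = prepare(onnx_model, device=device)
    tf_rep.export_graph(pb_path)
```

Usage, with placeholder file names: convert_onnx_to_pb("yolov3.onnx", "yolov3.pb").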

ShivaKothuru commented 3 years ago

Thank you @zldrobit , it works now.

zldrobit commented 3 years ago

Closing the issue since the root cause has been found.

zldrobit / onnx_tflite_yolov3

Error while trying to convert custom built yolov3 model to tflite model #7