Closed Chase2816 closed 4 years ago
This is probably caused by the cuDNN and CUDA versions. You can try the Docker image I built: docker pull zldrobit/onnx:10.0-cudnn7-devel The conversion works fine for me with this image.
I googled it, and many people have run into this problem. These may help: https://github.com/tensorflow/tensorflow/issues/24828 https://github.com/tensorflow/tensorflow/issues/28326 https://stackoverflow.com/questions/53698035/failed-to-get-convolution-algorithm-this-is-probably-because-cudnn-failed-to-in
Or try the new environment configuration; the requirements.txt file has been updated.
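For anyone landing here with the same error: a workaround often mentioned in the linked threads is to stop TensorFlow from reserving all GPU memory at startup, or to hide the GPU and run on CPU. Below is a minimal sketch, not code from this repository. The helper name `configure_tf_gpu_env` is made up for illustration. `TF_FORCE_GPU_ALLOW_GROWTH` and `CUDA_VISIBLE_DEVICES` are environment variables read by TensorFlow (in versions that honor the former) and by the CUDA runtime, and they should be set before `import tensorflow`. If the real cause is a CUDA/cuDNN version mismatch, this will not fix it.

```python
import os


def configure_tf_gpu_env(allow_growth=True, force_cpu=False):
    """Set GPU-related environment variables before TensorFlow is imported.

    Hypothetical helper for illustration only.
    """
    if allow_growth:
        # Ask TensorFlow to allocate GPU memory on demand
        # instead of reserving it all up front.
        os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    if force_cpu:
        # Hide every CUDA device so TensorFlow falls back to the CPU.
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    # Return the current values so the caller can log them.
    return {
        "TF_FORCE_GPU_ALLOW_GROWTH": os.environ.get("TF_FORCE_GPU_ALLOW_GROWTH"),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }


# Call this at the very top of the script, before "import tensorflow".
settings = configure_tf_gpu_env(allow_growth=True, force_cpu=False)
print(settings)
```

Passing `force_cpu=True` is a quick way to check whether the model itself runs correctly when the GPU is taken out of the picture.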
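Thanks, I'll try again. After changing the cuDNN version, I tried different PyTorch and TensorFlow versions. In the end it ran without errors with tensorflow==1.15.0 and with pytorch==1.3.1 swapped for the CPU build. When I train on my own dataset with the default configuration (only the dataset parameters changed), this version of yolov3 detects worse than yunyang1994's yolov3: some boxes are missing and some extra boxes appear. Have you run into this?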
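Since the fix came down to a particular combination of package versions, it may help to print what is actually installed in the active environment before comparing setups. This is a rough sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the function name `installed_versions` and the default package list are illustrative choices, not something taken from this repository.

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "tensorflow-gpu", "torch", "onnx")):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not present in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a bug report makes it easier to compare against requirements.txt.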
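The code in this repository is mainly meant to verify the Darknet weights > ONNX (PyTorch) > TensorFlow > TFLite conversion; I have not looked at training quality so far. If you want to train with PyTorch, see https://github.com/ultralytics/yolov3 I recommend training the weights with the original yolov3 GitHub repository first and then converting the resulting weights with this repository. That way accuracy should not be a problem.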
After converting the pt model to an ONNX model, the test passed. After converting the ONNX model to a pb model, inference with tf_infer.py runs without errors, but tf_detect.py fails. The error is as follows:
`File "D:\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
return fn(*args)
File "D:\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
target_list, run_metadata)
File "D:\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node convolution}}]]
[[815/_27]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node convolution}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/ccpd_dataset/onnx_tflite_yolov3-master/tf_detect.py", line 213, in
detect()
File "D:/ccpd_dataset/onnx_tflite_yolov3-master/tf_detect.py", line 117, in detect
pred = sess.run("815:0", feed_dict={'input.1:0': img})
File "D:\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
run_metadata_ptr)
File "D:\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "D:\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
run_metadata)
File "D:\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node convolution (defined at \Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
[[815/_27]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node convolution (defined at \Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'convolution':
File "/ccpd_dataset/onnx_tflite_yolov3-master/tf_detect.py", line 213, in
detect()
File "/ccpd_dataset/onnx_tflite_yolov3-master/tf_detect.py", line 43, in detect
name="")
File "\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\framework\importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\framework\importer.py", line 517, in _import_graph_def_internal
_ProcessNewOps(graph)
File "\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\framework\importer.py", line 243, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3561, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3561, in
for c_op in c_api_util.new_tf_operations(self)
File "\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3451, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "\Anaconda3\envs\py365\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()`
I set up the environment according to requirements.txt. Have you run into this problem? Any advice would be appreciated. Thanks!