microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Fatal error: _DCNv2 is not a registered function/op #8635

Closed prabhuiitdhn closed 3 years ago

prabhuiitdhn commented 3 years ago

Describe the bug I have an onnx model [with a custom operator added], created by following https://github.com/onnx/onnx/issues/3544, and even the following code works:

onnx.checker.check_model('/home/uib43225/DEFT/src/models/model_mot.onnx')
print(onnx.helper.printable_graph(onnx_model.graph))

But now I am trying to run inference with onnxruntime, and I am getting the following error:

  File "onnx_run_time.py", line 12, in <module>
    rt_s = rt.InferenceSession("/home/uidq6830/PycharmProjects/DEFT_inference/src/models/model_mot.onnx")
  File "/home/uidq6830/deftortpip/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/uidq6830/deftortpip/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 310, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /home/uidq6830/PycharmProjects/DEFT_inference/src/models/model_mot.onnx failed:Fatal error: _DCNv2 is not a registered function/op

Urgency Please help me out with this; I am looking for strong support as I have a very strict deadline.

System information

To Reproduce I installed onnxruntime using pip ("pip install onnxruntime"), and I will have to add the new custom operator to fix it.

NOTE: I already posted this issue as https://github.com/microsoft/onnxruntime/issues/8436 but have not received any response.

edgchen1 commented 3 years ago

Have you registered your custom op with ONNX Runtime? https://onnxruntime.ai/docs/how-to/add-custom-op.html

prabhuiitdhn commented 3 years ago

@edgchen1: Thank you so much for your reply. I installed "onnxruntime" using pip (pip install onnxruntime), and I am unable to find onnxruntime_c_api.h to add a custom op per the link you mentioned. Please help me find it.

But if it is mandatory to install onnxruntime from source to add a new custom operator, then please look into https://github.com/microsoft/onnxruntime/issues/8623, which I have already posted, because I am unable to install onnxruntime from source successfully.

Thanks again.

edgchen1 commented 3 years ago

Is your custom op implementation in a shared library? There's an example of how to register one with the Python API in this test: https://github.com/microsoft/onnxruntime/blob/6d3c2c85ef0b663c9beead6db52e5c0551d639bf/onnxruntime/test/python/onnxruntime_test_python.py#L789

In particular, see SessionOptions.register_custom_ops_library(). https://github.com/microsoft/onnxruntime/blob/6d3c2c85ef0b663c9beead6db52e5c0551d639bf/onnxruntime/test/python/onnxruntime_test_python.py#L811
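
For context, a minimal sketch of that pattern (assuming a custom op kernel already compiled into a shared library; the library path and model name here are hypothetical):

import onnxruntime as ort

so = ort.SessionOptions()
# Point ONNX Runtime at the shared library implementing the custom op.
so.register_custom_ops_library("./libcustom_op.so")  # hypothetical path

# If the library registers every custom op the model uses, loading succeeds.
sess = ort.InferenceSession("model_with_custom_op.onnx", so)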

prabhuiitdhn commented 3 years ago

@edgchen1: By "custom op implementation in a shared library", do you mean implementing the custom op on a Linux system? If so, then yes, I will have to implement it in a shared library.

image

The yellow-marked node is the custom operator that has already been added to the onnx model, and the model converts successfully, but I need to run inference through onnxruntime. Please guide me on adding the operator to onnxruntime.

Can you verify that the changed code below will work in this case?

this = os.path.dirname(__file__)
custom_op_model = os.path.join(this, "testdata", "custom_op_library", "custom_op_test.onnx")
# (I don't understand this part; what should testdata, custom_op_library, and custom_op_test.onnx be?)

if not os.path.exists(custom_op_model):
    raise FileNotFoundError("Unable to find '{0}'".format(custom_op_model))

so1 = onnxrt.SessionOptions()
so1.register_custom_ops_library(shared_library)

# Model loading successfully indicates that the custom op node could be resolved successfully
sess1 = onnxrt.InferenceSession(custom_op_model, so1)
#Run with input data
input_name_0 = sess1.get_inputs()[0].name
input_name_1 = sess1.get_inputs()[1].name
input_name_2 = sess1.get_inputs()[2].name
input_name_3 = sess1.get_inputs()[3].name
input_name_4 = sess1.get_inputs()[4].name
output_name = sess1.get_outputs()[0].name
input_0 = np.ones(534).astype(np.float32)
input_1 = np.zeros(660).astype(np.float32)
input_2 = np.zeros(661).astype(np.float32)
input_3 = shape(dla_up.ida_2.proj_1.conv.weight).astype(np.float32)
input_4 = shape(dla_up.ida_2.proj_1.conv.bias).astype(np.float32)
res = sess1.run([output_name], {input_name_0: input_0, input_name_1: input_1, input_name_2: input_2, input_name_3: input_3, input_name_4: input_4})
output_expected = np.ones(662).astype(np.float32)
np.testing.assert_allclose(output_expected, res[0], rtol=1e-05, atol=1e-08)

edgchen1 commented 3 years ago

Do you mean "custom op implementation in a shared library?" as implementing custom op in linux system? If yes, then yes, I will have to implement in shared library.

I suppose so, if you are running in a Linux environment. I am not too familiar with implementing custom ops.

@wenbingl would you be able to provide some more guidance?

prabhuiitdhn commented 3 years ago

No problem, @edgchen1. Hi @wenbingl, will you please help me out?

prabhuiitdhn commented 3 years ago

Hi @edgchen1 and @wenbingl: I tried modifying the code in "onnxruntime_test_python.py" and started rebuilding onnxruntime, but it did not work and gave an error:

make[2]: [onnxruntime_pybind11_state.so] Error 1
make[2]: Deleting file 'onnxruntime_pybind11_state.so'
make[1]: [CMakeFiles/onnxruntime_pybind11_state.dir/all] Error 2
make: [all] Error 2
Traceback (most recent call last):
  File "/home/uidq6830/test_ort/onnxruntime/tools/ci_build/build.py", line 2258, in <module>
    sys.exit(main())
  File "/home/uidq6830/test_ort/onnxruntime/tools/ci_build/build.py", line 2179, in main
    build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
  File "/home/uidq6830/test_ort/onnxruntime/tools/ci_build/build.py", line 1130, in build_targets
    run_subprocess(cmd_args, env=env)
  File "/home/uidq6830/test_ort/onnxruntime/tools/ci_build/build.py", line 605, in run_subprocess
    return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
  File "/home/uidq6830/test_ort/onnxruntime/tools/python/util/run.py", line 42, in run
    completed_process = subprocess.run(
  File "/home/uidq6830/anaconda3/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/home/uidq6830/test/bin/cmake', '--build', '/home/uidq6830/test_ort/onnxruntime/build/Linux/RelWithDebInfo', '--config', 'RelWithDebInfo', '--', '-j56']' returned non-zero exit status 2.

Looking forward to your support in resolving it.

prabhuiitdhn commented 3 years ago

Thank you @edgchen1 for helping so far. Hi @wenbingl, will you please help me out? I am very close to a strict deadline. Hope you understand.

wenbingl commented 3 years ago

@prabhuiitdhn if your custom op can run in Python, that's easy; check this example: https://github.com/microsoft/onnxruntime-extensions/blob/983de7c0feaeb115fcb5a696cb636938b99341b9/test/test_pyops.py#L130

For the sake of performance, the custom op can also be implemented in C++, like this example: https://github.com/microsoft/onnxruntime-extensions/blob/main/operators/math/inverse.hpp

Feel free to let me know if there is any question.

prabhuiitdhn commented 3 years ago

@wenbingl: Do I need to add "_DCNv2" to test_pyops.py? Because I installed onnxruntime from source and I am unable to find this file. Or do I need to build https://github.com/microsoft/onnxruntime-extensions to run inference successfully?

wenbingl commented 3 years ago

!pip install onnxruntime-extensions

from onnxruntime_extensions import (
    onnx_op, PyCustomOpDef,
    get_library_path as _get_library_path)

@onnx_op(op_type='_DCNv2', ...)
def _DCNv2(...):
    ...

so = _ort.SessionOptions()
so.register_custom_ops_library(_get_library_path())

# load your model

# inference with the registered SessionOptions
sess = _ort.InferenceSession(onnx_model.SerializeToString(), so)

prabhuiitdhn commented 3 years ago

Hi @wenbingl: To run the code successfully, I followed the instructions: installed onnxruntime-extensions and added the operator to the file to run:


import onnx
import onnxruntime as _ort

from onnxruntime_extensions import (
    onnx_op, PyCustomOpDef,
    get_library_path as _get_library_path)

@onnx_op(op_type='_DCNv2',
         inputs=[PyCustomOpDef.dt_float, PyCustomOpDef.dt_float, PyCustomOpDef.dt_float, PyCustomOpDef.dt_float,
                 PyCustomOpDef.dt_float], outputs=[PyCustomOpDef.dt_float])

def _DCNv2(x, y, z, p, q):
    return q

so = _ort.SessionOptions()
so.register_custom_ops_library(_get_library_path())

# load model
onnx_model = onnx.load('/home/uidq6830/PycharmProjects/DEFT_inference/src/models/model_mot.onnx')

# inference with _ort.SessionOptions.
sess = _ort.InferenceSession(onnx_model.SerializeToString(), so)
print("magics happens")

But I am still unable to resolve it; I am getting an error:

sess = _ort.InferenceSession(onnx_model.SerializeToString(), so)

File "/home/uidq6830/test/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 324, in init self._create_inference_session(providers, provider_options, disabled_optimizers) File "/home/uidq6830/test/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 353, in _create_inference_session sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model) onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Fatal error: _DCNv2 is not a registered function/op

image

The yellow-marked "_DCNv2" is the operator that has to be registered for inference.

It would be much helpful If i could add it and run it successfully. Thank you.

wenbingl commented 3 years ago

Is there a blank line between @onnx_op and def _DCNv2 in your source code? Also, the opset domain of @onnx_op is 'ai.onnx.contrib'; can you modify your model to use that domain?

prabhuiitdhn commented 3 years ago

@wenbingl: Modified the code as below:

@onnx_op(op_type='_DCNv2', domain='ai.onnx.contrib',
         inputs=[PyCustomOpDef.dt_float, PyCustomOpDef.dt_float, PyCustomOpDef.dt_float, PyCustomOpDef.dt_float,
                 PyCustomOpDef.dt_float], outputs=[PyCustomOpDef.dt_float])
def _DCNv2(x, y, z, p, q):
    return q

# added domain = 'ai.onnx.contrib' and no blank line between @onnx_op and def _DCNv2

Nothing changed; same error. Is there any problem with loading the model?

wenbingl commented 3 years ago

That's unexpected. Can you share the model, or a dummy model that reproduces the error?

prabhuiitdhn commented 3 years ago

https://drive.google.com/file/d/1XdwMvU4J12XRARbjlTuO5cj4mDL4UpLv/view?usp=sharing

Can you please check whether you can download and debug it?

wenbingl commented 3 years ago

Yes, I will take a look.

prabhuiitdhn commented 3 years ago

Thanks @wenbingl. Please treat this as high priority; I am very close to a strict deadline. It would be highly appreciated.

wenbingl commented 3 years ago

Not sure how you updated the opset domain, but from what I found in the model you shared, the opset domain isn't correct. Let me give you some scripts to update the domain.
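
(For illustration, a minimal sketch of such a domain-updating script, assuming the model path and an opset version of 1 for the custom domain; this is not necessarily the exact script wenbingl had in mind:)

import onnx
from onnx import helper

model = onnx.load("model_mot.onnx")  # hypothetical path

# Move every _DCNv2 node into the domain that onnxruntime-extensions registers.
for node in model.graph.node:
    if node.op_type == "_DCNv2":
        node.domain = "ai.onnx.contrib"

# Declare the custom domain in the model's opset imports if it is missing.
if not any(op.domain == "ai.onnx.contrib" for op in model.opset_import):
    model.opset_import.append(helper.make_opsetid("ai.onnx.contrib", 1))

onnx.save(model, "model_mot_patched.onnx")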

prabhuiitdhn commented 3 years ago

Yes, sure, thank you. I think this may help you understand: https://github.com/onnx/onnx/issues/3544

wenbingl commented 3 years ago

Good information. If you exported the model from PyTorch, then update the custom exporting function to use g.op("ai.onnx.contrib::_DCNv2", ...) instead of g.op("custom_domain::_DCNv2", ...); then the model should be fine.

prabhuiitdhn commented 3 years ago

Let me check, @wenbingl.

prabhuiitdhn commented 3 years ago

Hi, I have tried changing the code to g.op("ai.onnx.contrib::_DCNv2", ...) instead of g.op("custom_domain::_DCNv2"), but unfortunately it is not working and gives an error:

RuntimeError: ONNX export failed: Couldn't export operator ai.onnx.contrib::_DCNv2

Can you please help me resolve this too?

wenbingl commented 3 years ago

Can you post your exporting script here?

prabhuiitdhn commented 3 years ago

The operator script:

class _DCNv2(Function):
    @staticmethod
    def symbolic(g, input, offset, mask, weight, bias, stride, padding, dilation, deformable_groups):
        stride = _pair(stride)
        padding = _pair(padding)
        dilation = _pair(dilation)
        #it is working
        # return g.op("_DCNv2", input, offset, mask, weight, bias, stride_i=stride, padding_i=padding,
        #             dilation_i=dilation, deformable_groups_i=deformable_groups, )
        #
        # return g.op("custom_domain::_DCNv2", input, offset, mask, weight, bias, stride_i=stride, padding_i=padding,
        #             dilation_i=dilation, deformable_groups_i=deformable_groups, )

        return g.op("ai.onnx.contrib::_DCNv2", input, offset, mask, weight, bias, stride_i=stride, padding_i=padding,
                    dilation_i=dilation, deformable_groups_i=deformable_groups, )

        # register_custom_op_symbolic('ai.onnx.contrib::_DCNv2', _DCNv2, 9)

        # def symbolic(g, input, offset, mask, weight, bias, stride, padding, dilation, deformable_groups):
        #     return g.op("_DCNv2", input, offset, mask, weight, bias, name_s="_DCNv2", info_s=json.dumps({
        #         "dilation": dilation,
        #         "padding": padding,
        #         "stride": stride,
        #         "deformable_groups": deformable_groups
        #     }))
            # return g.op("Plugin", input, offset, mask, weight, bias, name_s="_DCNv2", info_s=json.dumps({
            #     "dilation": dilation,
            #     "padding": padding,
            #     "stride": stride,
            #     "deformable_groups": deformable_groups
            # }))

    @staticmethod
    def forward(ctx, input, offset, mask, weight, bias,
                stride, padding, dilation, deformable_groups):
        ctx.stride = _pair(stride)
        ctx.padding = _pair(padding)
        ctx.dilation = _pair(dilation)
        ctx.kernel_size = _pair(weight.shape[2:4])
        ctx.deformable_groups = deformable_groups
        output = _backend.dcn_v2_forward(input, weight, bias,
                                         offset, mask,
                                         ctx.kernel_size[0], ctx.kernel_size[1],
                                         ctx.stride[0], ctx.stride[1],
                                         ctx.padding[0], ctx.padding[1],
                                         ctx.dilation[0], ctx.dilation[1],
                                         ctx.deformable_groups)
        ctx.save_for_backward(input, offset, mask, weight, bias)
        return output

Exporting scripts:

    model_path = 'models/model_mot.pth'
    model = create_model(opt.arch, opt.heads, opt.head_conv, opt=opt)
    dummy_input = Variable(torch.randn(1, 3, 544, 960))
    torch.onnx.export(model, dummy_input, "models/model_mot_ai.onnx")

prabhuiitdhn commented 3 years ago

@wenbingl: Using the script below it was working before, but now when I try it again it is not working, and I am not sure why:

    def symbolic(g, input, offset, mask, weight, bias, stride, padding, dilation, deformable_groups):
        stride = _pair(stride)
        padding = _pair(padding)
        dilation = _pair(dilation)
        return g.op("custom_domain::_DCNv2", input, offset, mask, weight, bias, stride_i=stride, padding_i=padding, 
                         dilation_i=dilation, deformable_groups_i=deformable_groups, )

But this (below) script does work, although I am then unable to run onnx.checker.check_model or print the graph:

  def symbolic(g, input, offset, mask, weight, bias, stride, padding, dilation, deformable_groups):
      stride = _pair(stride)
      padding = _pair(padding)
      dilation = _pair(dilation)
      return g.op("_DCNv2", input, offset, mask, weight, bias, stride_i=stride, padding_i=padding,
                   dilation_i=dilation, deformable_groups_i=deformable_groups, )

I am really sorry that I am unable to solve it; I have not found any proper documentation for fixing it. Thank you for your support. I presume we are very close to fixing it, and we will do it for sure.

wenbingl commented 3 years ago

This commented line should be executed before calling torch.onnx.export:

register_custom_op_symbolic('::_DCNv2', _DCNv2, 1)

And this is the correct symbolic function:

return g.op("ai.onnx.contrib::_DCNv2", input, offset, mask, weight, bias, stride_i=stride, padding_i=padding, dilation_i=dilation, deformable_groups_i=deformable_groups, )

prabhuiitdhn commented 3 years ago

Hi @wenbingl, when I executed this I got an error:

RuntimeError: Failed to register operator ai.onnx.contrib::_DCNv2. The symbolic name must match the format Domain::Name, and should start with a letter and contain only alphanumerical characters

So, I changed _DCNv2 to DCNv2 throughout, but I still get an error when trying to register the op:

RuntimeError: ONNX export failed: Couldn't export operator ai.onnx.contrib::DCNv2

The exporting script is:

model_path = 'models/model_mot.pth'
model = create_model(opt.arch, opt.heads, opt.head_conv, opt=opt)
dummy_input = Variable(torch.randn(1, 3, 544, 960))
register_custom_op_symbolic('::DCNv2', DCNv2, 1) # not working
# register_custom_op_symbolic('::DCNv2', DCNv2, 9) #also tried with it but not working
torch.onnx.export(model, dummy_input, "models/model_mot_ai.onnx")

I am not sure what's happening.

prabhuiitdhn commented 3 years ago

Thank you so much @wenbingl, that was very helpful. Hi @neginraoof: can you please help me out?

BowenBao commented 3 years ago

~PyTorch exporter does not allow domain to contain . in it, thus the above error for domain ai.onnx.contrib.~

~@wenbingl @ytaous is there an argument for @onnx_op to allow users declaring custom domain? Or does onnxruntime-extension & onnxruntime only work with ai.onnx.contrib for this kind of custom op?~

Actually I got confused myself with the onnx domain and PyTorch operator domain. It is PyTorch operator domain that does not allow . in it, so the above error should not happen.

There is no need to call register_custom_op_symbolic for a torch.autograd.Function when symbolic is defined.

class _DCNv2(Function):
    @staticmethod
    def symbolic(g, input, offset, mask, weight, bias, stride, padding, dilation, deformable_groups):
        stride = _pair(stride)
        padding = _pair(padding)
        dilation = _pair(dilation)
        return g.op("ai.onnx.contrib::_DCNv2", input, offset, mask, weight, bias, stride_i=stride, padding_i=padding,
                    dilation_i=dilation, deformable_groups_i=deformable_groups, )

    @staticmethod
    def forward(ctx, input, offset, mask, weight, bias,
                stride, padding, dilation, deformable_groups):
        ctx.stride = _pair(stride)
        ctx.padding = _pair(padding)
        ctx.dilation = _pair(dilation)
        ctx.kernel_size = _pair(weight.shape[2:4])
        ctx.deformable_groups = deformable_groups
        output = _backend.dcn_v2_forward(input, weight, bias,
                                         offset, mask,
                                         ctx.kernel_size[0], ctx.kernel_size[1],
                                         ctx.stride[0], ctx.stride[1],
                                         ctx.padding[0], ctx.padding[1],
                                         ctx.dilation[0], ctx.dilation[1],
                                         ctx.deformable_groups)
        ctx.save_for_backward(input, offset, mask, weight, bias)
        return output

Exporting scripts:

    model_path = 'models/model_mot.pth'
    model = create_model(opt.arch, opt.heads, opt.head_conv, opt=opt)
    dummy_input = Variable(torch.randn(1, 3, 544, 960))
    torch.onnx.export(model, dummy_input, "models/model_mot_ai.onnx")

The above script should work. @prabhuiitdhn, could you try again? If it still doesn't work, please let us know the PyTorch version you are using. Thanks.

prabhuiitdhn commented 3 years ago

Hi @BowenBao: I had already tried the above changes in the script, and I tried again, but it is not working. It still gives an error:

File "/home/uidq6830/PycharmProjects/DEFT/src/train_onnx.py", line 90, in main torch.onnx.export(model, dummy_input, 'models/model_mot_ai.onnx') File "/home/uidq6830/anaconda3/envs/DEFT/lib/python3.6/site-packages/torch/onnx/init.py", line 132, in export strip_doc_string, dynamic_axes) File "/home/uidq6830/anaconda3/envs/DEFT/lib/python3.6/site-packages/torch/onnx/utils.py", line 64, in export example_outputs=example_outputs, strip_doc_string=strip_doc_string, dynamic_axes=dynamic_axes) File "/home/uidq6830/anaconda3/envs/DEFT/lib/python3.6/site-packages/torch/onnx/utils.py", line 342, in _export params_dict, opset_version, dynamic_axes, defer_weight_export, operator_export_type, strip_doc_string) RuntimeError: ONNX export failed: Couldn't export operator ai.onnx.contrib::DCNv2

The PyTorch version is: 1.2.0

Will you please check?

ytaous commented 3 years ago

Have you tried a later version of torch? https://onnxruntime.ai/docs/resources/compatibility.html 1.2 seems old to me. @BowenBao can confirm.

BowenBao commented 3 years ago

Indeed, @prabhuiitdhn, please try with the latest PyTorch.

prabhuiitdhn commented 3 years ago

Thank you, I can try, but the project depends on torch==1.2.0. Still, I will try and come back to you if anything comes up. Is there any way to resolve it using torch==1.2.0?

skottmckay commented 3 years ago

Is there a reason you can't use the latest torch for just the export? The output of that will be an ONNX model with a custom op.

At runtime you plug in the implementation for that custom op, which afaik could come from PyTorch 1.2 if using the PyOp feature from the extensions (obviously that requires running the ONNX model in python).

prabhuiitdhn commented 3 years ago

@BowenBao, @wenbingl, @ytaous, @edgchen1: Thank you so much to all, much appreciated. Yes, I have solved the issue: I am able to convert successfully from torch to onnx and to add the operator to onnxruntime for inference too.

The solution may help to others:

  1. Load the pytorch-compatible DCNv2 from github: https://github.com/lbin/DCNv2
  2. Follow the instructions in this thread to add the operator to pytorch and convert the model to onnx.
  3. Follow the same steps to add the operator to onnxruntime for inference (a sketch follows below).

+others & -above mentioned name: Please ask for help If anyone needs to fix it. I have struggled alot so, I am here to help you too. Thank you.

zbl-96 commented 3 years ago

+others & -above mentioned name: Please ask for help If anyone needs to fix it. I have struggled alot so, I am here to help you too. Thank you. Hi @prabhuiitdhn I have the same problem but I haven't solved it yet. I dont understand this code mean.


@onnx_op(op_type='_DCNv2', domain='ai.onnx.contrib',
         inputs=[PyCustomOpDef.dt_float, PyCustomOpDef.dt_float, PyCustomOpDef.dt_float, PyCustomOpDef.dt_float,
                 PyCustomOpDef.dt_float], outputs=[PyCustomOpDef.dt_float])
def _DCNv2(x, y, z, p, q):
    return q

Shouldn't we re-implement DCNv2 in this function? Can you give me that part of the code, about the onnxruntime registration?

coldlarry commented 3 years ago

@skottmckay @ytaous @BowenBao @wenbingl @neginraoof Although the above discussion is very exciting, it does not tell us the real answer. I want to add a custom op but failed; see https://github.com/pytorch/pytorch/issues/65057

philip-fu commented 2 years ago

@prabhuiitdhn Hi, could you share the def _DCNv2() code and inference script if possible? Thanks!

Yaodada12 commented 2 years ago

How exactly did you solve this problem? Can you show your onnxruntime registration code for DCNv2?

abhiagwl4262 commented 2 years ago

Hey @prabhuiitdhn, I see you have put so much effort into the DCN_v2 export. I am also having a hard time exporting this.

I am using following code for DCN_V2 class https://github.com/jinfagang/DCNv2_latest/blob/master/dcn_v2_onnx.py

where I have changed the symbolic function like this

class _DCNv2(Function):

    @staticmethod
    def symbolic(g, input, offset_mask, weight, bias, stride, padding, dilation, deformable_groups):
        return g.op("ai.onnx.contrib::_DCNv2", input, offset_mask, weight, bias, name_s="DCNv2", info_s=json.dumps({
            "dilation": dilation,
            "padding": padding,
            "stride": stride,
            "deformable_groups": deformable_groups
        }))

To export I am using -

torch.onnx.export(net, torch.randn(1,3,W,H), "custom_model.onnx", opset_version=13,
                custom_opsets={"ai.onnx.contrib": 13})

I could export this, but when I try to run it with onnxruntime I face the same error: _DCNv2 is not a registered function/op

I am using this piece of code

import onnxruntime as _ort
from onnxruntime_extensions import (
            onnx_op, PyCustomOpDef,
            get_library_path as _get_library_path)
# from onnxruntime.tools import pytorch_export_contrib_ops
# pytorch_export_contrib_ops.register()

@onnx_op(op_type='_DCNv2', domain='ai.onnx.contrib',
         inputs=[PyCustomOpDef.dt_float, PyCustomOpDef.dt_float, PyCustomOpDef.dt_float, PyCustomOpDef.dt_float,
                 PyCustomOpDef.dt_float], outputs=[PyCustomOpDef.dt_float])
def _DCNv2(x, y, z, p, q):
    return q

so = _ort.SessionOptions()
so.register_custom_ops_library(_get_library_path())

sess = _ort.InferenceSession("model.onnx", None)

Can you help me resolving this ?

abhiagwl4262 commented 2 years ago

I figured it out. I needed to pass the session options when creating the ORT session: change sess = _ort.InferenceSession("model.onnx", None) to sess = _ort.InferenceSession("model.onnx", so).

abhiagwl4262 commented 2 years ago

Now I am getting an error during inference. Code for inference:

import onnx
onnx.checker.check_model('yolact_plus.onnx')

sess = _ort.InferenceSession("yolact_plus.onnx", so)
print("Session initialized successfully")
input_names = [inp.name for inp in sess.get_inputs()]
print('Input Names:', input_names)
output_names = [out.name for out in sess.get_outputs()]
print(output_names)

preds = sess.run(None, {input_names[0]: dummy.numpy()})

Error

2022-07-20 09:28:40.204846007 [E:onnxruntime:, sequential_executor.cc:368 Execute] Non-zero status code returned while running _DCNv2 node. Name:'_DCNv2_28' Status Message: TypeError: _DCNv2() missing 1 required positional argument: 'q'

At:
  /home/ubuntu/anaconda3/envs/yolact-env/lib/python3.7/site-packages/onnxruntime_extensions/_ocos.py(78): _on_pyop_invocation
  /home/ubuntu/anaconda3/envs/yolact-env/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py(192): run
  yolact2onnx.py(52): <module>

Traceback (most recent call last):
  File "yolact2onnx.py", line 52, in <module>
    preds = sess.run(None, {input_names[0]: dummy.numpy()})
  File "/home/ubuntu/anaconda3/envs/yolact-env/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 192, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running _DCNv2 node. Name:'_DCNv2_28' Status Message: TypeError: _DCNv2() missing 1 required positional argument: 'q'
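
(The traceback above suggests an arity mismatch rather than a registration failure: the symbolic function in this comment emits a _DCNv2 node with four tensor inputs (input, offset_mask, weight, bias), while the @onnx_op stub declares five, so ONNX Runtime calls the Python function with one argument missing. A sketch of a stub matching the four-input node, with the body still a placeholder:)

@onnx_op(op_type='_DCNv2', domain='ai.onnx.contrib',
         inputs=[PyCustomOpDef.dt_float] * 4,
         outputs=[PyCustomOpDef.dt_float])
def _DCNv2(x, offset_mask, weight, bias):
    # Placeholder: a real implementation would compute the deformable convolution.
    return x
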
zouyajing commented 4 months ago

Don't you need to register this op with onnx?

Holmes2002 commented 3 months ago

My inference file:

import onnx
import onnxruntime as ort
import torch
from onnxruntime_extensions import (
    onnx_op, PyCustomOpDef,
    get_library_path as _get_library_path
)

# Define the custom operation
@onnx_op(op_type='_DCNv2', domain='ai.onnx.contrib',
         inputs=[PyCustomOpDef.dt_float, PyCustomOpDef.dt_float, PyCustomOpDef.dt_float, PyCustomOpDef.dt_float, PyCustomOpDef.dt_float], 
         outputs=[PyCustomOpDef.dt_float])
def _DCNv2(x, y, z, p, q):
    # Implement the actual functionality here
    return q

onnx_model_path = "model.onnx"
so = ort.SessionOptions()
so.register_custom_ops_library(_get_library_path())
so.log_severity_level = 0  # Set logging level to verbose

# Helper function to convert PyTorch tensor to NumPy array
def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# Function to perform inference
def infer(ort_session, input_tensor):
    inputs = {ort_session.get_inputs()[0].name: to_numpy(input_tensor)}
    ort_outs = ort_session.run(None, inputs)
    return ort_outs

# Load and check the ONNX model
try:
    onnx_model = onnx.load(onnx_model_path)
    onnx.checker.check_model(onnx_model)
    print("ONNX model is well-formed.")
except Exception as e:
    print(f"Error loading ONNX model: {e}")
    raise

# Inspect the BatchNormalization node
for node in onnx_model.graph.node:
    if node.op_type == "BatchNormalization":
        print(f"Node name: {node.name}")
        for attr in node.attribute:
            print(f"{attr.name}: {attr}")

# Create an inference session
try:
    ort_session = ort.InferenceSession(onnx_model_path, so)
    print("Inference session created successfully.")
except Exception as e:
    print(f"Error creating inference session: {e}")
    raise

dummy_input = torch.randn(1, 3, 1024, 1024)  # Adjust shape if necessary

# Perform inference with detailed logging
try:
    outputs = infer(ort_session, dummy_input)
    print("Model output shape:", outputs[0].shape)
except Exception as e:
    print(f"Error during inference: {e}")
    raise

I followed exactly the process in the answers above, but I encountered this error:

Error during inference: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running BatchNormalization node. Name:'BatchNormalization_136' Status Message: Invalid input scale: 0th dimension != 1
Traceback (most recent call last):
  File "inference_onnx.py", line 61, in <module>
    outputs = infer(ort_session, dummy_input)
  File "inference_onnx.py", line 30, in infer
    ort_outs = ort_session.run(None, inputs)
  File "/home/vudinh/anaconda3/envs/Lore/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running BatchNormalization node. Name:'BatchNormalization_136' Status Message: Invalid input scale: 0th dimension != 1

How can I fix this error?

wenbingl commented 3 months ago

The error - "Invalid input scale: 0th dimension != 1" suggests the input 0th dimension can be only 1

Holmes2002 commented 3 months ago

I know, but what is the problem? I tested the torch model version and it worked normally, but converting to ONNX produced this error. Can anyone check it for me? File exporting to ONNX:

https://github.com/Holmes2002/Table-Recognition/blob/main/LORE-TSR/src/lib/models/networks/pose_dla_dcn.py

File running ONNX inference:

https://github.com/Holmes2002/Table-Recognition/blob/main/LORE-TSR/src/lib/models/networks/inference_onnx.py