openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

Problems upgrading MO 2021.4->2023.0 with a custom layer #18036

Closed w3sip closed 1 year ago

w3sip commented 1 year ago

We convert one of our models from Caffe to OpenVINO using MO. The approach had been to patch mo_caffe.py/caffe_pb.py and to deploy a FrontExtractorOp-derived class and an Op-derived class.

It has been a lot of feeling our way in the dark, but we're making progress:

  1. The deployment location of Python code changed from site-packages/openvino/tools/extensions/front/caffe and site-packages/openvino/tools/extensions/ops to site-packages/openvino/tools/mo/front/caffe/proto.
  2. CustomLayersMapping.xml had to be introduced into the process
  3. (This was a fun one to figure out.) If the custom layer definition in CustomLayersMapping.xml included protoParamName, the Python layer code, while loaded, was never utilized. This resulted in a "Caffe fallback is deprecated... Please register python infer function for op" error. Removing this attribute resolved the issue. Still not sure why.
  4. The current problem is the process failing during serialization. A warning "Please set version attribute for node detection_out with type=Yolov3DetectionOutput" is logged, followed by the error "Cannot create Yolov3DetectionOutput layer detection_out id:490 from unsupported opset: extension". I've attempted to define the version attribute in the Python layer with little success. Hoping someone can save me a day or two of code archeology and get me over this bump.

(Probably not critical, but all of the above is with Intel distribution of OV / toolkit installed with pip3 install openvino-dev[caffe,onnx]==2023.0.0 on Linux)
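For readers following along, the extension pattern involved looks roughly like this. This is a simplified, self-contained stand-in, not the real MO API: in an actual setup the classes derive from openvino.tools.mo.ops.op.Op and openvino.tools.mo.front.extractor.FrontExtractorOp, and the attribute names here only model where the 'version' attribute from point 4 lives.

```python
# Simplified stand-in for the MO Op-derived extension described above.
# Real code subclasses openvino.tools.mo.ops.op.Op; this mock only
# illustrates the attribute flow around the 'version' (opset) field.

class Op:
    """Minimal stand-in for MO's Op base class."""
    def __init__(self, attrs):
        self.attrs = dict(attrs)

class Yolov3DetectionOutput(Op):
    op = 'Yolov3DetectionOutput'

    def __init__(self, attrs=None):
        mandatory = {
            'type': self.op,
            'op': self.op,
            # The serializer expects a 'version' attribute on every node;
            # leaving it unset produces the "Please set version attribute"
            # warning mentioned in point 4.
            'version': 'extension',
        }
        mandatory.update(attrs or {})
        super().__init__(mandatory)

node = Yolov3DetectionOutput({'num_classes': 80})
print(node.attrs['version'])  # -> extension
```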

slyalin commented 1 year ago

Hi @w3sip,

I'm not sure the Caffe flow has been tested intensively in the last 2-3 years, especially including extensions, so I can guess that something is not working properly. BTW, we deprecated support for Caffe starting with 2023.0.

Anyway, let's try to understand what you are doing and what the issue is.

  1. I didn't get why modifying mo is required, and why the changed mo path was an issue worth mentioning. Did you try to follow the approach described here: https://docs.openvino.ai/2022.1/openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers.html instead of modifying mo files? I believe in that case you don't need to worry about the mo files' location. Have you tried using --extensions to avoid putting extension files into the mo installation directory?

  2. Using CustomLayersMapping is marked as a legacy extension approach (according to https://docs.openvino.ai/2023.0/openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Caffe.html; without looking into the code I cannot remember how it really works). According to that documentation, it requires Caffe to work properly. The error that starts with "Caffe fallback is deprecated..." and "Please register python infer function for op" should actually also say "there is no Caffe available". Do you see this printed as part of the error message? We have a dedicated FAQ entry for this case: https://docs.openvino.ai/2022.3/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html#q14-what-does-the-message-cannot-infer-shape-for-node-because-there-is-no-caffe-available-please-register-python-infer-function-for-op-or-use-caffe-for-shape-inference-mean. Can you confirm that you see exactly this message?

  3. The problem from your point 4 leads me to the code of the C++ OpenVINO IR reader, not serialization, if I followed the code correctly. This actually shouldn't be triggered for extension operations if deserialization is called as part of mo -- looks like a bug. Can you confirm that this error appears as part of the mo call, and not when you are trying to load the resulting xml file with read_model/compile_model in the inference application?
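For context, an entry in the CustomLayersMapping.xml file mentioned in point 2 looks roughly like the fragment below. This is reconstructed from memory of the legacy docs, so treat the exact attribute names and values as an assumption rather than the authoritative syntax:

```xml
<!-- Entry inside CustomLayersMapping.xml (file's root element omitted).
     With protoParamName present, MO tries to fall back to Caffe for shape
     inference instead of using the Python extension's infer function. -->
<CustomLayer NativeType="Yolov3DetectionOutput" hasParam="true"
             protoParamName="detection_output_param"/>
```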

Please share modifications you made and any artefacts that help us to quickly reproduce the issue.

Adding @rkazants to seek advice.

@w3sip, also, please comment: if this works with the 2021.4 version of mo, why do you need to re-convert with a newer mo? Can you just always use the old mo? I understand it is probably not very convenient, but do you have an issue with IR converted with old mo besides the inconvenient model preparation flow? Does IR produced by old mo work with the 2023.0 OV runtime?

Caffe is fading out, so I wouldn't expect much support on that to be honest.

w3sip commented 1 year ago

@slyalin

First and foremost, thank you for the elaborate answer.

Let me address your points one by one:

  1. We're on the tail end of using Caffe as well. The models we're converting will be deprecated soon, and the next generation will not require it. However, for a variety of reasons we need to keep them going for a bit, and also need to upgrade to the latest OV. So, fully aware of deprecation -- need to keep it working for a bit.

  2. Regarding the paths, most of this setup is legacy from different people and much older versions of OV. I do like the --extensions approach better than patching, so I've tried it:

    strace mo --input_model /src/yolov3_416/model.caffemodel -d /src/yolov3_416/deploy.prototxt -s 255.0 --output_dir /src/yolov3_416-out --log_level=DEBUG --use_legacy_frontend --extensions /src/yolov3_416/../vino/modules --caffe_parser_path /tmp/temp/proto/caffe_pb2.py --input_shape '[1,3,416,416]'

    It fails with a failure to locate the custom layer, and strace doesn't show the files in /src/yolov3_416/../vino/modules ever being accessed (neither CustomLayersMapping nor the python files). Not sure what's going on, and would love to get this working ... but as long as patching the install works, it's not a huge deal. The error seen in this case is:

    wait4(953, [ ERROR ]  Exception occurred during running replacer "REPLACEMENT_ID" (<class 'openvino.tools.mo.load.caffe.loader.CaffeLoader'>): Unexpected exception happened during extracting attributes for node detection_out.
    Original exception message: Found custom layer "detection_out". Model Optimizer does not support this layer. Please, implement extension.
    For more information please refer to Model Optimizer FAQ, question #45. (https://docs.openvino.ai/latest/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html?question=45#question-45)
  3. So far (as in: with previous versions) it had been working without Caffe installed, and looking at the diff, I don't think this should have changed.

  4. I believe you're correct. The error seems to be coming from XmlDeserializer::create_node. Time-wise, it does occur during the mo call, and it takes much longer to happen than any of the front-end errors I resolved. No files are emitted as a result, so I do think it happens during model generation.

  5. "why do you need to re-convert" -- You know, it's a very good question. In the past, we've seen issues loading models converted with the old MO into newer OpenVINO, so I assumed the error we were seeing was one of those. However, having seen the conversion errors, this one is 100% related:

    Check 'false' failed at src/frontends/common/src/frontend.cpp:54:
    Converting input model
    Cannot create Yolov3DetectionOutput layer detection_out id:490 from unsupported opset: extension

    So this error occurs both at the conversion stage and while loading an older / already converted model into the inference engine.
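The rejection in XmlDeserializer::create_node mentioned in point 4 amounts to an opset-table lookup. Below is a schematic Python stand-in of that logic, not the actual C++ implementation: node types are resolved against a set of known opsets, and anything unrecognized (including 'extension', unless extensions are registered with the reader) is refused.

```python
# Schematic stand-in for the opset lookup in the C++ IR reader
# (XmlDeserializer::create_node). Illustration only, not the real code.

KNOWN_OPSETS = {'opset1', 'opset2', 'opset3'}  # illustrative subset

def create_node(layer_name, layer_type, layer_id, version):
    # 'version' is the opset attribute serialized on each IR node.
    if version not in KNOWN_OPSETS:
        raise RuntimeError(
            f"Cannot create {layer_type} layer {layer_name} "
            f"id:{layer_id} from unsupported opset: {version}")
    return (layer_type, version)

try:
    create_node('detection_out', 'Yolov3DetectionOutput', 490, 'extension')
except RuntimeError as e:
    print(e)  # reproduces the "unsupported opset: extension" error shape
```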

rkazants commented 1 year ago

Hi @w3sip,

If you share the model or the reproducer, it will greatly help us to debug this problem.

Best regards, Roman

w3sip commented 1 year ago

Hey Roman, I should be able to have something next week. If you guys have any ideas to try till then, let me know. Thanks! Alex

w3sip commented 1 year ago

@rkazants / @slyalin - sorry for the delay.

Here's the package including the model - https://1drv.ms/u/s!Am36O7LweZ6zgf0bWQGQ6zwhVa92-g?e=js2lry ... The only prerequisite is an available Docker installation. Running reproduce.sh will create a basic container to run in, and a virtualenv with OpenVINO installed (thus the first run takes a while) -- and will call ./vino/convertToVino.sh to attempt the conversion.

Please let me know if this is enough for you to reproduce the problem.

andrei-kochin commented 1 year ago

@w3sip I've tried to modify the script to check that it works on 2021.4, but even after modifications I ended up with incompatible extensions, as the shared extensions require the openvino.tools.mo namespace.

Could you please share working 2021.4 pack of extensions so we can compare?

Also, to minimize the delta, could you please check whether the issue is reproducible on 2022.1?

Locally your script fails with the following on 2022.1 - 2023.0, but the opset is detected as "2", not as "extension" as in your case:

[ ERROR ]  Cannot create Yolov3DetectionOutput layer detection_out id:490 from unsupported opset: 2
[ ERROR ]  offline transformations step has failed.
avitial commented 1 year ago

@w3sip based on Andrei's last reply, do you have an update for us?

w3sip commented 1 year ago

Hey guys -- I apologize for the delay, as I'm on an extended personal leave for the moment.

It'd take me a bit of time to get a portable 2021.4 pack once I'm back; I should be able to get to it sometime next week. If my memory serves me right, that setup was generated by installing the full version of OpenVINO (into /opt/intel), and then copying the extension files I've included with the above package:

  * "caffe_pb2.py" into "mo/front/caffe/proto"
  * "mo_caffe.proto" into "mo/front/caffe/proto"
  * "yolov3_detection_output_ext.py" into "extensions/front/caffe"
  * "yolov3detectionoutput.py" into "extensions/ops"

Again -- sorry to be non-specific here, I will have to try recreating that environment a bit later; all I can tell is that we've been using it for over 1.5 years to convert the models ;)
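The copy steps above amount to something like the following. This sketch uses dummy files in throwaway temp directories instead of a real /opt/intel install, so the paths stand in for the actual OpenVINO 2021.4 layout:

```shell
# Throwaway re-creation of the 2021.4 deployment layout described above,
# using empty dummy files in temp dirs instead of a real install.
SRC=$(mktemp -d); OV=$(mktemp -d)
touch "$SRC"/caffe_pb2.py "$SRC"/mo_caffe.proto \
      "$SRC"/yolov3_detection_output_ext.py "$SRC"/yolov3detectionoutput.py
mkdir -p "$OV/mo/front/caffe/proto" \
         "$OV/extensions/front/caffe" \
         "$OV/extensions/ops"
cp "$SRC"/caffe_pb2.py "$SRC"/mo_caffe.proto "$OV/mo/front/caffe/proto/"
cp "$SRC"/yolov3_detection_output_ext.py     "$OV/extensions/front/caffe/"
cp "$SRC"/yolov3detectionoutput.py           "$OV/extensions/ops/"
ls "$OV/extensions/ops"   # prints: yolov3detectionoutput.py
```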

I believe 2021.4 is the last version I was able to get things working with, but need to confirm that as well.

I'd also like to return to the point Sergey made:

The problem from point your 4 leads me to the code of c++ openvino IR reader, not serialization if I followed the code correctly.

This seems to be both relevant and correct, which makes me wonder why it happens during conversion.

Sorry for the delay, will get back -- to it, and with more precise answers -- soon.

w3sip commented 1 year ago

@andrei-kochin - here's the package reproducing the same model being built with 2021.4: https://1drv.ms/u/s!Am36O7LweZ6zgo5783R9wHUtAry1YA?e=CMYPYG

Just unzip the contents of reproduce2.zip into a folder, cd into that folder, and run buildModel.sh

It's worth noting that while the model is successfully converted with the image with the 2021.4 label, 2021.4.1 already fails -- albeit the error seems different from the originally reported one.

andrei-kochin commented 1 year ago

@w3sip thanks for the valuable comment! Delta 2021.4 vs 2021.4.1 is much smaller than vs 2023.0 =)

Let us check and get back to you with findings

andrei-kochin commented 1 year ago

Ref. 116127

w3sip commented 1 year ago

Any luck with this, guys?

pavel-esir commented 1 year ago

Hi @w3sip, I will update today about the progress

pavel-esir commented 1 year ago

@w3sip sorry for the slow update. I took a look, and it looks like when you run reproducer2 for the 2021.4.1 image, in compile.sh you forgot to correct MO=/opt/intel/openvino_2021.4.582/deployment_tools/model_optimizer to the new revision MO=/opt/intel/openvino_2021.4.689/deployment_tools/model_optimizer

When I corrected it to the 689 revision, 2021.4.1 works as well.

So the problem is most probably not so old. I will continue to search for where it went wrong between 2021.4->2023.0 and will update here as soon as I have progress

w3sip commented 1 year ago

@pavel-esir - you're correct, thanks. Yes, things look okay all the way up to 2021.4.2.

With 2022.0 the failure occurs -- not sure if for the same reasons as the originally reported one. I've refactored the original reproduction script to use the standard OpenVINO container. Please give that a shot.

The error I'm seeing is this:

Converting input model

[ ERROR ]  offline transformations step has failed.
[ 2023-08-01 17:52:20,974 ] [ DEBUG ] [ main:551 ]  Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/openvino/tools/mo/main.py", line 533, in main
    ret_code = driver(argv)
  File "/usr/local/lib/python3.6/dist-packages/openvino/tools/mo/main.py", line 491, in driver
    ret_res = emit_ir(graph, argv)
  File "/usr/local/lib/python3.6/dist-packages/openvino/tools/mo/main.py", line 462, in emit_ir
    raise Error("offline transformations step has failed.")
openvino.tools.mo.utils.error.Error: offline transformations step has failed.

Link to the updated repro package:

https://1drv.ms/u/s!Am36O7LweZ6zgpBjapAYBepvDRckHg?e=ydFQrb

pavel-esir commented 1 year ago

Hi @w3sip, with some minor patches I was able to solve the Python part of the problem; the main remaining problem is:

Current problem is the process failing during serialization. A warning Please set version attribute for node detection_out with type=Yolov3DetectionOutput is logged, followed by the error Cannot create Yolov3DetectionOutput layer detection_out id:490 from unsupported opset: extension.

Changes I've made in the reproducer: in convertToVino.sh, yolov3detectionoutput.py should be copied into the .../ops dir: cp $EXTDIR/yolov3detectionoutput.py $PROTODIR/../../../ops/ and the yolov3detectionoutput op version corrected to 'version': 'extension'.

IR is generated in 2 steps: firstly python code parses caffe model and saves a _tmp IR https://github.com/openvinotoolkit/openvino/blob/d42b53c070b31ef3eefbebbfa92b6d5ba796bdf7/tools/mo/openvino/tools/mo/convert_impl.py#L485

after that it's loaded into IE and the offline transformations, which are in C++, are called on that model object https://github.com/openvinotoolkit/openvino/blob/d42b53c070b31ef3eefbebbfa92b6d5ba796bdf7/tools/mo/openvino/tools/mo/convert_impl.py#L498 so there are indeed temporary IRs (answering @slyalin's question 4), and the problem currently (after patching the python extensions) is solely in the C++ part.
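The two-step flow described above can be sketched like this. It's a schematic with stand-in functions, not the real convert_impl code, but it shows why the C++ opset check fires during an mo run:

```python
# Schematic of MO's two-step IR emission: the Python front end serializes a
# temporary IR, which the C++ IR frontend then re-reads so that offline
# transformations can run on it. That re-read is where 'extension' nodes
# are rejected. Stand-in functions only, not the real pipeline.
import os
import tempfile

def emit_tmp_ir(graph, out_dir, name):
    # Step 1 (Python): serialize the parsed Caffe graph to <name>_tmp.xml
    path = os.path.join(out_dir, name + '_tmp.xml')
    with open(path, 'w') as f:
        f.write('<net/>')  # placeholder payload
    return path

def apply_offline_transformations(ir_path):
    # Step 2 (C++ in reality): re-read the temporary IR, transform it,
    # and emit the final IR without the _tmp suffix.
    return ir_path.replace('_tmp', '')

with tempfile.TemporaryDirectory() as d:
    tmp = emit_tmp_ir(None, d, 'model')
    final = apply_offline_transformations(tmp)
    print(os.path.basename(final))  # -> model.xml
```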

I've talked to colleagues, and they pointed me to the place where the C++ changes should be made. I will keep you updated

pavel-esir commented 1 year ago

Hi @w3sip, we have a workaround solution: use this fork for this model with the extension https://github.com/openvinotoolkit/openvino/compare/master...pavel-esir:openvino:enable_custom_extension . For that you will need to build OV from sources https://github.com/openvinotoolkit/openvino/blob/master/docs/dev/build_linux.md

Here are the IRs I have generated with this w/a https://www.dropbox.com/sh/b570cpvf7dc2iz6/AADsVlCNR85mPo6lDfGpdiWSa?dl=0 I don't have the C++ for your custom operations, so unfortunately I cannot check whether it infers correctly. Could you please check? (Since 2023.1 we compress IRs by default and the bin file will be 2 times smaller; if you need the uncompressed version I included it as well, but the default model.[bin,xml] should be fine.)

We are not ready to merge it into master because it will affect all users who load IRs from python with the new frontend API. A general solution requires extra effort, and considering that Caffe is already deprecated and will be removed in the near future, we would like to apply changes for Caffe only in a case of great necessity.

Can you please check the solution and say whether it's ok as a workaround until you switch from Caffe to the newer FWs? If it's not ok, we will decide from there how to make a more general fix

w3sip commented 1 year ago

Hey @pavel-esir -- I can try and work on this next week, but I don't think it's a viable long-term solution for us:

I understand it may take some extra testing, and Caffe is deprecated -- but it isn't dead yet. Having this fix in the next mainline release would be useful, hopefully for more people than just us ;)

Thanks for your effort, and I'll give it a try next week!

andrei-kochin commented 1 year ago

@w3sip actually, according to anonymously collected telemetry data, the number of Caffe users is continuously decreasing, and that was the reason it was deprecated on the OpenVINO side. Is it possible for you to stick to a specific OV version which is working, without updating it further? Or to use the suggested code change as a patch in your CI, if it works?

Indeed, the suggested fix is in the extension handling part, and it looks like Caffe is the only framework suffering from that. We'll check if this is applicable to other frameworks as well. If not, it would likely not be merged, to minimize the impact.

w3sip commented 1 year ago

@andrei-kochin - I may not have expressed my thought correctly -- there's little doubt that Caffe use has been declining for a while, and it's well on its way out. It's still here, though, for the moment. Not trying to argue with your sentiment, just making sure we understand each other correctly.

@pavel-esir - with the proposed patch, what exactly do I need to rebuild? MO, the inference engine, or both? And if I build with the following flags:

            "ENABLE_TESTS" : "OFF",
            "ENABLE_COVERAGE" : "OFF",
            "ENABLE_OPENCV" : "OFF",
            "ENABLE_SAMPLES" : "OFF",
            "ENABLE_VPU" : "OFF",
            "ENABLE_MYRIAD" : "OFF",
            "ENABLE_CLDNN" : "ON",
            "CI_BUILD_NUMBER" : "100"

will it produce everything I need to patch the binary releases? And if so, which components will need to be patched?

While trying to build from source, I'm encountering a problem on Windows: deprecated code causes the compilation to fail, with lots of errors like:

openvino\src\core\include\openvino/op/transpose.hpp(36): error C4996: 'ov::Node::evaluate': This method is deprecated and will be removed soon. Please use evaluate with ov::Tensor instead.
openvino\src\core\include\openvino/core/node.hpp(211): note: see declaration of 'ov::Node::evaluate'
openvino\src\core\include\openvino/op/unsqueeze.hpp(27): error C4996: 'ov::Node::evaluate': This method is deprecated and will be removed soon. Please use evaluate with ov::Tensor instead.
openvino\src\core\include\openvino/core/node.hpp(211): note: see declaration of 'ov::Node::evaluate'
openvino\src\core\include\openvino/op/variadic_split.hpp(40): error C4996: 'ov::Node::evaluate': This method is deprecated and will be removed soon. Please use evaluate with ov::Tensor instead.
openvino\src\core\include\openvino/core/node.hpp(211): note: see declaration of 'ov::Node::evaluate'

This is despite an explicit attempt to disable the error with /wd4996:

FAILED: src/common/transformations/CMakeFiles/inference_engine_transformations_obj.dir/src/transformations/common_optimizations/pad_fusion.cpp.obj 
C:\PROGRA~2\MICROS~2\2017\PROFES~1\VC\Tools\MSVC\1416~1.270\bin\Hostx64\x64\cl.exe   /TP -DIE_BUILD_POSTFIX=\"\" -DIMPLEMENT_OPENVINO_API -DPROJECT_ROOT_DIR=\"C:/.conan/2e374d/1/openvino\" -Iopenvino\src\common\transformations\include -Iopenvino\src\common\transformations\src -Iopenvino\src\core\reference\include -Iopenvino\src\core\include -Iopenvino\src\common\itt\include -Iopenvino\src\common\util\include -Iopenvino\src\core\builder\include -Iopenvino\src\core\dev_api -Iopenvino\src\frontends\common\include -Iopenvino\src\core\shape_inference\include -Iopenvino\src\common\conditional_compilation\include /wd4996  /DWIN32 /D_WINDOWS /W3 /GR /EHsc /D_CRT_SECURE_NO_WARNINGS /D_SCL_SECURE_NO_WARNINGS /EHsc /Gy /W3 /bigobj /MP /WX /wd4251 /wd4275 /O2 /Ob2 /DNDEBUG  /sdl /guard:cf -MD -std:c++14 /showIncludes /Fosrc\common\transformations\CMakeFiles\inference_engine_transformations_obj.dir\src\transformations\common_optimizations\pad_fusion.cpp.obj /Fdopenvino\bin\intel64\Release\ /FS -c openvino\src\common\transformations\src\transformations\common_optimizations\pad_fusion.cpp

Not sure why the flag is ignored.

pavel-esir commented 1 year ago

hi @w3sip thanks for your feedback!

I will answer your questions below, but first let me ask. To help you, I need to know one very important thing: do you have a C++ extension for the operation `Yolov3DetectionOutput`? https://docs.openvino.ai/2021.4/openvino_docs_IE_DG_Extensibility_DG_Intro.html

Compiling from scratch is needed to check whether your model is converted from Caffe to IR on my fork with the workaround fix.

But checking the IRs which I already generated is much more important. For that it's not necessary to compile openvino from scratch; you can use the precompiled openvino. You will need your C++ extension for the operation `Yolov3DetectionOutput`. Am I right that you had such an extension and it worked on 2021.4?

While trying to build it from source, I'm encountering a problem on Windows:

As far as I can see you are using VS2017 (MICROS~2\2017\PROFES~1\VC\Tools\MSVC), but you need 2019 or higher: https://github.com/openvinotoolkit/openvino/blob/master/docs/dev/build_windows.md

with the proposed patch, what exactly is it that I need to rebuild? MO, inference engine or both?

You will need both, and the CPU/GPU plugin as well. But since the model_optimizer target depends on IE, cmake will build it automatically. If you go the compile-from-source route, I would suggest building all the necessary targets:

cmake --build . -t benchmark_app openvino_intel_cpu_plugin openvino_intel_gpu_plugin model_optimizer openvino_ir_frontend -- -j11

But again, if you choose to build from scratch: checking the IRs I already generated is much more important. Could you please check whether they are inferred correctly with your C++ extension? Do you have it? https://docs.openvino.ai/2021.4/openvino_docs_HOWTO_Custom_Layers_Guide.html#custom-operations-extensions-for-the-inference-engine

pavel-esir commented 1 year ago

@w3sip hi, do you have any updates or questions regarding my previous comment?

Did you have any chance to look into converted IRs?

Here are the IRs I have generated with this w/a https://www.dropbox.com/sh/b570cpvf7dc2iz6/AADsVlCNR85mPo6lDfGpdiWSa?dl=0 I don't have the C++ for your custom operations, so unfortunately I cannot check whether it infers correctly. Could you please check?

andrei-kochin commented 1 year ago

@w3sip do you have any more questions or news for us?

andrei-kochin commented 1 year ago

Closing for now. Please feel free to get back to us on whether the suggestions from @pavel-esir work for you or not