luxonis / depthai-ml-training

Some Example Neural Models that we've trained along with the training scripts
MIT License
118 stars 32 forks

Trained model not usable: RuntimeError: Device booted with different OpenVINO version that pipeline requires #13

Closed franva closed 2 years ago

franva commented 3 years ago

Followed this Deeplab V3 Plus Mobile Net V3 notebook

Changed nothing and just went through the whole procedure to see whether it would yield a re-trained AI model. It did generate a model and also converted it to a .blob file.

I then used this generated model with the depthai project to test it on my OAK-D camera. It returned this error:

$  python depthai_demo.py -cnn deeplabv3pmnv2 -vid /media/Workspace/Work/MyBuddy/Data/videos/boxhill_trail_back_ns.mp4 -sh 8 
Using depthai module from:  /media/Workspace/Learning/Github/depthai/myvenv/lib/python3.8/site-packages/depthai.cpython-38-x86_64-linux-gnu.so
Depthai version installed:  2.7.0.0
Available devices:
[0] 14442C10013762D700 [X_LINK_UNBOOTED]
Enabling low-bandwidth mode due to low USB speed... (speed: UsbSpeed.HIGH)
Traceback (most recent call last):
  File "/home/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/.vscode/extensions/ms-python.python-2021.8.1147840270/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/.vscode/extensions/ms-python.python-2021.8.1147840270/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/home/.vscode/extensions/ms-python.python-2021.8.1147840270/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/home/anaconda3/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/anaconda3/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/media/Workspace/Learning/Github/depthai/depthai_demo.py", line 184, in <module>
    device.startPipeline(pm.p)
RuntimeError: Device booted with different OpenVINO version that pipeline requires

I then downgraded my depthai to 2.7.0.0 and it threw another error:

[image]

I then updated the requirements.txt to have depthai==2.7.0.0

Then I re-ran the command and the same error happened again: RuntimeError: Device booted with different OpenVINO version that pipeline requires

VanDavv commented 3 years ago

Hi @franva, thanks for the report!

Could you add openvino_version key to your model config? Just like we specify it in openpose2 nn - https://github.com/luxonis/depthai/blob/main/resources/nn/openpose2/openpose2.json#L6

IIRC the version in the notebooks is 2020.2, so most likely you'd have to set this value to 2020_2
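For illustration, adding that key could be scripted rather than edited by hand. This is a sketch under assumptions: it writes a minimal stand-in config to a temp directory, whereas the real file lives under resources/nn/<model>/ in the depthai repo and contains more keys.

```python
import json
import tempfile
from pathlib import Path

# Minimal stand-in for resources/nn/deeplabv3pmnv2/deeplabv3pmnv2.json
# (hypothetical contents; only the keys shown in this thread).
config = {
    "nn_config": {"output_format": "raw", "input_size": "256x256"},
    "handler": "handler.py",
}

# Pin the OpenVINO version the blob was compiled with (2020.2 here),
# mirroring the openpose2 config linked above.
config["openvino_version"] = "2020_2"

path = Path(tempfile.mkdtemp()) / "deeplabv3pmnv2.json"
path.write_text(json.dumps(config, indent=4))
print(json.loads(path.read_text())["openvino_version"])  # 2020_2
```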

franva commented 3 years ago

Hi @VanDavv thanks for the reply.

Downgraded the depthai to 2.7.0.0 and added the openvino_version

{
    "nn_config": {
        "output_format" : "raw",
        "input_size": "256x256"
    },
    "openvino_version": "2020_2",
    "handler": "handler.py"
}

Then I still get an error:

[image]

I feel this approach is not correct: in order to use a custom-trained model, we need to downgrade depthai and also downgrade OpenVINO.

Could we have a proper fix for the training notebook?

Thanks

VanDavv commented 3 years ago

Sure, I will take a look at it, as it's currently not possible to run this model on the latest demo. In the meantime, could you try to run the demo script on v2.7.2.0 using the following commands?

git checkout tags/v2.7.2.0 -b deeplab_nn_2720
pip install depthai==2.7.2.0
python depthai_demo.py -cnn deeplabv3pmnv2 -vid ...

It will check out the demo script version that was based on 2.7.2.0. It will miss some improvements that are on the latest main, but it could allow you to run this network while the notebook is being fixed.

Erol444 commented 3 years ago

This trained blob should be used in combination with this demo script: https://github.com/luxonis/depthai-experiments/tree/master/gen2-deeplabv3_multiclass Could you try that out @franva? Thanks!

franva commented 3 years ago

@VanDavv Tried it, and the code finally runs, but the segmentation doesn't look correct:

[image]

here is my code:

handler.py

import cv2
import numpy as np

from depthai_helpers.managers import Previews
from depthai_helpers.utils import to_tensor_result

def decode(nn_manager, packet):
    for layer in packet.getAllLayers():
        print(f"Layer name: {layer.name}, Type: {layer.dataType}, Dimensions: {layer.dims}")

    # Layer name: Cast_2, Type: DataType.INT, Dimensions: [1, 256, 256]
    # Layer name: SemanticProbabilities, Type: DataType.FP16, Dimensions: [1, 21, 256, 256]
    data = np.squeeze(to_tensor_result(packet)["SemanticProbabilities"])
    class_colors = [[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255]]
    class_colors = np.asarray(class_colors, dtype=np.uint8)

    # 21 classes collapsed into 4 color bins; mode="clip" keeps class 20
    # (20 // 4 == 5) from indexing past the 4-entry color table.
    indices = np.argmax(data, axis=0) // 4
    output_colors = np.take(class_colors, indices, axis=0, mode="clip")
    return output_colors

def draw(nn_manager, data, frames):
    if len(data) == 0:
        return

    for name, frame in frames:
        if name == "color" and nn_manager.source == "color" and not nn_manager.full_fov:
            scale_factor = frame.shape[0] / nn_manager.input_size[1]
            resize_w = int(nn_manager.input_size[0] * scale_factor)
            resized = cv2.resize(data, (resize_w, frame.shape[0])).astype(data.dtype)
            offset_w = int(frame.shape[1] - nn_manager.input_size[0] * scale_factor) // 2
            tail_w = frame.shape[1] - offset_w - resize_w
            stacked = np.hstack((np.zeros((frame.shape[0], offset_w, 3)).astype(resized.dtype), resized, np.zeros((frame.shape[0], tail_w, 3)).astype(resized.dtype)))
            cv2.addWeighted(frame, 1, stacked, 0.2, 0, frame)
        elif name in (Previews.color.name, Previews.nn_input.name, "host"):
            cv2.addWeighted(frame, 1, cv2.resize(data, frame.shape[:2][::-1]), 0.2, 0, frame)

the json file:

{
    "nn_config": {
        "output_format" : "raw",
        "input_size": "256x256"
    },
    "openvino_version":"2020_2",
    "handler": "handler.py"
}

You might wonder why indices = np.argmax(data, axis=0) // 4

That's because I could create 21 different colors, but as I mentioned in another issue, if a color is not a "pure" color, it's almost invisible. So I just allocate 4 different colors across all 21 categories.
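To see what the argmax-and-divide trick does numerically, here is a minimal NumPy sketch using synthetic data (not a real network output). The mode="clip" argument is an assumption added here to guard the highest classes, since 20 // 4 == 5 would otherwise fall outside the 4-color table:

```python
import numpy as np

# Synthetic stand-in for the "SemanticProbabilities" output:
# 21 class maps over a 4x4 patch, with class 9 winning everywhere.
data = np.zeros((21, 4, 4), dtype=np.float16)
data[9] = 1.0

class_colors = np.asarray(
    [[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8
)

# Per-pixel winning class, collapsed into 4 color bins (9 // 4 == 2 -> green).
indices = np.argmax(data, axis=0) // 4
output_colors = np.take(class_colors, indices, axis=0, mode="clip")

print(output_colors[0, 0])  # [  0 255   0]
```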

franva commented 3 years ago

This trained blob should be used in combination with this demo script: https://github.com/luxonis/depthai-experiments/tree/master/gen2-deeplabv3_multiclass Could you try that out @franva? Thanks!

sure. will update you soon

franva commented 3 years ago

hi @Erol444

The demo script: https://github.com/luxonis/depthai-experiments/tree/master/gen2-deeplabv3_multiclass works.

But I didn't use my trained model, the model I used in the demo script is the deeplab_v3_plus_mnv2_decoder_256_openvino_2020.2.blob

Still, I think it's very important to have consistency in package versions.

Using the latest code to work with the provided models but having to downgrade in order to use a custom-trained model doesn't sound correct.

VanDavv commented 3 years ago

@franva I tried to convert the model using OpenVINO 2021.4 and it yields errors, with a fatal one during myriad_compile where it says

Check 'element::Type::merge(inputs_et, inputs_et, get_input_element_type(i))' failed at core/src/op/concat.cpp:50:
While validating node
'v0::Concat Concat_1006 (
  strided_slice_10/stack_1/Unsqueeze[0]:i32{1},
  strided_slice_10/stack_1/Unsqueeze1077[0]:i32{1},
  strided_slice_10/stack_1/Unsqueeze1079[0]:i32{1},
  strided_slice_10/extend_end_const1245431153[0]:i64{1}
) -> ()'
with friendly_name 'Concat_1006':
Argument element types are inconsistent.

So to update this notebook, we'd have to dig deeper into the network itself. The first thing I'd try is to check PINTO model zoo, as he has this network in OpenVINO format too - https://github.com/PINTO0309/OpenVINO-DeeplabV3.

For reference, I posted this updated notebook here

tersekmatija commented 3 years ago

Still, I think it's very important to have consistency in package versions.

Using the latest code to work with the provided models but having to downgrade in order to use a custom-trained model doesn't sound correct.

100% agree. I apologize for the confusion. I created a pull request that should work with the newest version. However, since OpenVINO 2020.3, you have to edit the .xml before creating a .blob.

franva commented 3 years ago

@franva I tried to convert the model using OpenVINO 2021.4 and it yields errors, with a fatal one during myriad_compile where it says

Check 'element::Type::merge(inputs_et, inputs_et, get_input_element_type(i))' failed at core/src/op/concat.cpp:50:
While validating node
'v0::Concat Concat_1006 (
  strided_slice_10/stack_1/Unsqueeze[0]:i32{1},
  strided_slice_10/stack_1/Unsqueeze1077[0]:i32{1},
  strided_slice_10/stack_1/Unsqueeze1079[0]:i32{1},
  strided_slice_10/extend_end_const1245431153[0]:i64{1}
) -> ()'
with friendly_name 'Concat_1006':
Argument element types are inconsistent.

So to update this notebook, we'd have to dig deeper into the network itself. The first thing I'd try is to check PINTO model zoo, as he has this network in OpenVINO format too - https://github.com/PINTO0309/OpenVINO-DeeplabV3.

For reference, I posted this updated notebook here

Thanks @VanDavv .

I understand your company is adding amazing features at the moment, which is great! But if users cannot customize models to have their own, then they are limited to what they are provided, and this in turn limits the usefulness of the hardware.

I have had a look at the PINTO model zoo; that repo uses an out-of-date TF, e.g. Tensorflow-GPU v1.11.0. Normally users would like to keep up with the latest versions of packages. I also understand that businesses cannot always keep up with the latest version, and that this is common. But there should be some middle point where both sides meet. TF has been on version 2 for years now, and hopefully our notebook could at least catch up with the major version.

Once again, thanks for your reply.

franva commented 3 years ago

Still, I think it's very important to have consistency in package versions. Using the latest code to work with the provided models but having to downgrade in order to use a custom-trained model doesn't sound correct.

100% agree. I apologize for the confusion. I created a pull request that should work with the newest version. However, since OpenVINO 2020.3, you have to edit the .xml before creating a .blob.

Hi @tersekmatija , great to see you created the notebook for the latest version! Did you mean the latest version of TF and DepthAI?

Also, what content in the .xml should we edit? Could you please create a demo? Appreciated.

tersekmatija commented 3 years ago

Did you mean the latest version for TF and DepthAI?

@franva I meant OpenVINO (now 2021.4 instead of 2020.2) and DepthAI (now 2.9.0.0 instead of 2.7.2.0). This allows you to use the latest DepthAI version. As for TensorFlow, the version in the Colab is still 1.x, as the model is taken directly from https://github.com/tensorflow/models/tree/master/research, which uses TF 1.x for Deeplab.

Also what content should we edit the .xml?

Before converting to .blob, you should search for this layer in the XML and change the element_type to i32 instead of i64. The precision should be left as it is.

<layer id="490" name="strided_slice_10/extend_end_const1245431561" type="Const" version="opset1">
    <data element_type="i64" offset="924018" shape="1" size="8"/>
    <output>
        <port id="0" precision="I64">
            <dim>1</dim>
        </port>
    </output>
</layer>

I don't know why, but as can be seen in the error message provided by @VanDavv, element types are inconsistent since OpenVINO 2020.3. That is why the initial fix used OpenVINO version 2020.2.

Could you please create a demo? Appreciated

I added the code for editing the XML in the latest pull request; it should be available in the Colab once the pull request is merged, but you can also do it manually.
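As a hedged illustration (not the exact code from the pull request), the manual edit could be scripted with Python's standard ElementTree. The layer name comes from the snippet above; the toy IR fragment below is only a stand-in for a real Model Optimizer .xml, and the precision attribute is deliberately left untouched:

```python
import xml.etree.ElementTree as ET

# Toy IR fragment mirroring the layer quoted above; a real .xml from
# the Model Optimizer is far larger and has many other i64 constants.
xml_text = """
<net><layers>
  <layer id="490" name="strided_slice_10/extend_end_const1245431561" type="Const" version="opset1">
    <data element_type="i64" offset="924018" shape="1" size="8"/>
    <output>
      <port id="0" precision="I64"><dim>1</dim></port>
    </output>
  </layer>
</layers></net>
"""

root = ET.fromstring(xml_text)
for layer in root.iter("layer"):
    # Only patch the one Const layer named in the thread, not every i64.
    if layer.get("name") == "strided_slice_10/extend_end_const1245431561":
        layer.find("data").set("element_type", "i32")

patched = ET.tostring(root, encoding="unicode")
print('element_type="i32"' in patched)  # True
```

Note that only the `element_type` on the `<data>` element changes; the `precision="I64"` on the output port stays as it is, per the advice above.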

franva commented 3 years ago

@tersekmatija It will be great to see the updated notebook using the latest version for both DepthAI and OpenVINO.

Could you please let me know once that notebook is available?

Cheers,

Winston

franva commented 3 years ago

Did you mean the latest version for TF and DepthAI?

@franva I meant OpenVINO (now 2021.4 instead of 2020.2) and DepthAI (now 2.9.0.0 instead of 2.7.2.0). This allows you to use the latest DepthAI version. As for TensorFlow, the version in the Colab is still 1.x, as the model is taken directly from https://github.com/tensorflow/models/tree/master/research, which uses TF 1.x for Deeplab.

Also what content should we edit the .xml?

Before converting to .blob, you should search for this layer in the XML and change the element_type to i32 instead of i64. The precision should be left as it is.

<layer id="490" name="strided_slice_10/extend_end_const1245431561" type="Const" version="opset1">
  <data element_type="i64" offset="924018" shape="1" size="8"/>
  <output>
      <port id="0" precision="I64">
          <dim>1</dim>
      </port>
  </output>
</layer>

I don't know why, but as can be seen in the error message provided by @VanDavv, element types are inconsistent since OpenVINO 2020.3. That is why the initial fix used OpenVINO version 2020.2.

Could you please create a demo? Appreciated

I added the code for editing the XML in the latest pull request; it should be available in the Colab once the pull request is merged, but you can also do it manually.

Hi @tersekmatija , are you talking about just 1 particular layer?

When I searched for element_type="i64", I got 44 results in the .xml file.

Should I change all of them?

franva commented 3 years ago

Hi @tersekmatija , I changed those i64 to i32 and then the blob converter errored out:

[image]

But if I don't change those i64, I can get the converted blob.

I will try whether the converted blob works or not.

Erol444 commented 3 years ago

Hello @franva , I believe he is talking about the specific layer named strided_slice_10/extend_end_const1245431561. We have just merged (to master) the ML training notebook changes where the XML is edited programmatically, PR here: https://github.com/luxonis/depthai-ml-training/pull/14

tersekmatija commented 3 years ago

Hi @franva , yes, I am talking about just 1 particular layer, as @Erol444 said. See the PR he linked. It contains the Colab with the latest OpenVINO 2021.4. It also edits the XML programmatically. The obtained blob can then be run with the latest DepthAI 2.9.0.0.

franva commented 3 years ago

Thanks @Erol444 and @tersekmatija for the update :)

That particular layer is specific to the Deeplabv3 architecture, correct?

What about other networks? How will we know which layers need to be changed in a custom model?

Erol444 commented 3 years ago

@franva it's just for Deeplabv3, correct. We haven't experienced this issue with any other networks; hopefully OpenVINO will be fully backwards compatible in the future.