luxonis / tools

Various tools for OAK-D camera
GNU Affero General Public License v3.0

Yolo8 Grayscale conversion error #77

Closed bennaa closed 5 months ago

bennaa commented 6 months ago

Following the forum discussion at https://discuss.luxonis.com/d/3477-yolo8-grayscale-conversion-error and jakaskerl's suggestion, I am posting the issue here.

I trained a custom grayscale YOLOv8 model using the ultralytics library and I want to run it on an OAK device. The only difference from the stock model is that it is modified to accept 1-channel (grayscale) images instead of 3-channel ones.

When I try to convert it from .pt to .blob using Luxonis Tools, it returns Error while converting to onnx.

I also tried to:

  1. export the .onnx and .xml/.bin files using the ultralytics library
  2. convert them to .blob using blobconverter
  3. insert the blob into the pipeline as
    detection_nn = pipeline.create(dai.node.YoloDetectionNetwork)
    detection_nn.setBlobPath(PATH)
    detection_nn.setNumClasses(8)
    detection_nn.setCoordinateSize(4)
    detection_nn.setAnchors([])
    detection_nn.setAnchorMasks({})
    detection_nn.setIouThreshold(0.5)
    detection_nn.setNumInferenceThreads(2)

    but when I run the pipeline it returns this error: [DetectionNetwork(3)] [error] Mask is not defined for output layer with width '3549'. Define at pipeline build time using: 'setAnchorMasks' for 'side3549'. This happens despite YOLOv8 being anchor-free (it uses no anchor masks).
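As a side note on that error: the '3549' in the message is consistent with the total number of YOLOv8 prediction cells summed over the three detection strides for a 416x416 input, which is why the firmware's parser treats it as an anchor-based output and looks for per-side anchor masks. A quick sanity check (the 416x416 input size is an assumption, not stated in the thread):

```python
# Total number of YOLOv8 prediction cells across the three detection strides.
# For a 416x416 input: 52*52 + 26*26 + 13*13 = 3549, matching the
# "output layer with width '3549'" in the error message.
input_size = 416  # assumption: the model was exported at 416x416
strides = (8, 16, 32)
cells = sum((input_size // s) ** 2 for s in strides)
print(cells)  # 3549
```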

After some more tests, here is what I found:

I started by analyzing the code of the online Luxonis Tools, which expects .pt files with 3-channel inputs. The main change I had to make is at line 58 of export_yolov8.py, where the dummy export input is hardcoded to 3 channels: im = torch.zeros(1, 3, *self.imgsz[::-1])#.to(device) # image size(1,3,320,192) BCHW iDetection

The other change is at line 67 of exporter.py, removing the '--reverse_input_channels ' flag (which only makes sense for 3-channel inputs).

I then checked the input and output sizes of the exported IR file using Netron. Next, I compiled the XML into a .blob file using OpenVINO's compile_tool.exe. By stepping through the compile_tool code with a debugger, I confirmed that the compiled blob has the correct input and output layers and sizes.

Finally, I tested adding the blob to the pipeline with different DepthAI nodes.

I then analyzed the C++ source code, but since this is a runtime error, I believe the relevant code runs after the RPC call in DeviceBase.cpp. That code does not appear to be publicly available, so I have run out of ideas.

I also believe it's not possible to work around this by sending the neural network a fake 3-channel image that is just the same grayscale channel repeated three times. (Note that np.stack((grayscale_image,) * 3, axis=-1) actually allocates a full 3-channel copy; a true zero-copy view of the grayscale plane would require something like np.broadcast_to.)
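For what it's worth, the memory behavior of that workaround can be checked directly: np.stack always allocates a fresh 3-channel buffer, while np.broadcast_to builds a read-only zero-copy view over the single grayscale plane. A small sketch, independent of DepthAI:

```python
import numpy as np

gray = np.zeros((480, 640), dtype=np.uint8)

# np.stack copies: the result is a new, independently allocated buffer.
stacked = np.stack((gray,) * 3, axis=-1)

# np.broadcast_to builds a read-only view: stride 0 on the channel axis,
# so all three channels alias the same grayscale memory.
view = np.broadcast_to(gray[..., None], (480, 640, 3))

print(np.shares_memory(stacked, gray))  # False -> full copy
print(np.shares_memory(view, gray))     # True  -> zero-copy view
print(view.strides[-1])                 # 0
```

Either way, a stacked or broadcast 3-channel frame only sidesteps the converter; the underlying model here genuinely expects a single channel.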

The overall changes needed for blob creation seem very small (essentially removing the hardcoded 3-channel assumption). Do you think the changes on the device side could be equally minimal and make it into one of the next releases?

Thank you for your help!

HonzaCuhel commented 6 months ago

Hi @bennaa,

Thanks for the thorough description of your problem! I'll look into it. Could you please share the exported .xml, .bin and .blob files with me?

Best, Jan

bennaa commented 6 months ago

hi @HonzaCuhel,

sorry for the delay.

I cannot attach the files here directly because the format is not supported; you can find them in this drive folder.

HonzaCuhel commented 6 months ago

No problem at all. Thank you for sharing the files. I'll look into it and will get back to you as soon as I find something.

Best, Jan

HonzaCuhel commented 6 months ago

Hi @bennaa,

I'm sorry for not getting back to you sooner. I have looked into it and found several things that I want to share with you:

  1. The exported .xml model you shared already contains the bounding-box decoding, which is why the NeuralNetwork node (instead of YoloDetectionNetwork) must be used in the DAI pipeline. YoloDetectionNetwork expects the raw output of each of the model's heads (without the bboxes already decoded), whereas your model outputs a single tensor. That tensor must then be passed through a NonMaxSuppression operation (a variant of this) to get the final predictions.
  2. I noticed that in the issue description in the code snippet, you are setting the number of classes to 8. However, the exported model predicts 80 classes. Was this just a typo?
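Regarding point 1: when the decoded single-tensor output is consumed on the host via a plain NeuralNetwork node, the post-processing Jan describes can be sketched in NumPy. This assumes the usual ultralytics output layout of (4 + num_classes, N), with rows (cx, cy, w, h) followed by per-class scores; that layout and the thresholds are assumptions, not something confirmed in the thread:

```python
import numpy as np

def decode_and_nms(pred, conf_thres=0.25, iou_thres=0.5):
    """Decode a (4 + num_classes, N) YOLOv8-style tensor and run greedy NMS.

    Rows 0-3 are (cx, cy, w, h); the remaining rows are per-class scores.
    NMS here is class-agnostic for brevity.
    Returns (boxes_xyxy, class_ids, confidences) for the kept detections.
    """
    boxes_cxcywh = pred[:4].T            # (N, 4)
    scores = pred[4:].T                  # (N, num_classes)
    cls = scores.argmax(axis=1)
    conf = scores.max(axis=1)

    keep = conf > conf_thres             # confidence filtering
    boxes_cxcywh, cls, conf = boxes_cxcywh[keep], cls[keep], conf[keep]

    # (cx, cy, w, h) -> (x1, y1, x2, y2)
    xy, wh = boxes_cxcywh[:, :2], boxes_cxcywh[:, 2:]
    boxes = np.concatenate([xy - wh / 2, xy + wh / 2], axis=1)

    # Greedy NMS: repeatedly keep the highest-confidence box and drop
    # everything overlapping it above the IoU threshold.
    order = conf.argsort()[::-1]
    kept = []
    while order.size:
        i = order[0]
        kept.append(i)
        rest = order[1:]
        tl = np.maximum(boxes[i, :2], boxes[rest, :2])
        br = np.minimum(boxes[i, 2:], boxes[rest, 2:])
        inter = np.prod(np.clip(br - tl, 0, None), axis=1)
        area_i = np.prod(boxes[i, 2:] - boxes[i, :2])
        area_r = np.prod(boxes[rest, 2:] - boxes[rest, :2], axis=1)
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou < iou_thres]
    return boxes[kept], cls[kept], conf[kept]
```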

I want to run the inference and possibly export the model by myself. But for that I need to ask you, could you please share with us these details about the training:

Furthermore, could you please additionally share the model's weights (the .pt file) and the exported .onnx model?

Thank you.

Best, Jan

bennaa commented 5 months ago

Hi @HonzaCuhel,

  1. I will look further into this, thank you.
  2. It's a typo. I was trying several models and pasted the wrong one here, sorry. The correct value for the model I shared with you is 80.

I shared a toy sample based on the Nano model and trained on coco8. I also updated the drive folder with the .pt and .onnx files. The folder also contains the .yaml file with the dataset specification and classes.

HonzaCuhel commented 5 months ago

Great, thank you very much!

HonzaCuhel commented 5 months ago

Hi @bennaa,

I exported the model without the bounding-box decoding part so it can be used with YoloDetectionNetwork in the DepthAI pipeline. However, when I tested the exported .blob model in an application, it detected nothing. I also ran inference with both ONNX models (your version and the one I created), but they didn't detect anything either (as shown in the attached image).

(attached image: output)

Have you tried the inference of the trained model? Has it detected some objects? If so, which?

You can find the exported models and the application with the DepthAI pipeline here.

Best, Jan

bennaa commented 5 months ago

Hi @HonzaCuhel,

You are right, the model I shared is somehow bugged. I updated all the files in the drive folder with a working model.

Using the .pt file I can detect objects, as in this image.

(attached image: result)

May I ask how you exported the model without the bounding-box decoding part so that it works with YoloDetectionNetwork in the DepthAI pipeline?

Thanks for your support, Andrea

HonzaCuhel commented 5 months ago

Hi @bennaa,

Perfect! Now the model detects objects, as shown in the attached screenshot of an inference with the newly exported ONNX model.

(attached image: output)

I updated the exported model files in the drive folder I shared before. I also added the Jupyter notebook I used to convert the model. Basically, I took the code from our tools and made the same changes you did: changed the number of input channels to 1 and removed the --reverse_input_channels flag.
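For readers landing on this thread, the two conversion-side edits boil down to something like the following. This is a sketch of the idea only, not the actual tools or notebook code; the shapes and argument list are assumptions for illustration:

```python
# Sketch of the grayscale-export changes (not the actual tools source).
num_channels = 1  # was hardcoded to 3 in the ONNX-export dummy input
height, width = 416, 416
dummy_input_shape = (1, num_channels, height, width)  # BCHW, fed to the exporter

# Model Optimizer arguments: '--reverse_input_channels' (an RGB<->BGR swap)
# is dropped, since it is meaningless for a single-channel input.
mo_args = [
    "--input_model", "model.onnx",
    # "--reverse_input_channels",  # removed for grayscale
]

print(dummy_input_shape)  # (1, 1, 416, 416)
```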

If you have any additional questions, please do not hesitate to ask me. I'm more than happy to help.

Best, Jan

bennaa commented 5 months ago

Hi @HonzaCuhel,

I was testing the blob and everything works smoothly! Thank you very much for your support!

Best, Andrea