ybkscht / EfficientPose


Performed well during training, but had poor performance when using a webcam to predict the object pose #60

Open · Tinywolf007 opened this issue 1 year ago

Tinywolf007 commented 1 year ago

During training, I used --validation-image-save-path to save the predicted validation images, such as:

(Screenshot 2023-07-10 16-28-09)

The model did a good job. However, when I used inference_webcam.py to predict the object pose, the performance was really poor.

(Screenshot 2023-07-10 16-27-10)

So, how can I solve this problem?

ybkscht commented 1 year ago

Hi @Tinywolf007, it's hard to identify the problem without further information because there could be several possible issues. Probably the easiest one to rule out: did you adjust the intrinsic camera matrix in the webcam inference script to match the webcam you are using?

Depending on your dataset it could also be an overfitting issue. How big is your dataset, how much variation does it contain, and does the model struggle when the object is not on the marker board because it never saw it without the board? To check whether this is the case, you could place the object on the marker board like in the dataset while running the webcam inference and see if the results improve. If they do, it would be beneficial to use some background augmentation during training, provided you have masks (replacing the background with random images), for example as sketched below.
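A minimal sketch of such a background augmentation, assuming you have per-image binary object masks and a folder of random background images (the paths and the helper name here are just placeholders, not part of the EfficientPose code):

```python
import glob
import random

import cv2
import numpy as np

# Hypothetical path - adjust to wherever your random background images live
background_paths = glob.glob("random_backgrounds/*.jpg")

def replace_background(image, mask):
    """Paste the masked object onto a randomly chosen background image.

    image: HxWx3 BGR training image
    mask:  HxW binary mask, nonzero where the object is
    """
    bg = cv2.imread(random.choice(background_paths))
    bg = cv2.resize(bg, (image.shape[1], image.shape[0]))
    mask3 = (mask > 0)[..., None]  # HxWx1 boolean, broadcasts over the color channels
    return np.where(mask3, image, bg)
```

Applied on the fly during training, this forces the model to rely on the object itself instead of the marker board or table background.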

Tinywolf007 commented 1 year ago

Thanks for your reply. I have changed the camera matrix in the inference file. However, I made the LineMOD-format dataset with a RealSense D435 camera, and I run the inference file with a different camera module. I don't know whether that affects the inference result.

In addition, there are only 1000 images in my dataset, so I think the model isn't fully trained, but the ADD is very close to 1.

ybkscht commented 1 year ago

Yes, if you use a different camera for training and inference, it will probably affect your results, depending on how much the camera intrinsics differ. So I would recommend using the same camera.
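If you are unsure which values belong to which camera, a minimal sketch for reading the D435 color intrinsics and building the 3x3 camera matrix could look like this (assuming the pyrealsense2 package; the resolution is just an example and should match what you used to record the dataset):

```python
import numpy as np
import pyrealsense2 as rs

# Start a color stream at the same resolution used for recording the dataset
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
profile = pipeline.start(config)

# Query the factory-calibrated intrinsics of the color stream
intr = profile.get_stream(rs.stream.color).as_video_stream_profile().get_intrinsics()
camera_matrix = np.array([[intr.fx, 0.0,     intr.ppx],
                          [0.0,     intr.fy, intr.ppy],
                          [0.0,     0.0,     1.0]], dtype=np.float32)
print(camera_matrix)
pipeline.stop()
```

These values are only valid for images captured with the D435 itself; a different camera module used at inference time needs its own intrinsics.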

Regarding your concern that the model isn't fully trained: did you also check some other metrics during training, like the rotation and translation errors? In my experience, the ADD metric is often easier to pass for larger objects, because the threshold depends on the object's diameter (usually 10% of the diameter).
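For illustration, a rough sketch of the ADD check for a single non-symmetric object, assuming model_points is an Nx3 array of the object's 3D model points, the poses are given as rotation matrices and translation vectors, and the usual LineMOD threshold factor of 0.1 is used:

```python
import numpy as np

def add_metric(model_points, R_gt, t_gt, R_pred, t_pred):
    """Average distance between the model points transformed by the ground-truth
    pose and by the predicted pose (the ADD metric for non-symmetric objects)."""
    pts_gt = model_points @ R_gt.T + t_gt
    pts_pred = model_points @ R_pred.T + t_pred
    return np.mean(np.linalg.norm(pts_gt - pts_pred, axis=1))

def add_correct(model_points, R_gt, t_gt, R_pred, t_pred, diameter, factor=0.1):
    """A pose counts as correct if the ADD error is below factor * object diameter."""
    return add_metric(model_points, R_gt, t_gt, R_pred, t_pred) < factor * diameter
```

Because the threshold scales with the diameter, a large object can reach an ADD accuracy close to 1 while the absolute rotation and translation errors are still too large for a visually convincing overlay, so checking those errors separately is worthwhile.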