openvinotoolkit / open_model_zoo

Pre-trained Deep Learning models and demos (high quality and extremely fast)
https://docs.openvino.ai/latest/model_zoo.html
Apache License 2.0

Missing classes for model "person-vehicle-bike-detection-crossroad-yolov3-1020" #1695

Closed Loc-Vo closed 3 years ago

Loc-Vo commented 4 years ago

Hi,

I tried running the demo "object_detection_demo_yolov3_async.py" with the model "open_model_zoo/models/intel/person-vehicle-bike-detection-crossroad-yolov3-1020".

The result is NOT what I expected: most of the time it returns class 2, which is CAR. Even on the bike image, it boxed the wheel of the bike and again predicted the bike as CAR.

Overall, I can see only 2 classes in the output: person and car. The documentation mentions that the model supports backward compatibility, but it does not seem so...

Is this because of the fine-tuning process?

P.S. I am new to this domain, so let me know if you need any other information about the issue.

eizamaliev commented 4 years ago

@LeonidBeynenson could you help?

vladimir-dudnik commented 4 years ago

@s3298230 could you please provide more details? What do you mean by fine-tuning the model? Does the original OMZ model (before fine-tuning, whatever that means) work correctly?

Loc-Vo commented 4 years ago

Hi @vladimir-dudnik, sorry for the misleading wording. I mean that the original model does not work correctly. I got the model with the downloader and used the FP32 version for testing.

Below are the logs along with the images:

BIKE

Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\demos\python_demos\object_detection_demo_yolov3_async>python object_detection_demo_yolov3_async.py -i .\bike2.jpg -m .\person-vehicle-bike-detection-crossroad-yolov3-1020.xml  -d CPU -r
[ INFO ] Creating Inference Engine...
[ INFO ] Loading network
[ INFO ] Preparing inputs
MFX: Unsupported extension: .\bike2.jpg
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference...
To close the application, press 'CTRL+C' here or switch to the output window and press ESC key
To switch between min_latency/user_specified modes, press TAB key in the output window
[ INFO ]  Class ID | Confidence | XMIN | YMIN | XMAX | YMAX | COLOR
[ INFO ]     2     |   0.524370 |    0 | 1250 | 1292 | 2371 | (25.0, 14, 10)
[ INFO ]
[ INFO ] Mode: USER_SPECIFIED
[ INFO ] FPS: 2.6
[ INFO ] Latency: 291.6 ms

[image: bike2]

TRAFFIC LIGHT (not detected) / CAR / PERSON

[ INFO ] Creating Inference Engine...
[ INFO ] Loading network
[ INFO ] Preparing inputs
MFX: Unsupported extension: .\2.jpg
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference...
To close the application, press 'CTRL+C' here or switch to the output window and press ESC key
To switch between min_latency/user_specified modes, press TAB key in the output window
[ INFO ]  Class ID | Confidence | XMIN | YMIN | XMAX | YMAX | COLOR
[ INFO ]     2     |   0.963193 |  985 |  570 | 1078 |  625 | (25.0, 14, 10)
[ INFO ]     2     |   0.873917 | 1112 |  599 | 1166 |  632 | (25.0, 14, 10)
[ INFO ]     0     |   0.844933 |  974 |  590 |  997 |  691 | (0.0, 0, 0)
[ INFO ]     2     |   0.763559 |  160 |  536 |  437 |  661 | (25.0, 14, 10)
[ INFO ]     0     |   0.635111 |  860 |  579 |  874 |  615 | (0.0, 0, 0)
[ INFO ]
[ INFO ] Mode: USER_SPECIFIED
[ INFO ] FPS: 2.7
[ INFO ] Latency: 303.7 ms

[image: 2]
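The logs above print raw class IDs rather than names. As a minimal sketch, assuming the standard 80-class COCO label order (0 = person, 1 = bicycle, 2 = car), such detection rows can be turned into named detections. `parse_detections` and `COCO_LABELS` are hypothetical helpers for illustration, not part of the demo:

```python
import re

# Assumption: standard COCO label order; only the first three labels are listed.
COCO_LABELS = {0: "person", 1: "bicycle", 2: "car"}

# Matches a detection row such as:
# [ INFO ]     2     |   0.524370 |    0 | 1250 | 1292 | 2371 | (25.0, 14, 10)
ROW = re.compile(
    r"\[ INFO \]\s+(\d+)\s*\|\s*([\d.]+)\s*\|"
    r"\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|"
)


def parse_detections(log_text):
    """Return (label, confidence, (xmin, ymin, xmax, ymax)) for each detection row."""
    detections = []
    for line in log_text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # header, blank, or non-detection line
        class_id = int(m.group(1))
        label = COCO_LABELS.get(class_id, f"class {class_id}")
        box = tuple(int(v) for v in m.groups()[2:6])
        detections.append((label, float(m.group(2)), box))
    return detections


log = """[ INFO ]  Class ID | Confidence | XMIN | YMIN | XMAX | YMAX | COLOR
[ INFO ]     2     |   0.963193 |  985 |  570 | 1078 |  625 | (25.0, 14, 10)
[ INFO ]     0     |   0.844933 |  974 |  590 |  997 |  691 | (0.0, 0, 0)"""

for det in parse_detections(log):
    print(det)
```

With this mapping, class 2 in the logs is "car" and class 0 is "person", which matches what the user observed.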

vladimir-dudnik commented 4 years ago

@s3298230 according to the model description, it was pre-trained on the COCO dataset and then fine-tuned (i.e. re-trained) on an internal dataset created specifically for the security surveillance task of detecting persons, vehicles and bikes on the road. The model works best on images taken by a camera that observes the road from a specific location (you can see an example picture in the model description).

So for images like the one below, you will see that the model can detect all three classes.

C:\dev\build\open-model-zoo>py_object_detection_demo_yolov3_async.bat
[ INFO ] Creating Inference Engine
[ INFO ] Loading network
[ INFO ] Preparing inputs
MFX: Unsupported extension: C:\Temp\data\000000011197.jpg
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference
[ INFO ]  Class ID | Confidence | XMIN | YMIN | XMAX | YMAX | COLOR
[ INFO ]  person   |   0.992267 |  301 |  109 |  362 |  256 | (0.0, 0, 0)
[ INFO ]  car      |   0.972031 |  171 |  140 |  216 |  167 | (25.0, 14, 10)
[ INFO ]  car      |   0.960856 |  401 |  153 |  488 |  189 | (25.0, 14, 10)
[ INFO ]  person   |   0.932159 |   11 |  118 |   50 |  223 | (0.0, 0, 0)
[ INFO ]  car      |   0.761179 |  135 |  151 |  162 |  161 | (25.0, 14, 10)
[ INFO ]  bicycle  |   0.584321 |  555 |  148 |  598 |  191 | (12.5, 7, 5)

Loc-Vo commented 4 years ago

Thanks for your clarification @vladimir-dudnik, so this model works best for the three classes mentioned.

However, I am confused about the backward compatibility mentioned in the documentation. Should the model still support the remaining classes of the 80-class COCO dataset, or does the fine-tuning process somehow decrease the accuracy on the other classes?

LeonidBeynenson commented 4 years ago

To @s3298230:
Hi Loc Vo,

First of all, I have to say that the model person-vehicle-bike-detection-crossroad-yolov3-1020 is intended to detect only three classes: "car", "pedestrian" and "bike/motorbike". It returns output for the other 77 classes only for backward compatibility; these remaining 77 classes will never be detected.
(The reason is as follows: YOLO v3 has a very specific output parsing, so we decided to just return more classes instead of changing the parsing functions.)

The main reason for your issues, I think, is that the model was trained on a specific dataset from surveillance cameras. So, the network expects to receive an image from a camera that

Obviously, if you show it the bike from this image, it detects nothing, since the network expects a person riding the bike and expects the bike to be very small, because the camera is mounted 3-5 meters above the road on which the person is riding, etc.

Please note that the network cannot detect objects that it did not see during training.
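The backward-compatibility explanation above means the model keeps all 80 COCO class slots so that the existing YOLO v3 parsing code works unchanged, even though only three of those slots are trained. A minimal sketch, assuming the standard YOLO v3 per-anchor layout `[x, y, w, h, objectness, 80 class scores]` with activations already applied (the actual OpenVINO output tensor layout may differ), and assuming the trained classes sit at COCO IDs 0 (person), 1 (bicycle) and 2 (car). `decode_entry` and `keep_trained_classes` are hypothetical helpers:

```python
import numpy as np

NUM_CLASSES = 80                   # output keeps the full COCO class count
TRAINED_CLASS_IDS = (0, 1, 2)      # assumption: person, bicycle, car in COCO order
ENTRY_SIZE = 4 + 1 + NUM_CLASSES   # box (x, y, w, h) + objectness + class scores


def decode_entry(entry, threshold=0.5):
    """Return (class_id, confidence) pairs above threshold for one anchor entry.

    Assumption: `entry` holds already-activated values laid out as
    [x, y, w, h, objectness, score_0, ..., score_79].
    """
    assert entry.shape == (ENTRY_SIZE,)
    objectness = entry[4]
    confidences = objectness * entry[5:]  # per-class confidence, 80 values
    return [(int(c), float(confidences[c]))
            for c in np.flatnonzero(confidences > threshold)]


def keep_trained_classes(detections):
    """Drop any class ID the fine-tuned model was never trained on."""
    return [d for d in detections if d[0] in TRAINED_CLASS_IDS]


# Synthetic entry: high objectness, strong "car" (ID 2) score, all other scores 0.
entry = np.zeros(ENTRY_SIZE)
entry[4] = 0.9
entry[5 + 2] = 0.8
print(keep_trained_classes(decode_entry(entry)))
```

The parser itself loops over all 80 scores exactly as for the original COCO model; the extra filter only makes explicit that the untrained slots carry no useful detections.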

Loc-Vo commented 4 years ago

Thanks @LeonidBeynenson for the detailed explanation. I have some follow-up questions that hopefully you can help answer:

1. Can I get the other 77 classes from the model by updating the code base, i.e. by pulling out the remaining classes?
2. Will the accuracy of the other (non-fine-tuned) classes decrease after the fine-tuning steps?

I am planning to train my own custom model starting from some classes and appending others in the future, so I really want to know whether this is possible, i.e. just providing an extra dataset for the additional classes and performing transfer learning instead of training on the entire dataset (old + new classes), which is time-consuming.
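One common way to append classes before fine-tuning is to expand the final class-score layer while keeping the weights learned for the existing classes. A minimal sketch for a generic linear class head, not how the OMZ model was trained; in YOLO v3 the class scores are interleaved per anchor in a convolution output, so a real change there is more involved. `expand_class_head` is a hypothetical helper, and the sketch does not address the second question: fine-tuning only on new-class data can still degrade accuracy on the old classes.

```python
import numpy as np


def expand_class_head(weights, bias, num_new, rng=None, scale=0.01):
    """Append `num_new` output classes to a linear class-score layer.

    weights: (num_old, in_features); bias: (num_old,)
    Rows for the existing classes are copied unchanged; new rows get small
    random weights and zero bias so they start with near-neutral scores.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    _, in_features = weights.shape
    new_w = rng.normal(0.0, scale, size=(num_new, in_features))
    new_b = np.zeros(num_new)
    return np.vstack([weights, new_w]), np.concatenate([bias, new_b])


# Hypothetical head: 3 existing classes over 16 features, extended by 2 classes.
old_w = np.ones((3, 16))
old_b = np.zeros(3)
w, b = expand_class_head(old_w, old_b, num_new=2)
print(w.shape, b.shape)
```

Only the expanded head (and optionally the last layers) would then be fine-tuned on the new data.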

vladimir-dudnik commented 3 years ago

Closing due to inactivity.