stark-t / PAI (Pollination_Artificial_Intelligence)

PyTorch_YOLOv4 - detect.py performs poorly compared with detect.py from YOLOv5 & 7; How to set optimal values for conf-thres & iou-thres #51

Closed: valentinitnelav closed this issue 1 year ago

valentinitnelav commented 2 years ago

Hi @stark-t ,

Running inference with detect.py for YOLOv7 was very similar to YOLOv5. However, for PyTorch_YOLOv4, things got a bit less smooth.

This is the detection script which I just tried for PyTorch_YOLOv4.

These arguments are the same as for YOLOv7 (for which I already sent you the txt prediction files).

--img-size 640 \
--conf-thres 0.25 \
--iou-thres 0.45

PyTorch_YOLOv4 doesn't have the --nosave option, so it also saves the images at inference time; I didn't find an argument to disable this.

It also has two additional arguments, --cfg & --names, which are not used in the detect.py of YOLOv5 or YOLOv7:

--cfg ~/PAI/detectors/PyTorch_YOLOv4/cfg/yolov4-csp-s-leaky.cfg \
--names ~/PAI/detectors/PyTorch_YOLOv4/data/pai.names \

The pai.names file must contain the label names:

Araneae
Coleoptera
Diptera
Hemiptera
Hymenoptera_Formicidae
Hymenoptera
Lepidoptera
Orthoptera

Most concerning is that the detect.py run produces only 238 txt prediction files for the 1680 test images.

ls *txt | wc -l
238

Also, the .err file usually produced when running a cluster job is empty (as opposed to YOLOv5, which prints per-image timing and other info).

I am not sure at this point which argument to change in detect.py of PyTorch_YOLOv4 to increase the number of detections. I could lower the values of --conf-thres & --iou-thres, but then the run would no longer be comparable with how I ran YOLOv7 & 5. The values above are actually the defaults for v5 & 7; for v4 the defaults are --conf-thres 0.4 & --iou-thres 0.5 - see https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/detect.py. Out of curiosity, I reduced these values to:

--conf-thres 0.1 \
--iou-thres 0.2 \

and I got 1443 txt prediction files for the 1680 test image files. Still a lower number than what I got for YOLOv7 with the default values (1668 txt files).

EDIT: However, I just saw that this creates too many prediction boxes per image. I will send you the results.

All in all, how do we make the detect.py runs on the test dataset comparable across YOLO versions?

stark-t commented 2 years ago

@valentinitnelav I really hoped we could compare the predictions with the same parameters (e.g. conf-thres, iou-thres), but in the end we should compare the best possible results, which would mean we have to predict using all possible combinations... that would be a lot of work.

valentinitnelav commented 2 years ago

@stark-t , what if we define, as objectively as possible, the "best" --conf-thres & --iou-thres values for each case? For example, the values at which some performance metric reaches its optimum for each YOLO version.

For each case we do a grid search, say --conf-thres & --iou-thres over the (0, 1] interval with a step of 0.05 (or other reasonable ranges and steps), and run detect.py each time. Then, for each results folder, we compute some evaluation metric and see where in that space it reaches its optimum.

I think I can write a bash script to run detect.py hundreds of times on a GPU. But I need your help to decide on the evaluation metrics.
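
Roughly, I am thinking of a nested loop over the two thresholds, something like the sketch below (just a sketch: the step size, the remaining detect.py arguments and the output-folder naming would need to be adapted to each YOLO version):

# Sketch: grid over conf-thres & iou-thres with a 0.05 step
# (weights, source and cfg/names arguments omitted for brevity;
#  the output-folder argument differs between the detect.py versions)
for conf in $(seq 0.05 0.05 0.95); do
    for iou in $(seq 0.05 0.05 0.95); do
        python detect.py \
            --img-size 640 \
            --conf-thres "$conf" \
            --iou-thres "$iou" \
            --save-txt \
            --name "grid_conf${conf}_iou${iou}"
    done
done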

Would this work? Or does YOLO already have some recommendations for these? I saw that one could look at the F1_curve.png (for YOLOv5 & 7) and read off an optimal confidence. However, I didn't see a graph like this for YOLOv4.

It could be that I should use the best.pt weights instead of best_overall.pt - see https://github.com/stark-t/PAI/issues/50. Actually, I will give that a try and see what I get before going into more complex things.

valentinitnelav commented 2 years ago

FYI: the links below do not give a definitive answer, but the general idea is that the "best" approach is to find the optimal values for these parameters. We can see them as hyperparameters at inference time, especially since they will also affect the rate of false positives on our field images.

https://github.com/ultralytics/yolov5/issues/8615#issuecomment-1188719094

https://github.com/ultralytics/yolov5/discussions/7906

valentinitnelav commented 2 years ago

It could be that I should use the best.pt weights instead of best_overall.pt - see https://github.com/stark-t/PAI/issues/50. Actually, I will give that a try and see what I get before going into more complex things.

So, I just ran detect.py for YOLOv4 (using best.pt) with:

--img-size 640 \
--conf-thres 0.25 \
--iou-thres 0.45

My hopes were dashed: it again produced only 238 txt prediction files for the 1680 test images, the same as with 'best_overall.pt'.

# Number of txt files generated
cd ~/PAI/detectors/PyTorch_YOLOv4/runs/detect/
cd 3265637_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls  *txt | wc -l
238

# Number of jpg files in the test dataset
cd ~/datasets/P1_Data_sampled/test/images
ls  *jpg | wc -l # this will not catch png or jpeg files, but 1680 is the right number
1680

I will try all the other "best" weight options and see what I get - see https://github.com/stark-t/PAI/issues/50

valentinitnelav commented 1 year ago

Overview of the trials with the different weight files, all using:

--img-size 640 \
--conf-thres 0.25 \
--iou-thres 0.45

Unfortunately, none of the weight options generated a number of detection txt files close to the total number of images in the test dataset. The best result was 238 out of 1680.

best.pt

Job id 3265637

# Number of txt files generated
cd ~/PAI/detectors/PyTorch_YOLOv4/runs/detect/
cd 3265637_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls  *txt | wc -l
# 238

best_overall.pt

Job id 3265668

# Number of txt files generated
cd ..
cd 3265668_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls  *txt | wc -l
# 238

best_ap50.pt

Job id 3265661

# Number of txt files generated
cd ..
cd 3265661_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls  *txt | wc -l
# 238

best_ap.pt

Job id 3265663

# Number of txt files generated
cd ..
cd 3265663_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls  *txt | wc -l
# 238

best_f.pt

Job id 3265662

# Number of txt files generated
cd ..
cd 3265662_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls  *txt | wc -l
# 92

best_p.pt

Job id 3265665

# Number of txt files generated
cd ..
cd 3265665_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls  *txt | wc -l
# 4

best_r.pt

Job id 3265666

# Number of txt files generated
cd ..
cd 3265666_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls  *txt | wc -l
# 238

valentinitnelav commented 1 year ago

Hi @stark-t ,

Given the results above, I think we need to do a grid search for the optimal values of conf-thres & iou-thres for each detector (YOLOv4, 5, 7). If you agree, I can start working on a bash script that runs detect.py for each YOLO version, looping through the (0, 1] interval for conf-thres & iou-thres.

Such a script will produce dozens of detection folders with txt label files that you can run through an evaluation script to compute performance metrics (e.g. precision, recall, average precision, F1, IoU). Then we can plot these values on two axes of conf-thres & iou-thres ranging from 0 to 1 and get heat maps from which we can decide on the optimal values of conf-thres & iou-thres.
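
As a quick sanity check before any proper metrics, we can at least tabulate the number of prediction txt files per results folder, the same way I counted them above (assuming the grid folders are named as in the sketch from my earlier comment):

# Count the prediction txt files in each grid-search results folder
# (YOLOv5 & 7 save the txt files in a labels/ subfolder, so adjust the pattern there)
cd runs/detect/
for d in grid_*; do
    printf '%s\t%s\n' "$d" "$(ls "$d"/*.txt 2>/dev/null | wc -l)"
done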

Is there a simpler approach to this issue?

stark-t commented 1 year ago

@valentinitnelav maybe we can narrow down the steps from 10%, 20%, ..., 90%, since we already have some insights that lower thresholds work better, right? So we could use either 10%, 20%, 30% or 25%, 50%, 75% for both thresholds, resulting in nine iterations in total.
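
For example, the second option would just be the loop sketched above with hard-coded value lists (other detect.py arguments omitted), nine runs in total:

for conf in 0.25 0.50 0.75; do
    for iou in 0.25 0.50 0.75; do
        python detect.py --img-size 640 --conf-thres "$conf" --iou-thres "$iou" --save-txt
    done
done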

valentinitnelav commented 1 year ago

Hi @stark-t , should I go ahead and close all the issues related to YOLOv4, since we dropped it from the results comparison pipeline? I don't think I will have more time to investigate these issues at the moment.

valentinitnelav commented 1 year ago

We no longer use YOLOv4 in the pipeline. See also the other related issues linked above.