isarsoft / yolov4-triton-tensorrt

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
http://www.isarsoft.com

Postprocessing Issue #9

Closed · ontheway16 closed this issue 3 years ago

ontheway16 commented 3 years ago

Hi, after training a custom YOLOv4 model to 10,000 epochs (mAP displayed as 6% at that point) in the original YOLOv4, I recreated all the necessary files and ran your client.py on a custom image. At the same time, I ran the original YOLOv4 detector with the same 10K weights on that image. The YOLOv4 detector made 2 individual detections (62% and 49% with the default threshold, plus an additional one at 9% with a 0.05 threshold). client.py made 3 individual detections of the same objects, including the low-confidence one above.

1. I will let it train up to 40K and retry, but could there be a difference in the NMS handling?
2. Also, it would be very good if client.py could show confidence scores both on the output image and in the terminal.
3. Is it possible to have client.py show custom names on detections instead of the COCO labels?
4. One small thing I noticed about client.py: after closing the image, it does not return to the prompt, and I have to quit manually with Ctrl+Z.

Alp

philipp-schmidt commented 3 years ago

Hi, yes I noticed this too. In the dog.jpg image it detects a "pottedplant", which it should not do.

1. There is definitely still a mistake in the postprocessing code. It has something to do with the difference between class_score and detection_score: one of them is the objectness and the other is the actual detection score, and I think they are mixed up at some point (a small illustration of the two scores follows below). But as this might not be the only mistake, a full mAP test implementation would be nice. If you have time, maybe you can help? I will only be able to start working on this next week at the earliest. Look at #7 for more info.
2. That's easy, I will add it next time. It is stored here.
3. Yes, have a look in this file.
4. Oh yes, I have had that multiple times and it is very annoying... It's OpenCV behaviour and I have found no fix so far, except closing the image automatically after some time, which is not what you want either...
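
For reference, this is how the two scores are usually combined in YOLO-style postprocessing. This is a minimal sketch of the common convention, not the actual code in this repo; the variable names are illustrative only.

```python
import numpy as np

# Minimal sketch of the usual YOLO scoring convention (not this repo's code).
# "objectness" says how likely the box contains any object at all;
# "class_scores" says, given an object, how likely each class is.
objectness = 0.9                          # hypothetical objectness for one box
class_scores = np.array([0.1, 0.7, 0.2])  # hypothetical per-class probabilities

class_id = int(np.argmax(class_scores))
# The reported detection confidence is normally the product of the two;
# reporting either value alone (or swapping them) gives skewed confidences.
confidence = objectness * class_scores[class_id]
print(class_id, confidence)  # -> class 1, confidence ~0.63
```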

ontheway16 commented 3 years ago

I would definitely like to help, I just wanted to get more confident with the usage first and make sure I was able to get it working with a custom dataset. I can take a look into it next week. A few more questions about it then:

  1. Is there any way to set an NMS threshold, as in the original YOLO?
  2. Is it possible to set model parameters via the Triton model configuration? Since this is plugin based, I'm not sure if the Triton model config (config.pbtxt, I assume?) applies here.
  3. I am still trying to figure out the output dimensions for YOLO. The input is (608, 608, 3) and that is fine, but could you please explain what the output numbers mean (i.e. 7001, etc.)? Are they related to the maximum number of detections per image?
  4. Actually, the client script could save the JPEG file instead of displaying it, or, as you say, display it for a fixed number of seconds. But of course that is still a problem for video streams.

philipp-schmidt commented 3 years ago
  1. Yes -> here
  2. What parameters and config are we talking about? Example? The plugin only adds an implementation for a special layer, anything else stays the same.
  3. Look into the plugin code. Usually YOLO has 3 output layers (3 yolo layers), and previously you had to postprocess those layers yourself to get the detection boxes. Now the plugin does that for you. The output is 7001 floats, where the very first float is the number of detected boxes in the tensor, followed by n groups of 7 floats each: (x, y, w, h, det_score, label, class_score). That last part is where the postprocessing is currently probably bugged (see the parsing sketch after this list).
  4. The video can be saved as well.
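
A minimal sketch of how the flat output described in point 3 could be parsed on the client side, assuming the layout is exactly one count float followed by up to 1000 groups of 7 floats (1 + 1000 × 7 = 7001). The function and field names are illustrative, and which of the two scores (or their product) should be reported is exactly the open question in this thread.

```python
import numpy as np

def parse_plugin_output(output: np.ndarray):
    """Parse the flat plugin output described above.

    output[0]  -> number of detected boxes n
    output[1:] -> n consecutive groups of 7 floats:
                  (x, y, w, h, det_score, label, class_score)
    """
    flat = output.reshape(-1)            # expected length: 1 + 1000 * 7 = 7001
    num_dets = int(flat[0])
    dets = flat[1:1 + num_dets * 7].reshape(num_dets, 7)

    results = []
    for x, y, w, h, det_score, label, class_score in dets:
        results.append({
            "box": (float(x), float(y), float(w), float(h)),
            "label": int(label),
            # Which value to report (det_score, class_score, or their product)
            # is the suspected bug discussed in this issue.
            "det_score": float(det_score),
            "class_score": float(class_score),
        })
    return results
```
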
ontheway16 commented 3 years ago

The parameters I mentioned are the ones in here. But apparently, in your case, it gathers these parameters automatically.

Also, since I do not have FP16 capability, I am assuming all inference is carried out in FP32. Could that be a source of the differences? Just a thought.

philipp-schmidt commented 3 years ago

I'm using the --strict-model-config=false flag so I don't have to supply that config file.

Also:

> TensorRT Plan models do not require a model configuration file because Triton can derive all the required settings automatically.

That being said, there is nothing in this repo that prevents you from using this config file explicitly if you need to enable a specific Triton feature.
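
For reference, a typical invocation with that flag looks like the following; the container tag and model repository path are placeholders, not values taken from this repo.

```
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /path/to/model_repository:/models \
    nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
    tritonserver --model-repository=/models --strict-model-config=false
```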

FP32 vs FP16

Yes, there will be minor differences. But the differences we are seeing are due to a bug in the postprocessing, hence the bug label and the pinning of this thread.

Can you open new issues for questions on new topics, e.g. the config? I would like to keep this issue for discussions about the postprocessing code. Did you have a chance to double-check my implementation?

ontheway16 commented 3 years ago

I will check the updates, though client.py was working well after the recent update. I agree about opening new issues and will create new topics.

philipp-schmidt commented 3 years ago

We need to double check and cross-reference this part of the code:

https://github.com/isarsoft/yolov4-triton-tensorrt/blob/5f916f9b20fdd2a184a5e0c3b1741ff17a1f94eb/clients/python/processing.py#L46-L72
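
As a reference point for that cross-check, here is the textbook greedy NMS in NumPy. It is not the code from processing.py (which may use a different box format or thresholding order); it is only meant as a known-good baseline to compare against.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5):
    """Textbook greedy NMS. boxes are (x1, y1, x2, y2), one row per detection."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]       # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the current best box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop boxes that overlap the kept box too much
        order = order[1:][iou <= iou_threshold]
    return keep
```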

ontheway16 commented 3 years ago

Are there any other YOLO implementations for TensorRT? It might be wise to include their results in the comparison, to help track down the offending pieces of code.

ontheway16 commented 3 years ago

OK, I have further trained my custom model (18K epochs) and visually compared the results. As far as I can see, the results are on par with the original YOLOv4; bounding-box coverage is slightly different on larger (>500 px) objects, but that's fine. I made several crops of the inference images at different resolutions to find the best-performing resolution: 608x608, 704x704, 800x800, 864x864, 928x928, 992x992, 1024x1024, 1152x1152, 1216x1216, and 1536x1536 square crops.

All in all, with the 608x608-trained model the best resolution is naturally 608, and 1024 (which is, interestingly, 32x32) seems to perform best for anything above 608. Beyond 608 the confidence consistently drops, and at 1056x1056 it immediately starts to fail with false negatives, in both repos.

In certain cases I observed a double bounding box over the same object with your repo, while a single box was recorded with the original v4. In my tests the results (detection and bounding-box localization) were highly similar for both, and in some cases your repo was able to detect very obvious, clearly located objects that the original misses. YOLOv4 reports around 140-150 ms for a 1024 image; it would be good to see similar timing output from client.py, too.

philipp-schmidt commented 3 years ago

Postprocessing is now fixed in v1.3.0. The confidence calculation was wrong. The client now displays confidences on stdout, and I get the same confidences as default YOLOv4 on a few selected input images.