ifzhang / FairMOT

[IJCV-2021] FairMOT: On the Fairness of Detection and Re-Identification in Multi-Object Tracking
MIT License

Demo inference speed problem? #133

Open gachiemchiep opened 4 years ago

gachiemchiep commented 4 years ago

Hello @ifzhang, thank you for your great work. I tried your code, but it doesn't run as fast as expected.

  1. all_dla34.pth model: the speed is only 12 fps. Log file: all_dla34.pth.txt
python demo.py mot --load_model ../models/all_dla34.pth --conf_thres 0.4 --input-video ../videos/MOT16-03.mp4 --output-root ../
  2. ctdet_coco_dla_2x.pth: the speed doubles to 24 fps, but there are no bounding boxes in the output video. Log file: ctdet_coco_dla_2x.pth.txt
python demo.py mot --load_model ../models/ctdet_coco_dla_2x.pth --conf_thres 0.4 --input-video ../videos/MOT16-03.mp4 --output-root ../

I'm using an RTX 2080 Ti, so I think it should reach 30 fps. Do you have any idea how to achieve 30 fps? Why is there such a huge difference in fps between the all_dla34.pth and ctdet_coco_dla_2x.pth models? Did you quantize the ctdet_coco_dla_2x.pth model?

ifzhang commented 4 years ago

You can try track.py to test the speed. The demo runs a little slower than 30 fps (around 25 fps) because it has to decode the video into frames using cv2.VideoCapture, which takes some time.
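For reference, a minimal sketch (not part of the repo) that times the decoding step alone, so you can see how much of the per-frame budget goes to cv2.VideoCapture rather than the model; the video path is just the one from the demo command above:

```python
import time

import cv2

# Quick check: time pure video decoding, separate from model inference.
cap = cv2.VideoCapture('../videos/MOT16-03.mp4')  # path assumed from the demo command
n_frames = 0
start = time.time()
while True:
    ret, frame = cap.read()
    if not ret:
        break
    n_frames += 1
cap.release()
elapsed = time.time() - start
print(f'decode only: {n_frames / elapsed:.1f} fps over {n_frames} frames')
```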

ifzhang commented 4 years ago

ctdet_coco_dla_2x.pth is just a COCO-pretrained model of our backbone; it cannot do the tracking task.

faruknane commented 4 years ago

@ifzhang Can we make batch predictions instead of predicting each image separately? I believe it would improve throughput considerably. I was digging into the code and thought I would ask here before digging further.

It will add some latency but will definitely increase the fps.

Ashwin-Ramesh2607 commented 4 years ago

@faruknane I have not looked deeply into the code, but shouldn't batching be infeasible during tracking? Since information from the previous frame is required to track the next frame, they can't be done in parallel, right?

faruknane commented 4 years ago

> @faruknane I have not looked deeply into the code, but shouldn't batching be infeasible during tracking? Since information from the previous frame is required to track the next frame, they can't be done in parallel, right?

They can be done in parallel. You can process several images as one batch (batching), take the outputs from the model (whatever is needed for tracking), and split the output back into per-frame pieces (unbatching). Then feed those to the tracking system one by one.

The authors didn't write the code for a batch size greater than one; their code processes one image at a time. But in practice the forward pass can be batched once you understand the code. I did it, and it works. See the sketch below.
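For illustration, a rough sketch of the batch-then-unbatch flow described above; all the callables here (preprocess, split_and_postprocess, the tracker's update) are placeholders for the corresponding pieces of the pipeline, not actual FairMOT function names:

```python
import torch

def run_batched(frames, preprocess, model, split_and_postprocess, tracker, batch_size=4):
    """Sketch of batching the forward pass while keeping tracking sequential.
    All callables are placeholders supplied by the caller, not FairMOT APIs."""
    for i in range(0, len(frames), batch_size):
        chunk = frames[i:i + batch_size]
        # batching: stack preprocessed frames into one (B, C, H, W) tensor
        batch = torch.stack([preprocess(f) for f in chunk]).cuda()
        with torch.no_grad():
            output = model(batch)                    # one forward pass for the whole chunk
        # unbatching: split the network output back into per-frame detections
        for b in range(len(chunk)):
            dets = split_and_postprocess(output, b)  # keep only batch index b
            tracker.update(dets)                     # tracking itself stays one-by-one
```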

wangshuai66666 commented 4 years ago

> They can be done in parallel. You can process several images as one batch (batching), take the outputs from the model (whatever is needed for tracking), and split the output back into per-frame pieces (unbatching). Then feed those to the tracking system one by one.
>
> The authors didn't write the code for a batch size greater than one; their code processes one image at a time. But in practice the forward pass can be batched once you understand the code. I did it, and it works.

I also had problems with the model's processing speed. Your idea sounds very good. Could you tell me how to do it? Would you be willing to open-source your code?

faruknane commented 4 years ago

@wangshuai66666 Start by looking at where the network output is produced. There is a post-processing step that runs on the network output before tracking. You should split the batched output into per-frame outputs before sending them to that post-processing step.
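As a hedged example of that split, assuming the network output is a dict of batch-first head tensors (CenterNet-style heads such as 'hm', 'wh', 'reg', 'id' are assumed here; check the actual keys in the repo):

```python
def split_batch_output(output, batch_size):
    """Split a dict of batch-first tensors into one dict per frame.
    Keeping the slice as [b:b+1] preserves the batch dimension, so the
    existing single-image post-processing code can be reused unchanged."""
    return [{k: v[b:b + 1] for k, v in output.items()} for b in range(batch_size)]
```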

sopsos commented 4 years ago

@faruknane Sharing the code would be ideal, but if that's not possible, since you already tried it: how much speedup did you get? With frame-by-frame processing (as the repo implements it), my GPU utilization reported by nvidia-smi is around 50%, so I would not expect less than a 2x improvement. I am also using a 2080 Ti, as in the paper.

faruknane commented 4 years ago

@sopsos I can't share the code because I wrote it for a company. Basically there are two performance gains. One is batching the input, i.e. feeding more than one image to the model at a time (the post-processing is written for a single image, so you have to adapt it to handle multiple images). The second is removing unnecessary idle time by creating separate threads for pre-processing, post-processing, and network inference. I used three worker threads plus the main thread. Of course it will not give you a full 2x improvement, but it is still worth writing the code and testing it.
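A minimal sketch of that thread layout, assuming the frame-reading, pre-processing, inference, post-processing, and tracker-update steps are passed in as callables (placeholders here, not FairMOT APIs):

```python
import queue
import threading

def pipeline(read_frame, preprocess, run_model, postprocess, update_tracker):
    """Three worker threads (pre-process, inference, post-process) feeding the
    main thread, which updates the tracker strictly in frame order."""
    pre_q = queue.Queue(maxsize=8)   # preprocessed inputs
    out_q = queue.Queue(maxsize=8)   # raw network outputs
    det_q = queue.Queue(maxsize=8)   # post-processed detections

    def pre_worker():
        while True:
            frame = read_frame()  # expected to return None at end of video
            pre_q.put(None if frame is None else preprocess(frame))
            if frame is None:
                break

    def infer_worker():
        while True:
            inp = pre_q.get()
            out_q.put(None if inp is None else run_model(inp))
            if inp is None:
                break

    def post_worker():
        while True:
            out = out_q.get()
            det_q.put(None if out is None else postprocess(out))
            if out is None:
                break

    for worker in (pre_worker, infer_worker, post_worker):
        threading.Thread(target=worker, daemon=True).start()

    # main thread: tracking itself stays sequential
    while True:
        dets = det_q.get()
        if dets is None:
            break
        update_tracker(dets)
```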

yunseung-dable commented 3 years ago

@faruknane Batched inference is an interesting idea. I am wondering whether that strategy can be used on real-time video.