Open gachiemchiep opened 4 years ago
You can try track.py to test its speed. The demo runs a little slower than 30 fps, around 25 fps, because it needs to decode the video into frames using cv2.VideoCapture, which takes some time.
ctdet_coco_dla_2x.pth is just a COCO-pretrained model of our backbone; it cannot do the tracking task.
@ifzhang Can we make batch predictions instead of predicting each image separately? I believe it would improve throughput considerably. I was digging into the code and thought I would ask here before digging further.
It would add some latency, but it would definitely increase the fps.
@faruknane I have not looked deeply into the code, but shouldn't batching be infeasible during tracking? Since information from the previous frame is required to track in the next frame, they can't be done in parallel, right?
They can be done in parallel. You can process images as one batch (batching), then take the outputs from the model (whatever is needed for tracking) and split them back into individual frames (unbatching). Then feed those to the tracking system one by one.
The authors didn't write the code for a batch size greater than one; their code processes one image at a time. But in practice, it can be parallelized once you understand the code. I did it, and it works.
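The steps above can be sketched roughly as follows. This is a hypothetical illustration, not FairMOT's actual API: `run_model` and `tracker_update` are stand-ins for the detector forward pass and the per-frame tracking update.

```python
# Sketch of the batching/unbatching idea (illustrative names, not the repo's API).

def run_model(batch):
    # Stand-in for the detector: one forward pass over a batch of frames,
    # returning one output per input frame.
    return [{"dets": "dets_for_" + frame} for frame in batch]

def track_batched(frames, batch_size, tracker_update):
    results = []
    for i in range(0, len(frames), batch_size):
        batch = frames[i:i + batch_size]          # batching
        outputs = run_model(batch)                # single batched forward pass
        for out in outputs:                       # unbatching
            results.append(tracker_update(out))   # tracking stays sequential
    return results
```

The key point is that only the network forward pass is batched; the tracker still consumes the per-frame outputs in order, so the frame-to-frame dependency is preserved.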
I also had problems with the model's processing speed. Your idea is very good. Could you please tell me how you did it? Would you be willing to open-source your code?
@wangshuai66666 You should start by looking at where the output of the model is produced. There is a post-processing step that processes the model's output before tracking. You should split the batched output before sending it to that post-processing step.
@faruknane Sharing the code would be ideal, but if that's not possible, since you've already tried it: how much speedup did you get? With frame-by-frame processing (as the repo implements), my GPU utilization from nvidia-smi is at 50%, so I don't expect less than a 2x improvement. I am also using a 2080 Ti, as in the paper.
@sopsos I can't share the code because I wrote it for a company. Basically, there are two performance gains. One is batching the input, i.e., giving more than one image to the model (the post-processing is coded for one image only, so you have to rework it to handle multiple images). The second gain is eliminating unnecessary idle time by creating threads for pre-processing, post-processing, and network inference. I used three worker threads plus a main thread. Of course it will not give you a 2x improvement, but it is still worth writing and testing the code.
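A minimal sketch of the pipelined design described above, assuming three stages (pre-processing, inference, post-processing/tracking) connected by queues. All names here are illustrative and not from the FairMOT repo; the stage functions are placeholders you would replace with the real code.

```python
# Three worker threads connected by bounded queues; a None sentinel
# flows through the pipeline to shut each stage down in order.
import queue
import threading

def stage(fn, q_in, q_out):
    while True:
        item = q_in.get()
        if item is None:        # sentinel: forward it and stop this stage
            q_out.put(None)
            break
        q_out.put(fn(item))

def run_pipeline(frames, preprocess, infer, postprocess):
    q1, q2, q3, q_out = (queue.Queue(maxsize=4) for _ in range(4))
    threads = [
        threading.Thread(target=stage, args=(preprocess, q1, q2)),
        threading.Thread(target=stage, args=(infer, q2, q3)),
        threading.Thread(target=stage, args=(postprocess, q3, q_out)),
    ]
    for t in threads:
        t.start()
    for f in frames:            # main thread feeds the pipeline
        q1.put(f)
    q1.put(None)
    results = []
    while True:
        r = q_out.get()
        if r is None:
            break
        results.append(r)
    for t in threads:
        t.join()
    return results
```

Because each stage is a single thread reading from a FIFO queue, frame order is preserved end to end, which is what the tracker requires; the overlap between stages is where the saved time comes from.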
@faruknane Batched inference is an interesting idea. I am wondering if it's possible to use that strategy on real-time video.
Hello @ifzhang, thank you for your great work. I tried your code, but it didn't run that fast.
I'm using an RTX 2080 Ti, so I think it should achieve 30 fps. Do you have any idea how to achieve 30 fps? Also, why is there a huge difference in fps between the all_dla34.pth and ctdet_coco_dla_2x.pth models? Did you quantize the ctdet_coco_dla_2x.pth model?