ktzsh / object-tracking

Multiple Object Tracking System in Keras + (Detection Network - YOLO)

Accuracy benchmark and inference speed #4

Open anuar12 opened 5 years ago

anuar12 commented 5 years ago

Hi,

Great code, thanks a lot!

I had 2 questions:

1) Accuracy: I just wanted to know whether the code actually works and whether tracking improves detection.

2) Speed of tracking: I am looking into using MultiObjDetTracker in my project, which requires real-time processing. Would you know how fast the tracking part of MultiObjDetTracker (excluding detection) runs at inference on a GPU (1050 or 1080)? I would imagine it takes <20% of the detector's time since it has only 2 layers, even though the LSTM layer might take more time. I am aiming for ~10 fps.

Thanks!

ktzsh commented 5 years ago

Hi

Yes, the code actually works :) But tracking in general is only as good as your detection priors. The main purpose of my research was to have a multiple object tracker that could exploit temporal dependencies to track occluded objects. Training both tasks together was meant to test the hypothesis that certain features might be better for tracking. For general tracking, I'd say tune your detector on the dataset first and freeze it before training the tracker part.

So, my dataset was focused on occlusion, and there were a lot of things I also wanted to try, but hardware is a limitation for me now. Maybe we could collaborate on this one, if that's feasible. From what I have seen, training is also very sensitive to the ratio of tracking loss to detection loss, which you would need to experiment with yourself. You could also try bidirectional LSTMs.
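To make the ratio experiment concrete, here is a minimal sketch in plain Python. The helpers `combined_loss` and `tracking_share` and the weight names are hypothetical and not taken from this repository, and the loss values are made up. In Keras, a weighting like this is usually passed through the `loss_weights` argument of `model.compile`.

```python
def combined_loss(det_loss, trk_loss, det_weight=1.0, trk_weight=1.0):
    """Weighted sum of a detection loss and a tracking loss (hypothetical helper)."""
    return det_weight * det_loss + trk_weight * trk_loss


def tracking_share(det_loss, trk_loss, det_weight=1.0, trk_weight=1.0):
    """Fraction of the combined loss contributed by the tracking term."""
    total = combined_loss(det_loss, trk_loss, det_weight, trk_weight)
    return (trk_weight * trk_loss) / total if total > 0 else 0.0


if __name__ == "__main__":
    # Illustrative loss values only, not measurements from this project.
    det_loss, trk_loss = 2.0, 0.05
    for trk_weight in (1.0, 10.0, 40.0):
        share = tracking_share(det_loss, trk_loss, trk_weight=trk_weight)
        print(f"trk_weight={trk_weight:5.1f} -> tracking share {share:.2f}")
```

Printing the share of each term for a few candidate weights is a cheap way to see whether one task is drowning out the other before committing to a long training run.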

As far as speed goes, I haven't done any rigorous analysis on that, but I imagine 10 fps should be easily achievable with a 1080, since it is basically a regression and does not use any RPNs, which are slow. Additionally, increasing the temporal stride will give you better tracker speed: compare 3 forward passes for 4 frames with 2 forward passes for 6 frames. And if you could parallelize the pre- and post-processing/decoding of frames like YOLO's original C implementation does, it should be quite easy to reach 10 fps.

Regards, Kshitiz
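Since no rigorous timing exists yet, a rough measurement is easy to set up. The sketch below is an assumption-laden example, not part of this repository: `measure_fps` is a hypothetical helper, and `dummy_tracker_step` is a stand-in that should be replaced with the real tracker-only forward pass.

```python
import time


def measure_fps(step, n_frames=100, warmup=10):
    """Return frames per second for a callable that processes one frame.

    `step` is a hypothetical stand-in for the tracker-only forward pass.
    """
    for _ in range(warmup):  # warm-up calls are not timed
        step()
    start = time.perf_counter()
    for _ in range(n_frames):
        step()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Dummy workload standing in for the tracker; replace with the real call.
    def dummy_tracker_step():
        sum(i * i for i in range(10_000))

    print(f"{measure_fps(dummy_tracker_step):.1f} fps")
```

Timing the detector and the tracker separately with the same helper would show directly what fraction of the frame budget the tracking part takes.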

anuar12 commented 5 years ago

Great, thanks for a thorough reply!

Yeah, I agree that starting with a pre-trained detector and only fine-tuning the tracker would make sense. I am not 100% sure from the code yet, but it looks like the part between the detector and the tracker is differentiable, so turning learning back on for the detector after fine-tuning the tracker could also help, so that the whole system is trained end-to-end.

Yep, I see your task. Occlusion is difficult. I also used SORT, which uses a Kalman filter (https://github.com/abewley/sort). It's very simple and fast and its accuracy is pretty decent, but I am looking into something that can work even better. The official ROLO code was pretty darn bad; it was very hard to believe they got the results in the paper. My problem has lots of false-negative detections: the images can be bad quality, with bad lighting, dust everywhere, just general noise.
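For context on what a SORT-style baseline does between frames, here is a simplified sketch of the association step only. It is not SORT's actual code: SORT solves the assignment with the Hungarian algorithm on Kalman-predicted boxes, while this example swaps in a greedy match on raw boxes to stay short. The function names and the 0.3 threshold are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, detections, iou_threshold=0.3):
    """Greedily pair track boxes with detection boxes by descending IoU.

    Returns (track_index, detection_index) pairs. Tracks left unmatched are
    the ones a motion model would have to carry through a missed detection.
    """
    candidates = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in candidates:
        if score < iou_threshold:
            break  # remaining candidates overlap even less
        if ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches
```

With many false negatives, the interesting output is the set of tracks that get no match in a frame, since those are exactly the cases where a learned temporal model could help over a plain Kalman prediction.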

Also, in terms of the detector, I am using YOLO v3, which has 3 output heads at different layers (I only have small objects, so this matters for me). If you want to improve your results overall, the detector does all the hard work, so improving the detector is very important.

anuar12 commented 5 years ago

I will let you know how my development goes. If anything comes of it, I could submit a PR.

anuar12 commented 5 years ago

So I first wanted to train a detector with KerasYOLO.py. I think I changed all the right things (anchors, labels, etc.), but I get nonsense low-confidence predictions after training, even though the loss steadily decreases. Also, looking at the individual loss terms, the WH term seems to be several orders of magnitude larger than the rest. I tried scaling the OBJ term, but I had to use extremely large values, which doesn't really make sense.

Would you know what the problem is? I've used code from keras-yolo2 before, but I am not sure what changes you have made.
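One thing that can be checked independently of the code: a squared-error WH term is very sensitive to the units of width and height. The sketch below uses made-up numbers and hypothetical helpers (`wh_squared_error`, `to_grid_units`, a 416-pixel input, and a 13-cell grid are all assumptions) to show how the same box error looks in pixels versus grid cells. Whether a unit mismatch is what is happening in KerasYOLO.py is not established here.

```python
def wh_squared_error(true_wh, pred_wh):
    """Sum of squared differences between true and predicted (w, h)."""
    return sum((t - p) ** 2 for t, p in zip(true_wh, pred_wh))


def to_grid_units(wh_pixels, image_size=416, grid_size=13):
    """Convert (w, h) from pixels to grid-cell units."""
    cell = image_size / grid_size  # pixels per grid cell
    return tuple(v / cell for v in wh_pixels)


if __name__ == "__main__":
    # Made-up box sizes in pixels, for illustration only.
    true_px, pred_px = (64.0, 96.0), (48.0, 80.0)
    err_px = wh_squared_error(true_px, pred_px)
    err_grid = wh_squared_error(to_grid_units(true_px), to_grid_units(pred_px))
    print(err_px, err_grid, err_px / err_grid)  # 512.0 0.5 1024.0
```

With a 32-pixel cell, the same error is 1024 times larger in pixels than in grid units, so comparing the units of the anchors and the ground-truth boxes against what the loss expects is a quick sanity check.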