Hi, thanks for your questions.
In order to do multi-object tracking, DeepSORT considers 2 aspects: 1) state estimation/Kalman filtering (location/velocity of your bboxes), and 2) appearance of the pixels within your bboxes. The only "trainable" portion of this is the model that embeds the appearance of each bbox into an appearance vector. The original DeepSORT repo provides an embedder trained on a person re-identification dataset, i.e. one meant for differentiating people. In our repo, we provide a MobileNetV2 model (pre-trained on ImageNet) as the default embedder. Theoretically, it may not be the best at differentiating between objects of the same class, but we found its performance sufficient for our applications. That said, this repo is designed for modularity/flexibility, so you can provide whatever appearance features you deem fit, meaning you can train your own model to differentiate between different vehicle IDs if you want.
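If it helps, here is a minimal sketch of what plugging in your own appearance features could look like. This is only an illustration: the embedder constructor argument and the embeds keyword of update_tracks are assumed from the in-code documentation, so please check the docstrings for the exact names and signatures.

```python
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

# Default: built-in MobileNetV2 embedder (pre-trained on ImageNet) computes
# appearance vectors from the bbox crops of the frame for you.
tracker_default = DeepSort(max_age=30)

# Assumed usage: disable the built-in embedder and supply your own re-ID
# features, e.g. from a vehicle re-identification model you trained yourself.
tracker_custom = DeepSort(max_age=30, embedder=None)

detections = [([100, 120, 50, 80], 0.9, 'car')]  # ([left, top, w, h], confidence, class)
my_embeds = [np.random.rand(128)]                # one appearance vector per detection (dummy here)

tracks = tracker_custom.update_tracks(detections, embeds=my_embeds)
```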
Apologies for the lack of clarity in the README; I will spruce it up when I have time. For now, you can refer to the documentation in the code, specifically here for your question.
Yup, that is the format update_tracks expects.
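For concreteness, here is a hedged sketch of a typical call in the format described above (method names like is_confirmed/to_ltrb follow the repo's README; verify against the in-code docs):

```python
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=30)

# Dummy BGR frame; the default embedder crops the bbox patches out of this.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)

# One tuple per detection: ([left, top, w, h], confidence, detection_class)
detections = [
    ([204, 305, 60, 120], 0.87, 'person'),
    ([400, 220, 80, 150], 0.65, 'person'),
]

tracks = tracker.update_tracks(detections, frame=frame)
for track in tracks:
    if not track.is_confirmed():
        continue
    print(track.track_id, track.to_ltrb())  # tracked bbox as [left, top, right, bottom]
```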
If it helps, you can refer to our code in our other repo to see how we design an object detector inference object: https://github.com/levan92/det2/blob/master/det2/det2.py
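Roughly, the idea there is a detector class exposing a single inference method that returns detections already in the tuple format above, so it can be dropped straight in front of the tracker. The sketch below is a generic, hypothetical wrapper (it does not mirror det2 exactly):

```python
import numpy as np

class MyDetector:
    """Hypothetical detector wrapper; your actual model loading/inference goes here."""

    def __init__(self, weights_path: str, conf_thresh: float = 0.5):
        self.conf_thresh = conf_thresh
        # self.model = load_your_model(weights_path)  # placeholder

    def detect(self, frame: np.ndarray):
        """Return a list of ([left, top, w, h], confidence, class) tuples."""
        raw = []  # replace with model output post-processed to (xmin, ymin, xmax, ymax, conf, cls)
        detections = []
        for xmin, ymin, xmax, ymax, conf, cls in raw:
            if conf < self.conf_thresh:
                continue
            detections.append(([xmin, ymin, xmax - xmin, ymax - ymin], conf, cls))
        return detections
```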
I'll be closing this issue; we can re-open it if you have any other questions.
Hi, firstly, thanks for this wonderful package. I have a few questions.
Do I need to train the DeepSORT model on my dataset, or can I use your package directly?
In your example, you mention only the bbox, but in the issue below you asked the questioner to pass all the arguments (bbox, confidence score, label). Could you please explain this in a bit more detail? https://github.com/levan92/deep_sort_realtime/issues/11#issuecomment-906134900
Another question: is it mandatory to pass the bbox in the [left, top, w, h] format? The output from my model is [xmin, ymin, xmax, ymax]. Do you want me to convert the coordinates to ([left, top, w, h], confidence, detection_class)?
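For reference, is this the sort of conversion you mean, assuming the [left, top, w, h] format is required?

```python
def xyxy_to_ltwh(box):
    """Convert [xmin, ymin, xmax, ymax] to [left, top, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]

# e.g. my model output [50, 60, 150, 260] would become [50, 60, 100, 200]
detection = (xyxy_to_ltwh([50, 60, 150, 260]), 0.9, 'car')
```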
Is it possible for you to provide a practical demo?