ultralytics / yolov5

YOLOv5 šŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Which feature vector to use for tracking, and how to handle different vector sizes? #11202

Closed Ā· danwanban closed this issue 1 year ago

danwanban commented 1 year ago

Question

Hi,

My goal is to use the features extracted during YOLOv5l-seg inference for video object tracking. As suggested, I will use the three C3 layers. Now, at the NMS stage, everything but the best result per object is filtered out. If I understand correctly (not sure about this), there is no guarantee that the surviving result comes from the deepest C3 module; it can come from either of the other two. So if I take the features from the chosen C3 layer (256, 512 or 1024 channels) and want to compare them to features from another frame, the feature vectors need to be the same size. To sum up, my options are:

(A) At the NMS stage, mark the feature vector from the deepest (1024-channel) layer that corresponds to the detection chosen by NMS, even if it is not the one that won the NMS stage (best prob). Still, there may be no such vector from the deepest layer. In that case, which one should I use?

(B) Somehow equalize the dimensions of the selected feature vectors when they come from different C3 layers. Should I use pooling in that case? Some other technique?

(C) Crop each detection and use some other feature-extraction method. In that case, which generic one is recommended (fast and convertible to TensorRT) that does not require training? (MobileNetV1/V3, maybe?)

(A sketch of recovering which scale a kept prediction came from follows this list.)
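For reference, a minimal sketch of the scale-recovery step, assuming the default 640Ɨ640 input with strides 8/16/32 and 3 anchors per scale: YOLOv5's Detect head concatenates the three scale outputs into one (bs, N, no) tensor, so the flat index of a candidate tells you which scale (and hence which C3 feature map) produced it. Note that the stock `non_max_suppression` does not return these indices, so they would need to be carried through yourself (e.g. appended as an extra column before NMS):

```python
# A minimal sketch (assumptions: 640x640 input, strides 8/16/32 and
# 3 anchors per scale, as in the default YOLOv5 configs). Given the flat
# index of a prediction in the concatenated (bs, 25200, no) output,
# recover which detection scale produced it.
def scale_of_prediction(flat_idx, img_size=640, strides=(8, 16, 32), na=3):
    offset = 0
    for level, s in enumerate(strides):
        ny = nx = img_size // s
        n = na * ny * nx  # predictions contributed by this scale
        if flat_idx < offset + n:
            return level  # 0 -> P3 (256 ch), 1 -> P4 (512 ch), 2 -> P5 (1024 ch)
        offset += n
    raise IndexError("flat_idx out of range")
```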

What are your thoughts on this?

Thanks for all the help!

github-actions[bot] commented 1 year ago

šŸ‘‹ Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 šŸš€ and Vision AI ā­!

github-actions[bot] commented 1 year ago

šŸ‘‹ Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO šŸš€ and Vision AI ā­

glenn-jocher commented 11 months ago

@danwanban hi there,

To keep the feature vectors compatible for tracking, I'd recommend option (A). You can mark the feature vector from the deepest (1024-channel) layer that corresponds to the detection chosen by NMS, even if that layer is not the one that produced the winning prediction. In the rare case that there is no such vector from the deepest layer, you could fall back to the nearest layer.
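A minimal sketch of this idea, under a few assumptions: `feat` is the last C3/P5 feature map of shape (1, 1024, H, W), captured with a forward hook (a hypothetical name, not a YOLOv5 API), and the NMS boxes are in pixel coordinates on the square letterboxed input. It bilinearly samples a 1024-d vector at each box centre, so every kept detection gets a same-size embedding regardless of which scale produced it:

```python
import torch
import torch.nn.functional as F

# Sketch: sample a 1024-d appearance vector per detection from the P5 map.
# feat:  (1, 1024, H, W) feature map grabbed via a forward hook (assumed)
# boxes: (n, 4) tensor of (x1, y1, x2, y2) in pixels on an img_size input
def embed_from_p5(feat, boxes, img_size=640):
    cx = (boxes[:, 0] + boxes[:, 2]) / 2  # box centres in pixels
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    # normalise centres to [-1, 1], the coordinate range grid_sample expects
    grid = torch.stack((cx / img_size * 2 - 1, cy / img_size * 2 - 1), dim=-1)
    grid = grid.view(1, -1, 1, 2)                      # (1, n, 1, 2)
    emb = F.grid_sample(feat, grid, align_corners=False)  # (1, 1024, n, 1)
    return emb.squeeze(3).squeeze(0).t()               # (n, 1024)
```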

Maintaining a consistent dimensionality across feature vectors from different layers is important. If you need to equalize dimensions, techniques like pooling can be helpful.
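If you go with option (B) instead, here is a minimal sketch of the pooling idea (an illustration, not a YOLOv5 API): adaptive average pooling collapses a 256-, 512- or 1024-d vector to a common size, and L2-normalising afterwards keeps cosine similarities comparable:

```python
import torch
import torch.nn.functional as F

# Sketch: pool a feature vector of length 256, 512 or 1024 down to a
# common out_dim, then L2-normalise it for cosine-similarity matching.
def equalize(vec, out_dim=256):
    v = vec.view(1, 1, -1)                 # (N=1, C=1, L) for 1-d pooling
    v = F.adaptive_avg_pool1d(v, out_dim)  # (1, 1, out_dim)
    v = v.view(-1)
    return v / (v.norm() + 1e-12)
```

Keep in mind that pooled vectors from different-depth layers still live in somewhat different embedding spaces, so pooling is the cheapest option rather than the most faithful one.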

For fast, training-free feature extraction that's convertible to TensorRT, a pretrained MobileNetV3 might be a good choice. You can find more details on feature extraction in our Ultralytics Documentation.
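For option (C), a minimal sketch using torchvision's pretrained MobileNetV3-Small as a generic appearance embedder (assumes torchvision >= 0.13 for the weights API; `embed_crop` is a hypothetical helper):

```python
import torch
import torchvision

# Pretrained MobileNetV3-Small backbone as an appearance embedder; no
# training needed. Dropping the ImageNet head leaves the 576-d pooled
# feature from the last conv stage.
weights = torchvision.models.MobileNet_V3_Small_Weights.DEFAULT
backbone = torchvision.models.mobilenet_v3_small(weights=weights)
backbone.classifier = torch.nn.Identity()  # output becomes the 576-d feature
backbone.eval()
preprocess = weights.transforms()          # resize + normalise preset

@torch.no_grad()
def embed_crop(crop):                      # crop: (3, h, w) uint8 tensor
    x = preprocess(crop).unsqueeze(0)
    return backbone(x).squeeze(0)          # (576,) appearance vector
```

The remaining features + avgpool path is plain convolutions, so the usual ONNX export and then TensorRT conversion should be straightforward.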

I hope this information is helpful for your objectives. Let me know if there's anything else I can assist you with. Good luck with your project!