Closed danwanban closed 1 year ago
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Access additional YOLOv5 🚀 resources:
Access additional Ultralytics ⚡ resources:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
@danwanban hi there,
To ensure compatibility between feature vectors for object tracking, I'd recommend option (A). At the NMS stage, you can mark the feature vector from the deepest (1024-channel) layer that corresponds to the detection chosen by NMS, even if that layer is not the one the NMS result came from. In the rare case that no such vector exists in the deepest layer, you could fall back to the nearest layer.
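A minimal sketch of that deepest-layer-first fallback, assuming the per-layer features have already been associated with detection ids. The `feats_by_layer` mapping, the layer keys, and the detection ids here are illustrative, not YOLOv5 API:

```python
import numpy as np

def pick_feature(det_id, feats_by_layer, depths=(1024, 512, 256)):
    """Return (depth, feature) for det_id from the deepest layer that has one,
    falling back to shallower layers in order."""
    for depth in depths:  # deepest first
        layer = feats_by_layer.get(depth, {})
        if det_id in layer:
            return depth, layer[det_id]
    raise KeyError(f"no feature found for detection {det_id}")

# Toy example: detection 0 has a deepest-layer feature, detection 1 does not.
feats_by_layer = {
    1024: {0: np.ones(1024)},
    512:  {0: np.ones(512), 1: np.ones(512)},
    256:  {1: np.ones(256), 2: np.ones(256)},
}

print(pick_feature(0, feats_by_layer)[0])  # 1024 (deepest layer available)
print(pick_feature(1, feats_by_layer)[0])  # 512  (fallback to nearest layer)
```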
Maintaining a consistent dimensionality across feature vectors taken from different layers is important if you want to compare them between frames. If you need to equalize dimensions, techniques like average pooling can be helpful.
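One simple way to equalize dimensions, sketched here with plain NumPy: average-pool each vector down to the smallest layer width (256). This works because 512 and 1024 are exact multiples of 256; the `pool_to` helper is an assumption for illustration, not an existing utility:

```python
import numpy as np

def pool_to(vec, target=256):
    """Average-pool a 1-D feature vector down to `target` dimensions.
    Assumes len(vec) is a multiple of `target` (true for 256/512/1024)."""
    vec = np.asarray(vec, dtype=np.float32)
    factor = len(vec) // target
    # Group consecutive elements and average each group.
    return vec.reshape(target, factor).mean(axis=1)

for dim in (256, 512, 1024):
    pooled = pool_to(np.random.rand(dim))
    print(dim, "->", pooled.shape)  # every layer's vector becomes (256,)
```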
For fast feature extraction that does not require training and is convertible to TensorRT, MobileNetV3 might be a good choice. You can find more details on feature extraction in our Ultralytics Documentation.
I hope this information is helpful for your objectives. Let me know if there's anything else I can assist you with. Good luck with your project!
Search before asking
Question
Hi,
My goal is to use the features extracted during YOLOv5seg-l processing to do video object tracking. As suggested, I will use the 3 C3 layers. Now, at the NMS stage, everything but the best result per object is filtered out. If I understand correctly (not sure about this), there is no guarantee that this result will come from the deepest C3 module; it can come from either of the other two. So if I take the features from the chosen C3 layer (256, 512 or 1024) and want to compare them to features from another frame, I need to use the same feature vector size. To sum up, my options are:
(A) Mark, at the NMS stage, the feature vector from the deepest (1024) layer that corresponds to the feature vector chosen by NMS, even if it is not the one that won the NMS stage (best probability). Still, there is a possibility that no such vector exists in the deepest layer. In such a case, which one should I use?
(B) Somehow equalize the dimensions of the selected feature vectors if they come from different C3 layers. Should I use pooling in such a case? Some other technique?
(C) Crop each detection and use some other feature extraction method. In this case, which generic one is recommended (quick + convertible to TensorRT) that does not require training? (MobileNetV1/V3 maybe?)
What are your thoughts on this ?
Thanks for all the help !
Additional
No response