Closed ozayr closed 5 months ago
This is a valid point @ozayr. It depends heavily on the deployment environment: whether shadows are cast in the scene, whether appearance from one angle looks different from another (let's say somebody carries a backpack), and how objects move within the environment (they may move away from the camera, changing the amount of detail the camera can resolve).
Let's take a specific example. If somebody enters the field of view of the camera where a heavy shadow is cast, only half a body may be visible; let's also assume that this person carries a backpack. After walking a bit, this person, now fully visible, may turn around to face the camera. The first captured embedding would then not be representative of this person at all.
I'm thinking, once an object, say the person with the backpack, has been assigned an ID
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs. Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Dropping some ideas here:
For n frames at a time, we rely entirely on motion-based tracking algorithms (such as a Kalman filter, optical flow, or other predictive models) to estimate the object's location. After n frames, we generate a new embedding to revalidate the identity of the object. This approach can significantly reduce the computational load, as embedding generation is usually the most resource-intensive part of tracking. The risk here is that if the object's appearance changes significantly within those n frames (due to occlusion, lighting changes, or orientation changes), the tracker might lose accuracy.
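A minimal sketch of this idea, using a constant-velocity predictor as a stand-in for a full Kalman filter (the class and the `EMBED_EVERY_N` interval are hypothetical names, not part of any existing tracker API):

```python
import numpy as np

EMBED_EVERY_N = 10  # hypothetical revalidation interval (the "n" above)

class TrackedObject:
    """Minimal constant-velocity track; a stand-in for a full Kalman filter."""

    def __init__(self, obj_id, position, embedding):
        self.obj_id = obj_id
        self.position = np.asarray(position, dtype=float)
        self.velocity = np.zeros_like(self.position)
        self.embedding = np.asarray(embedding, dtype=float)
        self.frames_since_embed = 0

    def predict(self):
        # Motion-only update: extrapolate position, skip the embedding model.
        self.position = self.position + self.velocity
        self.frames_since_embed += 1

    def needs_reembedding(self):
        return self.frames_since_embed >= EMBED_EVERY_N

    def revalidate(self, new_embedding, threshold=0.5):
        # Cosine similarity between the stored and the freshly generated embedding.
        a, b = self.embedding, np.asarray(new_embedding, dtype=float)
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        self.frames_since_embed = 0
        if sim >= threshold:
            self.embedding = b  # refresh the appearance model
            return True
        return False  # identity mismatch: trigger re-association
```

The expensive embedder would then only run for tracks where `needs_reembedding()` is true, rather than on every detection in every frame.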
Introduce a lightweight neural network model to perform quick re-identification checks. This network can be less complex than the main embedding generator but sufficient to catch obvious mismatches. As a fallback mechanism, if the lightweight model signals a potential mismatch, the system can fall back to generating a full embedding for a more thorough check.
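The cascade could look like the sketch below. Here a coarse intensity histogram stands in for the lightweight network, purely for illustration; the function names and thresholds are assumptions, not an existing API:

```python
import numpy as np

def coarse_histogram(patch, bins=16):
    """Very cheap appearance signature; a stand-in for a lightweight re-ID net."""
    h, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return h / max(h.sum(), 1)

def cheap_check_passes(sig_a, sig_b, threshold=0.8):
    # Histogram intersection lies in [0, 1]; high values mean similar appearance.
    return float(np.minimum(sig_a, sig_b).sum()) >= threshold

def verify_identity(stored_sig, stored_emb, patch, full_embedder, emb_threshold=0.5):
    """Fallback cascade: cheap check first, full embedder only on a suspected mismatch.

    Returns (is_match, used_full_embedder)."""
    if cheap_check_passes(stored_sig, coarse_histogram(patch)):
        return True, False  # cheap check passed; skip the expensive embedding
    # Potential mismatch: fall back to the full embedding for a thorough check.
    emb = np.asarray(full_embedder(patch), dtype=float)
    sim = float(stored_emb @ emb / (np.linalg.norm(stored_emb) * np.linalg.norm(emb)))
    return sim >= emb_threshold, True
```

The point is that the heavy model only runs on the small fraction of frames where the cheap signal disagrees, which is where the computational savings come from.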
Maintain a buffer of recent embeddings and motion vectors. Use these historical data points to ensure that the object’s identity remains consistent over time, reducing the need for constant re-embedding.
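A rolling gallery per track might be sketched as follows (class name and buffer length are illustrative assumptions):

```python
from collections import deque
import numpy as np

class EmbeddingBuffer:
    """Rolling gallery of recent appearance embeddings for one track (sketch)."""

    def __init__(self, maxlen=30):
        # deque drops the oldest embedding automatically once maxlen is reached.
        self.buffer = deque(maxlen=maxlen)

    def add(self, embedding):
        e = np.asarray(embedding, dtype=float)
        self.buffer.append(e / np.linalg.norm(e))

    def similarity(self, embedding):
        # Max cosine similarity against the gallery: more robust to pose changes
        # than comparing to a single stored embedding, since one historical view
        # may still match even when the most recent one does not.
        q = np.asarray(embedding, dtype=float)
        q = q / np.linalg.norm(q)
        return max(float(q @ e) for e in self.buffer)
```

Matching a new detection against the whole buffer (rather than one embedding) is what reduces the need for constant re-embedding: a person who turns around can still match an older view stored in the gallery.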
Each of these approaches comes with trade-offs in terms of complexity, computational savings, and potential loss of accuracy. It’s important to benchmark these methods in the specific deployment environment to understand their impact fully. Experimenting with a combination of these strategies might yield the best results in balancing efficiency and accuracy.
I have also seen that Nvidia has dropped SV3DT. This is something I have been thinking about for a while; occlusions are my worst enemy.
If one estimates the camera parameters the way UCMCTrack does and then tracks based on projections to the ground plane, this should also help with tracker accuracy, assuming all objects one would like to track are confined to the same ground plane, which most of the time is the case.
Yes, it would be interesting to provide the option to feed in a camera configuration file in order to convert 2D detections to 3D. Then it would be possible to do motion tracking on the ground plane, which according to UCMCTrack is more reliable.
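The projection step itself is small once a homography is available. A sketch, assuming a 3x3 image-to-ground homography `H` has already been estimated from the camera configuration (intrinsics plus extrinsics); the function name and bbox convention are hypothetical:

```python
import numpy as np

def image_to_ground(H, bbox):
    """Project the bottom-center of a bounding box onto the ground plane.

    H is a 3x3 image->ground homography, e.g. derived from a camera
    configuration file. bbox is (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = bbox
    # Use the bottom-center of the box: the point where the feet touch the ground.
    foot = np.array([(x1 + x2) / 2.0, y2, 1.0])
    g = H @ foot
    return g[:2] / g[2]  # dehomogenize to ground-plane coordinates
```

Running the motion model (e.g. a Kalman filter) on these ground-plane coordinates instead of pixel coordinates makes the motion roughly linear and scale-invariant, which is the reliability gain the UCMCTrack authors report.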
just leaving this here
Search before asking
Question
Just a thought: I wonder if it's possible to avoid generating embeddings for objects on every frame when we have some kind of certainty that an object is the same one tracked in the previous frame. I'm not sure if what I'm saying makes sense, but this could significantly increase the speed of the tracking part of the pipeline.