Open AbinashSankaran opened 1 year ago
Hi @AbinashSankaran! Great to hear you are considering using Norfair for this exciting project!
As you know, multiple object tracking in sports is challenging and requires significant effort to get working. But the scenario you describe (10- to 40-second videos) is one where you could get good results.
To get a sense of the current state of your tracking output, can you share a video where we can see the problems? With that, I can suggest some specific ideas to help you solve your problems. Or at least try.
Hi @facundo-lezama, here are some of the videos I have generated with Norfair. I am not able to upload the 10-second clips here, as GitHub only allows up to 10 MB, so I am attaching the video links instead.
Video 1: just the tracker (distance_threshold=0.7, iou distance). As you can see, when the players cross, ID 11 and ID 1 are wrongly assigned: here
Video 2: tracker with camera motion (the track ID is reassigned multiple times within a second): here
Video 3: tracker with ReID. Again the same issue, IDs are wrongly assigned as the players cross: here
We would be grateful if we could at least fix the issue of an ID getting assigned to the wrong player, and keep the same ID for as long as possible.
PS: So far the naive tracker works well, but the ID issue is very concerning. Also, a standard YOLO model will work fine, though we have used a custom model to detect only players.
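For reference, a minimal sketch of the naive setup described for Video 1 (iou distance, distance_threshold=0.7), assuming detections come from an Ultralytics YOLO model; the model path and the yolo_to_norfair helper are illustrative, not code from this thread:

```python
import numpy as np
from norfair import Detection, Tracker
from ultralytics import YOLO

model = YOLO("players_yolov8.pt")  # hypothetical custom player/goalie/referee detector

# Naive tracker: built-in IoU distance on bounding boxes, threshold 0.7 as in Video 1
tracker = Tracker(distance_function="iou", distance_threshold=0.7)

def yolo_to_norfair(results):
    """Convert Ultralytics boxes into Norfair Detections (two corner points per box)."""
    detections = []
    for box in results[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        detections.append(
            Detection(
                points=np.array([[x1, y1], [x2, y2]]),
                scores=np.array([float(box.conf[0])] * 2),
                label=int(box.cls[0]),
            )
        )
    return detections

# for frame in video:  # any frame source (e.g. norfair.Video or cv2.VideoCapture)
#     tracked_objects = tracker.update(detections=yolo_to_norfair(model(frame)))
```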
Hi @AbinashSankaran, I have conducted several experiments and would like to share the results with you, which you may find useful for your own trials.
For these experiments, I fine-tuned a YOLOv8 model using a limited set of examples to detect the classes that you specified.
Below are the tests I performed:
- Norfair stock (excluding ReID or camera motion): I employed iou as the distance_function and set 0.8 as the distance_threshold across all experiments. The results can be viewed here: https://drive.google.com/file/d/10RgFwlvnyhSu7OMAUS7Ade1kVyEFkngp/view?usp=drive_link. Upon comparison with the video you uploaded, I noticed some improvements. It may be beneficial to reassess the detector.
- Norfair with camera motion: There was no noticeable improvement compared to the stock experiment. The camera movements were quite subtle, suggesting that this feature may not be necessary in this context. The results can be found here: https://drive.google.com/file/d/1TanKwqrLHBCx6PfLIs_h-9PXxFi7G9pa/view?usp=drive_link. The only modification was min_distance, which was set to 7. The code was derived from the camera motion demo. Please remember that it is crucial to mask the detections as demonstrated in our demo. In your videos, it is equally important to mask the channel logo and the scoreboard, as these elements remain static throughout the video. We aim to exclude points from these areas to improve the accuracy of the camera motion estimation.
- Norfair with ReID: Although we do not currently have a user guide, I can detail the methods we employed and provide a video to demonstrate a potential outcome of using the ReID feature with embeddings. We used this repository (https://github.com/JDAI-CV/fast-reid), where a ReID model can be fine-tuned; it includes a model zoo with pre-trained models across various domains. In this instance, we used the ResNet50 trained on the Market1501 dataset and fine-tuned it for several players, the referee, and the goalie. The results (https://drive.google.com/file/d/1RJHr4A-ufsye7na4CpRDSBuG--PLgeDI/view?usp=drive_link) showed improvements and maintained consistent IDs for the referee and goalie. However, there is significant room for improvement: the solution is not robust enough to retain the same ID for every player. If you do not require an online tracker (such as Norfair, which assigns an ID to each person in every frame as the video is processed), better results can be achieved by analyzing the entire video before returning the results. The main reason is that each player's number is not always visible and players often resemble each other. It is also likely that using the position of each player, along with other domain-specific rules you know about, could further improve the results. (A rough sketch of how the camera-motion estimator and ReID embeddings plug into the tracker follows this list.)
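To make the camera-motion and ReID bullets above concrete, here is a rough sketch of how those pieces can be wired into a Norfair tracker. The mask handling, the get_embedding helper, and all threshold values are illustrative assumptions rather than the exact code used in these experiments:

```python
import numpy as np
from norfair import Detection, Tracker
from norfair.camera_motion import MotionEstimator

def embedding_distance(candidate, lost_track):
    """Cosine distance between the appearance embeddings of two tracked objects
    (assumed helper; Norfair passes TrackedObject instances to reid_distance_function)."""
    a = candidate.last_detection.embedding
    b = lost_track.last_detection.embedding
    if a is None or b is None:
        return 1.0
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

motion_estimator = MotionEstimator(min_distance=7)  # value used in the camera-motion experiment

tracker = Tracker(
    distance_function="iou",
    distance_threshold=0.8,
    reid_distance_function=embedding_distance,
    reid_distance_threshold=0.3,   # assumed value, needs tuning
    reid_hit_counter_max=90,       # how long a lost track is kept around for re-identification
)

# Per frame (pseudocode for the surrounding loop):
# mask = np.ones(frame.shape[:2], dtype=frame.dtype)
# ...set the scoreboard/logo regions and the detected player boxes to 0 in `mask`...
# coord_transformations = motion_estimator.update(frame, mask)
# detections = [
#     Detection(points=box, embedding=get_embedding(crop))  # get_embedding: your fast-reid model
#     for box, crop in zip(boxes, crops)                     # boxes/crops come from the detector
# ]
# tracked_objects = tracker.update(detections, coord_transformations=coord_transformations)
```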
Here is the detector that I've fine-tuned (https://drive.google.com/file/d/1lefd4rbQiIROZOByo2fcvDUb_1-QvIPg/view?usp=drive_link), should you wish to replicate the results obtained from the first two alternatives.
We can delve deeper into each alternative based on your feedback on the performance of the videos I sent you, and explore ways to enhance your results. Please let me know your thoughts.
Hi Diego, thanks for taking the time to do the experiments. We will try to incorporate the suggestions into our system. But I have a question regarding the fine-tuned detector that you used: did you train the YOLOv8 model using manual annotations, or did you fine-tune it with the weights I provided? If so, what steps did you take?
Thanks, Abinash
Yes, I fine-tuned a YOLOv8 model using a small dataset that I constructed from your video, which can be found here. The initial model used for this process was the pre-trained version provided by Ultralytics, specifically, I believe it was the nano version.
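For completeness, fine-tuning the pre-trained nano model with Ultralytics generally looks like the sketch below; the dataset YAML name and the training settings are placeholders, not the exact ones used here:

```python
from ultralytics import YOLO

# Start from the pre-trained YOLOv8 nano weights published by Ultralytics
model = YOLO("yolov8n.pt")

# Fine-tune on a small custom dataset with player / goalie / referee classes.
# "hockey_players.yaml" is a placeholder for the dataset definition file.
model.train(data="hockey_players.yaml", epochs=50, imgsz=640)

# Run the fine-tuned detector on a clip
results = model.predict("clip.mp4")
```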
Do you have example code where you have used ReID with Norfair? Thanks
In your 3rd bullet point, you mentioned that using player positions can improve the result. How can we integrate the position information, especially if players move out of the frame? I am doing player ReID for basketball. Often the players go out of the frame and come back later (~30 seconds later). I am using clip_reid for ReID purposes, but it still tags different players with the same ID. I was interested in incorporating the player positions as well. I would really appreciate any help.
Attached is the use case video with this issue
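One way to fold position into the re-identification step, sketched below under the assumption that you are using Norfair's reid_distance_function, is to blend the appearance distance with the distance between where a lost track was last seen and where a candidate reappears. The weights, the pixel normalization, and the use of last_detection embeddings are illustrative assumptions; for absences as long as ~30 seconds the spatial term should be weighted lightly.

```python
import numpy as np

def position_aware_reid_distance(candidate, lost_track, alpha=0.7, max_jump_px=500.0):
    """Blend appearance and spatial distance between two Norfair TrackedObjects.
    alpha and max_jump_px are illustrative values, not tuned recommendations."""
    a = candidate.last_detection.embedding
    b = lost_track.last_detection.embedding
    appearance = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    # Distance between the lost track's last estimated position and the candidate's
    # current position, normalized by a rough "maximum plausible jump" in pixels.
    spatial = np.linalg.norm(candidate.estimate.mean(axis=0) - lost_track.estimate.mean(axis=0))
    spatial = min(spatial / max_jump_px, 1.0)

    return alpha * appearance + (1 - alpha) * spatial

# tracker = Tracker(..., reid_distance_function=position_aware_reid_distance,
#                   reid_distance_threshold=0.4)
```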
The situation we are working on
We are working on a player-tracking solution for Ice Hockey. We are trying to record the statistics and analytics of a game with Deep Learning. We will be recording metrics such as who is inside the rink (playing area) on a given frame, each player's position, and many others, like who scored a goal and how many goals the goalie (defender) defended. To track these, it is necessary to know which player stood where. Since the game is so fast-moving, it is impossible to recognize every player's jersey number in each frame. This is where tracking each player helps produce better analytics.
Need help with
A video input will be given with a duration between 10 and 40 seconds. We have a custom-trained YOLO model which detects three classes: player, goalie, and referee. We need a tracking solution that handles moving cameras and re-identification of the same players if they are out of frame for, say, 1 second. We would like suggestions on which Norfair model (or combination) to go with, and some input on the hyperparameters for the case of ice hockey.
Solutions Tried
We have already tried the following from Norfair:
Issues Faced
We tuned some of the parameters but were unable to get a significant improvement. With ReID and Camera Motion, a player was not given the same ID after coming back into the frame after, say, 15 frames (we did try tuning the hit_counter_max and initialization_delay parameters).
We would like to get some suggestions on model selection and parameter tuning.
https://github.com/tryolabs/norfair/assets/102944639/59b814c6-c994-4087-87fd-ce7e61ecd334
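As a rough starting point for the hit_counter_max and initialization_delay parameters mentioned above: hit_counter_max is measured in frames, so to survive a roughly 1-second absence at 30 fps it needs to be at least around 30. A hedged sketch, with values that are assumptions to be tuned rather than recommendations from this thread:

```python
from norfair import Tracker

FPS = 30  # assumed frame rate of the broadcast clips

tracker = Tracker(
    distance_function="iou",
    distance_threshold=0.7,
    hit_counter_max=int(1.5 * FPS),  # keep a track alive roughly 1.5 s without detections
    initialization_delay=3,          # wait a few frames before assigning an ID, to avoid spurious IDs
)
```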