Closed paolocomis closed 1 year ago
👋 Hello @paolocomis, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.
Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
@paolocomis P6 models like YOLOv5l6 are intended for large objects. If you're mainly looking for tennis balls you'd probably do better with a normal (i.e. P5) model.
In any case, it's normal that if you ask the model to detect twice as many classes, the performance on each class will drop. Otherwise, if I follow your train of thought, as your number of classes trends towards infinity you'd still expect the exact same performance for each class?
Thanks for the reply. I figured that adding just one class (player) wouldn't affect ball recognition performance. So for each added class the detection performance of the other classes decreases?
On the other hand, regarding the choice of model, you say that I should focus on a P5. Would it be better, then, to train for example yolov5l on images at 1280? The ball is very small, often just a few pixels, in images labeled at 1920x1080 resolution.
@paolocomis if the ball is only a few pixels I'd just use a regular 5l model. Each new task means that there are a finite amount of resources dedicated to accomplishing more things, which logically means that each thing will be accomplished slightly worse. The same is true of human brains or any system with finite resources.
Ok thanks for the advice, I'll try with yolov5l. But instead of finding a compromise, isn't there a way to get the best detection for both objects? After all, there are only 2 classes, and I don't understand such a big difference from adding just 1 class.
If you decrease the number of player labels would it help?
@paolocomis small objects will get better results at larger --imgsz but you should experiment.
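To make the effect of --imgsz on small objects concrete, here is a rough sketch of how many pixels a ball occupies after resizing. It assumes the image's long side is scaled to the training size; the 8 px ball diameter is a hypothetical value chosen for illustration, not a measurement from this dataset. The stride of 8 is the finest output stride of P5 and P6 models (P6 models add a coarser stride-64 output on top).

```python
FINEST_STRIDE = 8  # finest detection stride; P6 models add a coarser stride-64 head

def scaled_size(obj_px: float, src_long_side: int, imgsz: int) -> float:
    """Approximate object size in pixels after the long side is resized to imgsz."""
    return obj_px * imgsz / src_long_side

ball_px = 8.0  # hypothetical ball diameter in a 1920x1080 frame
for imgsz in (640, 1280, 1920):
    size = scaled_size(ball_px, 1920, imgsz)
    cells = size / FINEST_STRIDE  # fraction of one stride-8 grid cell covered
    print(f"imgsz={imgsz}: ball ~{size:.1f} px, ~{cells:.2f} of a stride-8 cell")
```

Under these assumptions the ball shrinks to under 3 px at 640, which is why it is worth experimenting with larger sizes.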
Thanks again for your answers, always exhaustive. I have one last question, perhaps the most important, which is always linked to the initial question:
Is it possible that the model fits better to the class with more instances? As described in the initial question, the number of instances of "players" is more than double that of "balls".
It could be the reason why after adding the "players" class the performance of detecting "balls" has dropped significantly.
So if I reversed the relationship, with twice as many "balls" annotations as "players", would I get closer to the performance of the "balls only" recognition model?
@paolocomis yes, class frequency will relate to the results on the class, with highly represented classes performing better than lower-represented classes. No dataset is perfectly balanced, and YOLO can handle significant class imbalance (e.g. 10 to 1 or worse), but you'll get better results if your classes are similar in quantity.
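A quick way to check how imbalanced a dataset is: count the boxes per class in the YOLO-format label files, where each line is `class x_center y_center width height`. This is a minimal sketch; the helper names and the class ids (0 = ball, 1 = player) are assumptions for illustration, and the label directory path is whatever your dataset uses.

```python
from collections import Counter
from pathlib import Path

def count_from_lines(lines) -> Counter:
    """Count boxes per class id from YOLO-format label lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if parts:  # skip blank lines
            counts[int(parts[0])] += 1
    return counts

def count_from_dir(label_dir: str) -> Counter:
    """Count boxes per class id across all *.txt files in label_dir."""
    counts = Counter()
    for txt in Path(label_dir).glob("*.txt"):
        counts += count_from_lines(txt.read_text().splitlines())
    return counts

def imbalance_ratio(counts: Counter) -> float:
    """Ratio of the most frequent class to the least frequent class."""
    return max(counts.values()) / min(counts.values())

# Hypothetical labels for one image: 1 ball (class 0) and 3 players (class 1)
sample = [
    "0 0.51 0.40 0.01 0.01",
    "1 0.20 0.55 0.08 0.30",
    "1 0.75 0.50 0.07 0.28",
    "1 0.60 0.30 0.05 0.20",
]
counts = count_from_lines(sample)
print(dict(counts), "ratio:", imbalance_ratio(counts))
```

On a real dataset, call `count_from_dir` on the training labels folder and compare the ratio against the rough 10 to 1 figure mentioned above.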
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Access additional YOLOv5 🚀 resources:
Access additional Ultralytics ⚡ resources:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
I am working on a similar task, ball and player detection. I have run experiments with YOLOv5x6 and YOLOv5 (v6.0), and the results showed that YOLOv5 (v6.0) performs better. Currently, I am dealing with an instance imbalance between players and balls: in my dataset there are sixteen times as many player boxes as ball boxes. I am trying to delete part of the player labels for further analysis. Do you have any suggestions for handling instance imbalance between classes?
@lpc-eol That's an interesting experiment, and it's good to hear that you're testing the effects of class imbalance. It's common that class imbalance can affect model performance, particularly when one class is heavily overrepresented compared to another. In your case, reducing the number of player labels to balance the class distribution is a valid approach to consider. I'd be interested to hear about your findings once you complete your analysis. Good luck!
I've searched everywhere but haven't found a solution to my problem.
I trained a yolov5l6 model for the recognition of a very small tennis ball from a data set of about 16k labeled images (each image presents 1 or 2 balls) at 1280x720 resolution, obtaining a satisfactory recognition result, both when the ball is large enough (close to the camera) and when it is very small (away from the camera).
So I decided to also add player recognition by labeling the same images that now show 3 or 4 players and 1 or 2 tennis balls, with the exact same settings of the first training session. The end result is that player recognition is great, while the ball is now not detected as accurately as before, i.e. it is now often not detected when it is small enough (far from the camera).
I carried out the tests with the exact same videos and I realized that the confidence value of the ball has also dropped. Where at frame X the ball confidence value was 0.87, with the new model the value dropped to 0.80.
I suspect it depends on the anchor settings. In fact, I noticed that when I train the model with only the ball annotations, the start-up check flags the anchor set as unsuitable and automatically recomputes it. When I launch the ball-and-player training, on the other hand, the anchor setup is judged OK.
Is it perhaps because there are more "players" objects than "balls" in the annotations and therefore the average player size is prioritized?
The training command I run on Colab is as follows:
python train.py --cache ram --epochs 350 --img 1280 --rect --batch 24 --worker 24 --weights yolov5l6.pt --cfg models/yolov5l6_custom.yaml --hyp data/hyps/hyp.scratch-high.yaml --data data/custom.yaml --name test --exist-ok
Should I perhaps adapt the training with a different setting than the standard one to solve the problem?
Thanks in advance
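On the anchor question above, the following is a simplified sketch of an anchor-fit score pooled over all labels, to show how adding a second class can change whether a start-up check passes. It is not YOLOv5's actual implementation: the function name, the anchor sizes and the box sizes are hypothetical, and the ratio threshold of 4.0 is an assumption modeled on the `anchor_t` hyperparameter. A box counts as matched if some anchor is within that factor in both width and height.

```python
def anchor_fit_fraction(boxes, anchors, thr=4.0):
    """Fraction of (w, h) label boxes matched by at least one (w, h) anchor
    within a factor of thr in both width and height (assumed criterion)."""
    def fits(box, anchor):
        rw, rh = box[0] / anchor[0], box[1] / anchor[1]
        return min(rw, 1 / rw, rh, 1 / rh) > 1 / thr
    matched = sum(any(fits(b, a) for a in anchors) for b in boxes)
    return matched / len(boxes)

# Hypothetical anchors and box sizes in pixels, for illustration only
anchors = [(20, 30), (90, 70), (300, 260)]
balls = [(4, 4), (5, 5), (6, 6), (10, 10)]
players = [(60, 150), (80, 200), (50, 120)]

print("balls only:   ", anchor_fit_fraction(balls, anchors))
print("players only: ", anchor_fit_fraction(players, anchors))
print("balls+players:", anchor_fit_fraction(balls + players, anchors))
```

Because the score is a fraction over all labels, well-matched player boxes raise the pooled value even when many small ball boxes remain unmatched. Whether that is what happened in this training run would need to be checked against the actual label sizes and anchors.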