karndeepsingh closed this issue 3 years ago.
Hello @karndeepsingh, thank you for your interest in YOLOv5! Please visit our Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom training Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.
Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:
$ pip install -r requirements.txt
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
@karndeepsingh if you use the --nosave flag or the --notest flag then yes only last.pt will be saved, this is the intended behavior.
Oh! I am using that flag. Just one more thing: does the best.pt file store the best-trained weights?
Can you share a link or resource on how to deploy this trained model, and which files from the yolov5 folder I should expect to need for production?
@karndeepsingh see Export and other tutorials below:
YOLOv5 Tutorials
- Train Custom Data (RECOMMENDED)
- Tips for Best Training Results (RECOMMENDED)
- Weights & Biases Logging (NEW)
- Supervisely Ecosystem (NEW)
- Multi-GPU Training
- PyTorch Hub (NEW)
- TorchScript, ONNX, CoreML Export
- Test-Time Augmentation (TTA)
- Model Ensembling
- Model Pruning/Sparsity
- Hyperparameter Evolution
- Transfer Learning with Frozen Layers (NEW)
- TensorRT Deployment
Thank you so much! We can also load our custom-trained model with the torch.hub.load() function, right? So I guess it could be used directly in production. Correct me if I am wrong.
One more thing to add: I am training on multiple GPUs using the command: !python -m torch.distributed.launch --nproc_per_node 2 train.py --data coco128.yaml --batch_size 4 --weights yolo5x6.pt
Training starts and the script runs, but it gets stuck after printing the details of the first epoch: the script keeps running, but there is no further status output after epoch 1. Can you help with this?
@karndeepsingh yes, see PyTorch Hub tutorial for details: https://docs.ultralytics.com/yolov5/tutorials/pytorch_hub_model_loading
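For reference, a minimal sketch of wrapping the PyTorch Hub call for custom weights. It assumes `torch` is installed and that the `ultralytics/yolov5` repository can be downloaded on first use; the helper name `load_custom_yolov5` and its default weights path are hypothetical examples, so point it at your own `best.pt`:

```python
def load_custom_yolov5(weights_path="runs/train/exp/weights/best.pt"):
    """Load custom-trained YOLOv5 weights via PyTorch Hub.

    Hypothetical helper: the default path is only an example location.
    """
    # Imported inside the function so this file can be imported without torch.
    import torch

    # The 'custom' entrypoint builds the model from a local .pt checkpoint.
    return torch.hub.load("ultralytics/yolov5", "custom", path=weights_path)
```

Usage could then be `model = load_custom_yolov5("path/to/best.pt")`, `results = model("image.jpg")` and `results.print()`. If the deployment should not depend on downloading the repository at runtime, the TorchScript/ONNX export tutorial listed above is the alternative route.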
Regarding your bug question, we've created a few short guidelines below to help users provide what we need in order to get started investigating a possible problem.
How to create a Minimal, Reproducible Example
When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:
- Minimal: Use as little code as possible that still produces the same problem
- Complete: Provide all parts someone else needs to reproduce your problem in the question itself
- Reproducible: Test the code you're about to provide to make sure it reproduces the problem
In addition to the above requirements, for Ultralytics to provide assistance your code should be:
- Current: Verify that your code is up-to-date with current GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been resolved by previous commits.
- Unmodified: Your problem must be reproducible without any modifications to the codebase in this repository. Ultralytics does not provide support for custom code.
If you believe your problem meets all of the above criteria, please close this issue and raise a new one using the Bug Report template and providing a minimum reproducible example to help us better understand and diagnose your problem.
Thank you!
Sure, I will take care of these!
I went through the PyTorch Hub tutorial for inference. How can we crop the detected objects from the image with it, the way we usually pass the --save-crop flag to detect.py?
@karndeepsingh
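results = model(imgs)  # run inference on a batch of images
results.crop()  # save each detected object as a cropped image, like detect.py --save-crop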
Awesome! Thank you so much for the help, highly appreciated!
@karndeepsingh logging location is indicated before and after training. ALL training results are logged to this directory.
I have two questions:
Your suggestion would be helpful!!
See labels.png generated on training start.
For cloud environments see https://pytorch.org/get-started/cloud-partners/
Thanks!
Hello! How is data augmentation handled in YOLOv5? Just curious to understand.
@karndeepsingh Hello! Thanks for asking about image augmentation. YOLOv5 applies online imagespace and colorspace augmentations in the trainloader (but not the testloader) to present a new and unique augmented Mosaic (original image + 3 random images) each time an image is loaded for training. Images are never presented twice in the same way.
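To make the Mosaic idea concrete, here is a simplified sketch of how four images can be tiled around a center point on a 2s x 2s canvas, and how each image's boxes are shifted into canvas coordinates. It is modeled loosely on YOLOv5's mosaic loader but is not that code: it only computes coordinates, and leaves out copying pixels, sampling the random center, and the random affine transform that brings the mosaic back to s x s. The function names are hypothetical.

```python
def mosaic_placement(s, xc, yc, sizes):
    """Compute where each of 4 images lands on a 2s x 2s mosaic canvas.

    s: target image size; (xc, yc): mosaic center on the canvas;
    sizes: four (h, w) tuples ordered top-left, top-right, bottom-left, bottom-right.
    Returns (canvas_rect, image_rect, (padw, padh)) per image, where rects are
    (x1, y1, x2, y2) and (padw, padh) is the shift to apply to that image's boxes.
    """
    placements = []
    for i, (h, w) in enumerate(sizes):
        if i == 0:  # top-left: the image's bottom-right corner touches the center
            a = (max(xc - w, 0), max(yc - h, 0), xc, yc)
            b = (w - (a[2] - a[0]), h - (a[3] - a[1]), w, h)
        elif i == 1:  # top-right
            a = (xc, max(yc - h, 0), min(xc + w, 2 * s), yc)
            b = (0, h - (a[3] - a[1]), min(w, a[2] - a[0]), h)
        elif i == 2:  # bottom-left
            a = (max(xc - w, 0), yc, xc, min(2 * s, yc + h))
            b = (w - (a[2] - a[0]), 0, w, min(a[3] - a[1], h))
        else:  # bottom-right
            a = (xc, yc, min(xc + w, 2 * s), min(2 * s, yc + h))
            b = (0, 0, min(w, a[2] - a[0]), min(a[3] - a[1], h))
        # Offset between the image's own pixel grid and the canvas grid.
        placements.append((a, b, (a[0] - b[0], a[1] - b[1])))
    return placements


def shift_box(box, pad):
    """Shift a pixel-space (x1, y1, x2, y2) box by (padw, padh) onto the canvas."""
    x1, y1, x2, y2 = box
    padw, padh = pad
    return (x1 + padw, y1 + padh, x2 + padw, y2 + padh)
```

Because the center moves on every call, the same image ends up cropped and positioned differently each time it is loaded, which is one reason images are not presented twice in the same way.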
The hyperparameters used to define these augmentations are in your hyperparameter file (default data/hyp.scratch.yaml) defined when training:
python train.py --hyp hyp.scratch.yaml
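As an example of what a single entry in that file controls, here is a minimal sketch of a horizontal flip applied with probability `fliplr` to YOLO-format labels, i.e. `(class, x_center, y_center, width, height)` normalized to 0-1. It only transforms the labels (flipping the pixels is left out), and the function names are hypothetical, not YOLOv5's own:

```python
import random


def flip_labels_lr(labels):
    """Mirror normalized YOLO labels (cls, x, y, w, h) horizontally: x -> 1 - x."""
    return [(cls, 1.0 - x, y, w, h) for cls, x, y, w, h in labels]


def maybe_flip_lr(labels, fliplr=0.5, rng=random):
    """Flip labels with probability fliplr; returns (labels, flipped)."""
    if rng.random() < fliplr:
        return flip_labels_lr(labels), True
    return list(labels), False
```

Under this sketch, setting `fliplr` to 0.0 disables the flip entirely and 1.0 flips every sample; width, height and y are unchanged by a horizontal mirror.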
You can view the effect of your augmentation policy in your train_batch*.jpg images once training starts. These images will be in your train logging directory, typically yolov5/runs/train/exp:
train_batch0.jpg shows train batch 0 mosaics and labels:
Good luck and let us know if you have any other questions!
@glenn-jocher Thanks for the detailed information. So, are augmentations applied automatically, or do we need to explicitly specify this hyperparameter file when training?
@karndeepsingh see train.py argparser for hyp.yaml argument: https://github.com/ultralytics/yolov5/blob/7d3686a686478c78beb2b32cf8a35c1a5dbe81b8/train.py#L452-L489
Hello, I have trained a model and set a confidence threshold such as 0.6, and I am able to show predictions on images with bounding boxes whose confidence is above that threshold. But I also want to save the images that the model predicted with low confidence, i.e. below the threshold. Any suggestions on how I can achieve this?
The reason I am asking is that I want to use Active Learning to annotate my large dataset. Any help on active learning with YOLOv5 would be good!
Thank you
Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Access additional YOLOv5 resources:
Access additional Ultralytics resources:
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 and Vision AI!
I am getting the same issue when training for more than 5 epochs: the best.pt file is not generated. Could you tell me what I should change so that best.pt is saved when training for 40 or more epochs?
@priyabratknoldus best.pt is saved every best epoch automatically.
But where can we change this, and in which file?
But when I train for 5 epochs it is saved, while for more than 10 epochs best.pt is not saved.
And one more question: if best.pt is not saved, I believe we would not be able to run predictions on a new image?
@priyabratknoldus best.pt is saved on every new best epoch. If you use --nosave or --noval then best.pt will not be saved naturally.
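A simplified model of that saving rule, assuming `last.pt` is written every epoch and `best.pt` whenever the epoch's fitness equals the best seen so far, with `--nosave` restricting saving to the final epoch. This is an illustration with a hypothetical function name, not the code in train.py, and it ignores details such as how `--noval` affects the fitness values:

```python
def simulate_checkpoints(fitness_per_epoch, nosave=False):
    """Return [(epoch, [files written])] under a simplified YOLOv5-style rule."""
    best = float("-inf")
    history = []
    last_epoch = len(fitness_per_epoch) - 1
    for epoch, fi in enumerate(fitness_per_epoch):
        best = max(best, fi)  # best fitness seen so far, including this epoch
        written = []
        # With nosave, only the final epoch writes checkpoints at all.
        if not nosave or epoch == last_epoch:
            written.append("last.pt")
            if fi == best:  # this epoch is (tied for) the best so far
                written.append("best.pt")
        history.append((epoch, written))
    return history
```

For example, `simulate_checkpoints([0.1, 0.3, 0.2], nosave=True)` writes nothing until the last epoch and then only `last.pt`, because the final epoch is not the best one; that is consistent with seeing only last.pt when --nosave is set.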
@glenn-jocher how do I remove the --nosave flag?
@joynjo I don't understand your question. --nosave is a flag you can choose to use with training. It's off by default.
Hello! I have a similar problem. best.pt is not saved to the folder where it is supposed to be; there is only last.pt. I resumed training using the weights saved in the .../feature_extraction14 folder, and the best results occurred after resuming (training was resumed at epoch 95 and the best results occurred at epoch 134); these were saved to the .../feature_extraction15 folder. The nosave flag is set to False. These are the parameters I used: train: weights=/content/drive/My Drive/microplasticos/microplasticos_576/feature_extraction14/weights/last.pt, cfg=, data=/content/drive/My Drive/microplasticos/microplasticos_576.yaml, hyp=../../../../drive/My Drive/microplasticos/yolov5-master/data/hyps/hyp.scratch-low.yaml, epochs=250, batch_size=14, imgsz=576, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=/content/drive/My Drive/microplasticos/microplasticos_576, name=feature_extraction, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=40, freeze=[12], save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
@emailic so you're saying that saving proceeds normally before resuming, after resuming best.pt is no longer saved? How do you know this? Your /weights directory should have both files before and after resuming, so where did your best.pt go then after resuming?
Hi @glenn-jocher! Thanks for your reply. Before resuming, last.pt was saved to .../feature_extraction14, which is also where best.pt is. However, if I'm not mistaken, that best.pt belongs to the .../feature_extraction14 run, which never reached the end (it stopped at epoch 95). When I resumed (and finished) the training, the new last.pt was saved to the .../feature_extraction15 folder, which is where I also expected best.pt to be, but it's not there.
@emailic --resume resumes to the same exact directory, it does not create a new directory.
Hi @glenn-jocher, thanks for getting back to me. I actually had some problems resuming the training with --resume, so I resumed it by passing the last weights obtained (feature_extraction14) to the --weights flag. The training did resume (it started iterating from the 95th epoch), and at the end of training the output says the results were saved to feature_extraction15. However, in that folder I can only find last.pt; I can't find best.pt.
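The new feature_extraction15 directory fits how YOLOv5 names runs: a fresh training run (the parameters above show resume=False, name=feature_extraction, exist_ok=False) picks the next free `name`, `name2`, `name3`, ... directory under `project`, whereas --resume writes back into the original run directory. A simplified sketch of that naming, using a hypothetical `next_run_dir` helper (YOLOv5 has its own `increment_path` utility for this; this is not its code):

```python
from pathlib import Path


def next_run_dir(project, name, exist_ok=False):
    """Return project/name, or project/name2, name3, ... if that already exists."""
    path = Path(project) / name
    if exist_ok or not path.exists():
        return path
    n = 2
    # Walk upward until we find a suffix that is not taken yet.
    while (Path(project) / f"{name}{n}").exists():
        n += 1
    return Path(project) / f"{name}{n}"
```

Under this sketch, with feature_extraction through feature_extraction14 already present, a new run lands in feature_extraction15, so its checkpoints are separate from those of the interrupted run.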
@emailic thanks for the info. Is this reproducible? i.e. if you CTRL+C in the middle of training and then --resume from the specific last.pt do you again see a new directory created?
python train.py --epochs 10 # CTRL+C
python train.py --resume runs/train/exp/weights/last.pt
Hi @glenn-jocher , sorry for the delay. If this is not urgent, I will get back to you in a while, really busy with a project now. Take care
@emailic no worries! Whenever you have the time, feel free to get back to me. Good luck with your project!
Question
I have been training YOLOv5 on my custom dataset, but it does not save the best.pt checkpoint. I have trained it about 3 times, thinking it was an issue with the notebook. Please help me save the best-trained weights; only the last.pt file gets saved after every training run.
Could you also explain what the best.pt file is? Is it the best-trained weights file, or something else?
Thank you, Karndeep Singh
Hello, could you please explain how you solved this? I have the same problem now.
@khinnnnn hello! The best.pt file is intended to hold the model weights that achieved the best performance on the validation set during training, according to the metrics being monitored (e.g., mAP). If you're only seeing the last.pt file, it could be due to a few reasons:
Validation Set: Ensure you have a validation set defined in your dataset. best.pt is determined based on performance on this set; if there's no validation set, the concept of "best" doesn't apply.
Training Configuration: Check your training command for the --nosave or --noval flags; as noted earlier in this thread, either one prevents best.pt from being saved on every new best epoch.
Patience Parameter: If you're using early stopping via the --patience argument, ensure it's not set too low, which might stop training before significant improvements are seen.
File System Issues: Ensure there's enough disk space and that you have write permissions in the directory where the training outputs are saved.
Manual Resumption: If you resumed training manually by passing the last checkpoint to --weights, check that training picks up correctly, and note that this starts a new run directory rather than continuing the original one.
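To make "best" concrete: YOLOv5 ranks epochs by a single fitness score computed from the validation metrics. Recent versions appear to define `fitness()` in utils/metrics.py as a weighted sum of [P, R, mAP@0.5, mAP@0.5:0.95] with weights [0.0, 0.0, 0.1, 0.9]; treat those weights as an assumption and check them against your copy. A pure-Python sketch with hypothetical function names:

```python
def fitness(precision, recall, map50, map50_95, weights=(0.0, 0.0, 0.1, 0.9)):
    """Weighted sum of validation metrics; higher is better."""
    metrics = (precision, recall, map50, map50_95)
    return sum(w * m for w, m in zip(weights, metrics))


def best_epoch(metrics_per_epoch):
    """Return the index of the epoch with the highest fitness (first one on ties)."""
    scores = [fitness(*m) for m in metrics_per_epoch]
    return max(range(len(scores)), key=scores.__getitem__)
```

With these weights, an epoch with high precision and recall but lower mAP@0.5:0.95 does not count as best, and without validation metrics there is nothing to rank, which is why the validation set matters for best.pt.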
If you're following the standard training procedure without modifications and still facing issues, it might be helpful to share more details about your training command, dataset configuration, and any modifications you've made to the training script or environment. This can provide more context for troubleshooting.
Remember, the key to resolving this is ensuring your validation set is correctly set up and monitored during training, and that your training environment is correctly configured for saving checkpoints.
To save images with low-confidence predictions, you can run detection with a lower confidence threshold (detect.py's --conf-thres, or model.conf when using PyTorch Hub) and keep the images whose detections fall below your usual threshold. For active learning, consider using these low-confidence predictions to identify uncertain samples for annotation. You might also explore integrating with tools like Roboflow for active learning workflows.
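A minimal sketch of the selection step for active learning, assuming you already have per-image detection confidences (for example from the `confidence` column of `results.pandas().xyxy[i]` when using PyTorch Hub, after lowering `model.conf` so weak detections are kept). The thresholds, the function name, and the rule "flag images with a detection in an uncertain band, or with no detections at all" are illustrative choices, not a YOLOv5 feature:

```python
def select_for_annotation(confidences_by_image, low=0.1, high=0.6, include_empty=True):
    """Pick images worth sending to a human annotator.

    confidences_by_image: dict mapping image name -> list of detection confidences.
    An image is selected if any detection has low <= conf < high (uncertain),
    or, optionally, if it has no detections at or above `low` at all.
    Returns names sorted by lowest maximum confidence first.
    """
    selected = []
    for name, confs in confidences_by_image.items():
        kept = [c for c in confs if c >= low]  # ignore near-noise detections
        uncertain = any(c < high for c in kept)
        if uncertain or (include_empty and not kept):
            selected.append((max(kept, default=0.0), name))
    # Images whose strongest detection is weakest come first.
    return [name for _, name in sorted(selected)]
```

The selected images could then be copied to a folder for labeling, annotated, and added to the training set before the next training round.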