Closed KristofferK closed 3 years ago
Hello @KristofferK, thank you for your interest in YOLOv5! Please visit our Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom training Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.
Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
Public W&B Logging link: https://wandb.ai/kknuds19/train/runs/o18wqty1/overview?workspace=user-kknuds19
@KristofferK it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.
Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
@glenn-jocher It is a freshly set up Anaconda environment, with the latest repo and requirements.txt. PyTorch is 1.8.2 (LTS).
@KristofferK unfortunately we don't have resources to help debug individual environments. If I were you I would create a venv and pip install everything, we don't use conda in our verified environments.
@KristofferK also for us to begin investigating an issue we need a minimum reproducible example. If we can't reproduce your issue there's no action for us to take. We've created a few short guidelines below to help users provide what we need in order to get started investigating a possible problem.
When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:
In addition to the above requirements, for Ultralytics to provide assistance your code should be:
git pull or git clone a new copy to ensure your problem has not already been resolved by previous commits.
If you believe your problem meets all of the above criteria, please close this issue and raise a new one using the Bug Report template and providing a minimum reproducible example to help us better understand and diagnose your problem.
Thank you!
@glenn-jocher I'm facing the same issue when running on a Windows machine with a newly set up environment and all the dependencies installed correctly.
It seems like plot_labels for some reason kills the entire process. Commenting out the line of code below from train.py led to normal training, without any errors.
https://github.com/ultralytics/yolov5/blob/5d4258fac5e6ceaa9c897f841cb737c56717a996/train.py#L235
To confirm, I also executed plot_labels in isolation using a manually created data loader, and it ended up killing the process as well. Moreover, to make sure that it's not a memory issue, I was using just 5 images for testing.
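For anyone who wants to repeat this kind of isolation test, here is a minimal sketch that runs a snippet in a child interpreter, so a hard kill shows up as a non-zero return code instead of taking the parent process down. The helper name run_isolated and the MATPLOTLIB_PROBE snippet are made up for illustration and are not part of YOLOv5.

```python
import subprocess
import sys

# Hypothetical probe: any small matplotlib call would do here.
MATPLOTLIB_PROBE = (
    "import matplotlib; matplotlib.use('Agg'); "
    "import matplotlib.pyplot as plt; "
    "plt.plot([0, 1]); plt.savefig('probe.png')"
)


def run_isolated(snippet, timeout=60):
    """Run a Python snippet in a child interpreter and report how it ended."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child was still running after `timeout` seconds and was killed.
        return {"status": "hang", "returncode": None, "stderr": ""}
    status = "ok" if done.returncode == 0 else "crash"
    return {"status": status, "returncode": done.returncode, "stderr": done.stderr}
```

Calling run_isolated(MATPLOTLIB_PROBE) in the affected environment should then report "crash" or "hang" rather than silently ending the session, and the return code and stderr give a bit more to go on.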
EDIT: It seems like there was a bug recently introduced in a package called freetype. Found some mentions here:
It's only affecting Windows machines.
@KristofferK Downgrading freetype to 2.10.4 fixed the issue.
@MrinalJain17 Thank you so much. That did indeed fix the issue. I hope yolov5 will either wrap the plot_labels in a Try/Except or force the version of the freetype package. I downgraded from 2.11.0 to 2.10.4, and it works again.
@MrinalJain17 thanks for looking into this! It seems like there is no action for us to take then based upon your conclusions?
We can try: except label plotting also, but I'm not sure it's best practices for downstream matplotlib users to all adjust their code for error handling here.
On MacOS I don't see any freetype package here either. This is what my environment looks like based upon pip install -r requirements.txt
(venv) (base) glennjocher@Glenns-iMac yolov5 % pip list
Package Version
----------------------- ---------------------
absl-py 0.15.0
appnope 0.1.2
backcall 0.2.0
cachetools 4.2.4
certifi 2021.10.8
charset-normalizer 2.0.7
cycler 0.10.0
decorator 5.1.0
google-auth 2.3.0
google-auth-oauthlib 0.4.6
grpcio 1.41.0
idna 3.3
ipython 7.28.0
jedi 0.18.0
kiwisolver 1.3.2
Markdown 3.3.4
matplotlib 3.4.3
matplotlib-inline 0.1.3
numpy 1.21.3
oauthlib 3.1.1
opencv-python 4.5.4.58
pandas 1.3.4
parso 0.8.2
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.4.0
pip 21.3.1
prompt-toolkit 3.0.21
protobuf 3.19.0
ptyprocess 0.7.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
Pygments 2.10.0
pyparsing 2.4.7
python-dateutil 2.8.2
pytz 2021.3
PyYAML 6.0
requests 2.26.0
requests-oauthlib 1.3.0
rsa 4.7.2
scipy 1.7.1
seaborn 0.11.2
setuptools 57.0.0
six 1.16.0
tensorboard 2.7.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
thop 0.0.31.post2005241907
torch 1.10.0
torchvision 0.11.1
tqdm 4.62.3
traitlets 5.1.0
typing-extensions 3.10.0.2
urllib3 1.26.7
wcwidth 0.2.5
Werkzeug 2.0.2
wheel 0.36.2
@glenn-jocher That makes sense. It's windows-specific, and hopefully a temporary issue.
However, I believe it would be helpful to have some sort of a "known issues" tracker for the YOLOv5 repository, which would describe any such errors along with some troubleshooting options. Even in the future, if some other third-party library breaks any part of the code, users can find that info (and relevant solutions) in the said tracker.
@MrinalJain17 yes a known issue tracker is certainly a good idea. We have a TODO list with about 20 items which somewhat handles this currently. We track these through issue tags: https://github.com/ultralytics/yolov5/issues?q=is%3Aissue+label%3ATODO+
@MrinalJain17 seems like another Windows user had the same problem in #5611. I just realized another option besides try except is to use utils.general.timeout. Maybe something like this:
@Timeout(30)
def plot_labels(labels, names=(), save_dir=Path('')):
    # plot dataset labels
    ...
@MrinalJain17 wait I just noticed a difference. In #5611 the process just hangs at plot_labels(), but you said in your case the process actually terminated by itself?
@MrinalJain17 @KristofferK good news! Your original issue may now be fixed in PR #5616. This PR does not fix any underlying issues with matplotlib/freetype, but it does enclose plot_labels() in try: except and Timeout decorators to bypass it in case of issues. This means no label plots will be produced if errors/hangs are encountered, but training will proceed normally without issue.
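As a rough illustration of the try/except part of this idea, a decorator along the following lines would let a failing plot step be skipped. This is a sketch under assumptions, not the code from the PR: the names try_except and plot_labels_stub are hypothetical, and the stub only simulates a failure.

```python
import functools


def try_except(message="", verbose=True):
    """Decorator factory: swallow Python exceptions from the wrapped function."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                if verbose:
                    prefix = f"{message}: " if message else ""
                    print(f"{prefix}{exc}")
                return None  # caller continues as if the step was skipped

        return wrapper

    return decorator


@try_except("WARNING: label plotting skipped")
def plot_labels_stub(labels):
    # Stand-in for a real plotting function; always fails for demonstration.
    raise RuntimeError("simulated plotting backend failure")
```

Note that a wrapper like this only helps when the failure surfaces as a Python exception. If a native library kills the interpreter outright, no except clause ever runs.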
To receive this update:
Git: git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Docker: sudo docker pull ultralytics/yolov5:latest to update your image
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5!
@glenn-jocher So, if you notice this section below of the output from #5611, this is actually what the issue was. Basically, anything remotely close to a matplotlib command ended up killing the entire process.
The Timeout() approach should be quite helpful in the future, if something unexpectedly breaks (but hopefully not).
Moreover, the issue was super-specific: it was for Windows machines using Anaconda with the default channel. The good news is that they've yanked freetype officially: https://github.com/AnacondaRecipes/repodata-hotfixes/pull/150
Hi @KristofferK! I see you are working on malaria detection. What kind of images are you working with (thick or thin smear)? I have the same project and would appreciate your help if possible. Thanks!
Hello WJos. The malaria dataset is not actually what I am working on, rather it was to test out yolov5 before using it on my own dataset of drosophila. For malaria I used https://www.kaggle.com/kmader/malaria-bounding-boxes/ and converted it to yolov5 format. I might still have the code for the converter if you're interested.
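For reference, the core of such a converter is small. The sketch below assumes each box is available as pixel corner coordinates together with the image size; it makes no assumption about the Kaggle dataset's actual field names, and the function name to_yolo_line is hypothetical. YOLOv5 label files hold one line per object: class index, then box center x, center y, width and height, all normalized to the range 0 to 1.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-corner box to a 'class x_center y_center width height' line."""
    x_center = (x_min + x_max) / 2 / img_w  # normalized box center, x
    y_center = (y_min + y_max) / 2 / img_h  # normalized box center, y
    width = (x_max - x_min) / img_w         # normalized box width
    height = (y_max - y_min) / img_h        # normalized box height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x100 px box centered in a 400x200 px image:
print(to_yolo_line(0, 100, 50, 300, 150, 400, 200))
# prints "0 0.500000 0.500000 0.500000 0.500000"
```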
Regarding the @Timeout(30) decorator on plot_labels suggested above: it doesn't work well on Windows, because the 'Timeout' class uses 'signal.SIGALRM'. It throws an error like "module 'signal' has no attribute 'SIGALRM'", but it works well on Linux. How about removing Timeout(30) but still keeping try_except?
@yeshanliu it appears you may have environment problems. The above code does work well on windows, windows is part of our daily CI testing:
https://github.com/ultralytics/yolov5/runs/5562838761?check_suite_focus=true
Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.9 environment, clone the latest repo (code changes daily), and pip install requirements.txt again from scratch.
ProTip! Try one of our verified environments below if you are having trouble with your local environment.
Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Models and datasets download automatically from the latest YOLOv5 release when first requested.
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
@yeshanliu I investigated some more, and it looks like the Windows CI tests are passing because the Try Except decorator is outside the Timeout decorator and is catching the SIGALRM error. So the good news is it works on Windows if you are using current code; the bad news is it works by skipping plotting labels. I think the solution is to put if else statements into Timeout and just put a note that it doesn't work on Windows. I'll create a PR.
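To make the if/else idea concrete, here is a minimal sketch of a timeout that only arms an alarm where SIGALRM is usable and otherwise does nothing. It is an illustration under assumptions, not the code from the PR: the names PortableTimeout and alarm_available are hypothetical, and the main-thread check is an extra guard added here because Python only allows signal handlers to be installed from the main thread.

```python
import contextlib
import signal
import threading


def alarm_available():
    """True if SIGALRM exists (not on Windows) and we are in the main thread."""
    return hasattr(signal, "SIGALRM") and threading.current_thread() is threading.main_thread()


class PortableTimeout(contextlib.ContextDecorator):
    """Raise TimeoutError after `seconds` where SIGALRM works; otherwise a no-op."""

    def __init__(self, seconds, message="operation timed out"):
        self.seconds = int(seconds)  # signal.alarm takes whole seconds (use >= 1)
        self.message = message
        self._active = False
        self._previous = None

    def _handler(self, signum, frame):
        raise TimeoutError(self.message)

    def __enter__(self):
        self._active = alarm_available()
        if self._active:
            self._previous = signal.signal(signal.SIGALRM, self._handler)
            signal.alarm(self.seconds)  # start the countdown
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._active:
            signal.alarm(0)  # cancel any pending alarm
            signal.signal(signal.SIGALRM, self._previous)  # restore old handler
        return False  # never swallow exceptions here
```

It can be used either as `with PortableTimeout(30): ...` or as `@PortableTimeout(30)` on a function. On Windows the wrapped code simply runs without a time limit, so a hang there would still need another safeguard.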
That will be so good! And thanks for applying.
@KristofferK @MrinalJain17 @WJos @yeshanliu good news! Your original issue may now be fixed in PR #7013. This PR disables Timeout using SIGALRM on Windows. To receive this update:
Git: git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
Docker: sudo docker pull ultralytics/yolov5:latest to update your image
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5!
I removed --cache and the problem was solved.
I ran into the same problem. It seems that TryExcept does not work. I am sure that all third-party packages were installed correctly, but it still terminates after "Plotting labels". My OS is Ubuntu 18.04.
This issue is about "Plotting labels" terminating on the Windows platform. Releases 7.0 and 6.2 work well on my Ubuntu platform, so I suggest you check your release version and environment.
hi @yeshanliu I'd recommend checking if your release versions are updated and if your environment meets all the necessary requirements. Make sure to use the latest release of the YOLOv5 repository and a complete installation of all required packages. You can refer to the installation instructions in the Ultralytics YOLOv5 documentation for a complete guide on setting up YOLOv5 on your Ubuntu 18.04 system. If the issue still persists, feel free to provide more details about your setup, and we can further investigate the problem.
While I am able to use YOLOv5 for inference, the train.py does not seem to work for me anymore. It did work previously however.
I have tried to clone the latest repo as well. I have set up a fresh Conda environment with Python 3.8. Again, inference works, but not training my custom data.
It creates the "exp" directory (exp24 in this case), which contains an empty "weights" directory, hyp.yaml, opt.yaml, and events.out.fs.events..0. No .pt, no images, no results.csv.
I have tried both the training set that I previously was able to train with and a new one I just created.
I run it using
python train.py --img 640 --batch 4 --epochs 200 --data C:/Users/kristofferk/Documents/GitHub/p9-api/experiment/kristoffer/step06-data.yaml --weights yolov5s.pt
But when it comes to "Plotting labels..." it will be stuck there for about 20 seconds and then terminate without any further warnings or errors.
The output of running train.py is:
Any suggestions on how to proceed from here? Either to fix it or at least get a more detailed error message.
Thanks in advance.
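On getting a more detailed error message when a process dies silently: one generic option is Python's standard faulthandler module, which can dump the Python traceback when the interpreter receives a fatal signal. The sketch below is an assumption-laden illustration, not something the thread confirms helped in this case, and the helper name enable_crash_tracebacks is hypothetical.

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_path=None):
    """Dump Python tracebacks on fatal errors such as a segmentation fault.

    Returns the open stream; keep a reference to it, because faulthandler
    writes to its file descriptor at crash time.
    """
    stream = open(log_path, "w") if log_path else sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return stream
```

Calling enable_crash_tracebacks("crash.log") near the top of a script, or starting the interpreter with python -X faulthandler train.py followed by the usual arguments, may show which Python line was executing when the process went down. It will not help if the process is terminated in a way that bypasses these handlers.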