ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

train.py in YOLOv5 no information is displayed, program executes with no error messages, but weights are not saved #12725

Closed artduurrr closed 7 months ago

artduurrr commented 8 months ago

YOLOv5 Component

Training

Bug

I am running the command:

!python train.py --img 256 --epochs 1 --batch-size 16 --data dataset.yml --weights yolov5n.pt

The command runs to completion, but it prints nothing while it runs, and afterwards no weights are saved under runs/train/exp. No error message is displayed either. Could something be wrong with the way I've organized my data?

[Image: Screenshot 2024-02-09 144836]

Environment

- YOLO: YOLOv5
- Python: 3.11.5
- OS: Windows

Minimal Reproducible Example

!python train.py --img 256 --epochs 1 --batch-size 16 --data dataset.yml --weights yolov5n.pt

Additional

No response

github-actions[bot] commented 8 months ago

👋 Hello @artduurrr, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 8 months ago

@artduurrr hello! Thanks for reaching out. If you're not seeing any output or saved weights after running train.py, it could be due to a few reasons:

  1. Data Configuration: Double-check your dataset.yml to ensure paths are correctly set and the dataset is properly formatted.
  2. Output Directory: By default, weights should be saved in runs/train/exp. If the directory doesn't exist or there's a permissions issue, it might fail silently.
  3. Verbose Output: Try adding the --verbose flag to your command to get more detailed output, which might help identify the issue.
  4. Environment: Ensure that your Python environment has all the necessary dependencies installed and up to date.

If you've checked these and the issue persists, please provide the verbose output or any additional information that could help diagnose the problem. Also, consider checking the Ultralytics Docs for more detailed guidance on troubleshooting training issues.
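For the first point, a rough sketch of a path check is below. It assumes the dataset YAML uses the `path`, `train`, and `val` (optionally `test`) keys described in the custom data tutorial and that it has already been parsed into a dict. The function name `missing_dataset_paths` and the `base_dir` parameter are made up for illustration and are not part of YOLOv5; how YOLOv5 itself resolves a relative `path` may differ, so treat `base_dir` as an assumption to adjust.

```python
from pathlib import Path


def missing_dataset_paths(cfg, base_dir="."):
    """Return (key, path) pairs for train/val/test entries that do not exist.

    cfg is the already-parsed dataset YAML as a dict. Relative entries are
    resolved against cfg["path"], which is itself resolved against base_dir
    (an assumption made for this sketch).
    """
    root = Path(cfg.get("path") or "")
    if not root.is_absolute():
        root = Path(base_dir) / root

    missing = []
    for key in ("train", "val", "test"):
        entry = cfg.get(key)
        if not entry:
            continue  # optional keys such as "test" may be absent
        entries = entry if isinstance(entry, (list, tuple)) else [entry]
        for item in entries:
            candidate = Path(item)
            if not candidate.is_absolute():
                candidate = root / candidate
            if not candidate.exists():
                missing.append((key, str(candidate)))
    return missing
```

With PyYAML installed, the dict could come from `yaml.safe_load(open("dataset.yml"))`. An empty result only means the directories exist; it says nothing about whether the labels inside them are well formed.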

Keep up the great work, and thank you for being part of the YOLOv5 community! 😊🚀

artduurrr commented 8 months ago

I followed the training tutorial: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data/.

  1. I double-checked my directories and confirmed that my Python and PyTorch installations are up to date.

  2. Here is my yml file, which I organized to match the tutorial mentioned above. [image: dataset.yml]

  3. I'm working from the root directory, and when I run the command listed above I get:

python: can't open file 'C:\Users\duyng\OneDrive\Documents\Python_Scripts\yolov4_environment\ultralytics\train.py': [Errno 2] No such file or directory

  4. To fix that I added the relative paths to my command:

!python yolov5/train.py --img 256 --epochs 3 --batch-size 16 --data yolov5/data/dataset.yml --weights yolov5n.pt

  5. I would like to add that when I run the train.py command, the runs/train/exp folder is created, but nothing actually appears in the weights folder.

  6. Thank you for your reply. I added the --verbose flag to my command. This is the result:

!python yolov5/train.py --img 256 --epochs 3 --batch-size 16 --data yolov5/data/dataset.yml --weights yolov5n.pt --verbose

WARNING:tensorflow:From c:\Users\duyng\anaconda3\Lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

usage: train.py [-h] [--weights WEIGHTS] [--cfg CFG] [--data DATA] [--hyp HYP] [--epochs EPOCHS] [--batch-size BATCH_SIZE] [--imgsz IMGSZ] [--rect] [--resume [RESUME]] [--nosave] [--noval] [--noautoanchor] [--noplots] [--evolve [EVOLVE]] [--evolve_population EVOLVE_POPULATION] [--resume_evolve RESUME_EVOLVE] [--bucket BUCKET] [--cache [CACHE]] [--image-weights] [--device DEVICE] [--multi-scale] [--single-cls] [--optimizer {SGD,Adam,AdamW}] [--sync-bn] [--workers WORKERS] [--project PROJECT] [--name NAME] [--exist-ok] [--quad] [--cos-lr] [--label-smoothing LABEL_SMOOTHING] [--patience PATIENCE] [--freeze FREEZE [FREEZE ...]] [--save-period SAVE_PERIOD] [--seed SEED] [--local_rank LOCAL_RANK] [--entity ENTITY] [--upload_dataset [UPLOAD_DATASET]] [--bbox_interval BBOX_INTERVAL] [--artifact_alias ARTIFACT_ALIAS] [--ndjson-console] [--ndjson-file]
train.py: error: unrecognized arguments: --verbose

  7. I also went into the losses.py that the first message mentioned. There is no instance of tf.losses.sparse_softmax_cross_entropy in it; tf.compat.v1.losses.sparse_softmax_cross_entropy is already being used.

glenn-jocher commented 8 months ago

Hello @artduurrr, thanks for the detailed follow-up. Let's address the issues step by step:

  1. TensorFlow Warning: The warning you're seeing is from TensorFlow, not PyTorch or YOLOv5. It's possible you have TensorFlow installed and it's outputting a deprecation warning, but this should not affect YOLOv5 training.

  2. Unrecognized --verbose Argument: The --verbose flag is not a recognized argument in YOLOv5's train.py. My apologies for the confusion. You can remove this flag as it won't provide any additional output.

  3. File Not Found Error: The error message you received initially (python: can't open file...) indicates that the train.py script was not found at the specified path. It seems you've resolved this by adjusting the path.

  4. Empty runs/train/exp Directory: If the runs/train/exp directory is created but remains empty, it's possible that the training process is starting but not completing properly. This could be due to a variety of reasons, including incorrect dataset formatting, issues with the dataset paths in your YAML file, or other configuration problems.

  5. Next Steps: Since you're not getting any output from the training process, I recommend the following:

    • Ensure that your dataset is correctly labeled and organized according to the YOLOv5 documentation.
    • Verify that the paths in your dataset.yml are correct and accessible from the directory where you're running the train.py script.
    • Run the training command from the terminal (not a Jupyter notebook or other environment) to see if there's any difference in behavior.
    • If you're using a custom dataset, try running a quick training with the default YOLOv5 dataset to ensure the training pipeline works as expected.

If the problem persists, please provide the exact content of your dataset.yml (without using an image) and any output you receive when running the training command from the terminal. This will help us further diagnose the issue.
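If switching to a terminal is inconvenient, one possible workaround is to launch the script as a subprocess and print whatever it writes to stdout and stderr, together with the exit code, so a failure is harder to miss. This is a generic sketch using only the standard library; `run_and_report` is a hypothetical helper name, not something YOLOv5 provides.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd, echo its stdout and stderr, and return the CompletedProcess."""
    # capture_output collects both streams; text=True decodes them to str.
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.stdout:
        print(result.stdout, end="")
    if result.stderr:
        print(result.stderr, end="", file=sys.stderr)
    if result.returncode != 0:
        # A non-zero exit code means the script stopped with an error.
        print(f"command exited with code {result.returncode}", file=sys.stderr)
    return result
```

For the command in this thread, the call would look like `run_and_report([sys.executable, "train.py", "--img", "256", "--epochs", "1", "--batch-size", "16", "--data", "dataset.yml", "--weights", "yolov5n.pt"])`. Note that the output only appears once the process has finished, so this is better suited to spotting early failures than to watching a long training run.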

Thank you for your patience and for contributing to the YOLOv5 community! 🙌

artduurrr commented 8 months ago

Hi, just wanted to say thank you for all your responses. As you suggested, I tried running the command from the command line. At first it gave me errors saying that certain libraries weren't installed. After navigating to the yolov5 directory, I ran pip install -r requirements.txt and then reran the training command. This solved the issue, and everything is now outputting normally!
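For anyone who lands here with the same symptom, a quick way to see which packages from requirements.txt are not installed is sketched below. It relies only on `importlib.metadata` from the standard library (Python 3.8+). The helper name `missing_requirements` is hypothetical, and the parsing is deliberately simple: it reads the package name at the start of each line, ignores comments and option lines, and does not check version constraints.

```python
import re
from importlib import metadata

# Leading package name of a requirement line, e.g. "torch" in "torch>=1.8.0".
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def missing_requirements(lines):
    """Return package names from requirement lines that are not installed."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as "-r other.txt"
        match = _NAME.match(line)
        if not match:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            names.append(name)
    return names
```

Calling `missing_requirements(open("requirements.txt").read().splitlines())` from the yolov5 directory should list anything that still needs `pip install -r requirements.txt`. It must be run with the same Python interpreter that runs train.py, otherwise it inspects the wrong environment.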

glenn-jocher commented 8 months ago

That's fantastic news, @artduurrr! I'm glad to hear that installing the required dependencies resolved the issue and that your training is now proceeding normally. Ensuring all requirements are met is often a crucial step in troubleshooting these kinds of problems.

If you have any more questions or run into further issues down the line, don't hesitate to reach out. Best of luck with your YOLOv5 projects, and happy training! 😊👍

Remember, the YOLOv5 community and the Ultralytics team are always here to help. Keep up the great work!

github-actions[bot] commented 7 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐