Closed malekaburaddaha closed 1 year ago
👋 Hello @malekaburaddaha, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
@malekaburaddaha hi there! The error message indicates that a directory called 'D:\Malek' is being looked up and does not exist. This suggests that something is wrong with the path where your files are stored.
In order to help you better, could you provide more details about your setup such as the operating system, file paths, and any relevant terminal input/output?
Also, have you tried running the command from a different directory? You might want to give that a try and see if it resolves the issue.
Looking forward to hearing from you soon!
I renamed every folder along the path so that each name is a single word. You can see the old path here:
and the new path is:
My YAML file is shown in the following screenshot:
My operating system is Windows 11, and I am using Python 3.7.15 with Spyder.
After renaming the folders along the path and rerunning the training command, I no longer see the same error, but now I get neither results nor errors, as far as I can tell. This is the output when I run the command:
wandb: WARNING wandb is deprecated and will be removed in a future release. See supported integrations at https://github.com/ultralytics/yolov5#integrations.
train: weights=yolov5s.pt, cfg=, data=pothole.yaml, hyp=data\hyps\hyp.scratch-low.yaml, epochs=100, batch_size=12, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs\train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5
YOLOv5 v7.0-169-geef637c Python-3.7.15 torch-1.13.1+cpu CPU
Can you see where the issue is now? I really appreciate your help.
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs\train', view at http://localhost:6006/
@malekaburaddaha hi there,
I'm glad to hear that you were able to solve the previous error by changing the folder names along the path. It's always best to avoid using spaces in file paths, so this was a great change.
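To illustrate why a space in a path can produce a truncated name like 'D:\Malek', here is a minimal sketch. It assumes the path was passed unquoted to a command line that splits on whitespace. The folder names are hypothetical, since the original path is not shown in this thread, and the helper names are made up for the example.

```python
from pathlib import PureWindowsPath


def parts_with_spaces(path: str) -> list:
    """Return the components of a Windows path that contain a space."""
    return [part for part in PureWindowsPath(path).parts if " " in part]


def first_token(path: str) -> str:
    """Mimic naive whitespace splitting of an unquoted path argument."""
    return path.split(" ")[0]


# Hypothetical paths, for illustration only.
old_path = r"D:\Malek Projects\yolov5"
new_path = r"D:\MalekProjects\yolov5"

print(parts_with_spaces(old_path))  # components that need renaming or quoting
print(first_token(old_path))        # what a whitespace-splitting tool would see
print(parts_with_spaces(new_path))  # empty once the folders are renamed
```

Renaming the folders, as was done here, removes the problem at its source. Quoting the path is the usual alternative when renaming is not possible.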
Regarding your new output, it seems that your training command is running without any errors, but you are not seeing any results. One thing you can try is to increase the number of epochs or decrease the batch size to see if your model starts producing meaningful results.
Also, I noticed that WandB logging is active. As the warning in your output says, the WandB integration is deprecated and will be removed from YOLOv5 in a future release. You might want to track your training progress with TensorBoard instead, which YOLOv5 already supports.
Finally, if you are still having issues or not seeing meaningful results, please provide more information such as any training logs, sample images, or relevant console output. This will help us identify and solve any issues you may be encountering and get your model working as expected.
Best regards.
I reduced the batch size and increased the number of epochs several times. I also deleted the repo and cloned it again, and I still get the same outcome, which is no results at all.
I even removed the cudatoolkit and numba packages, which I had installed after an earlier training run with a low number of epochs. I no longer have the results of that run: I deleted them along with the repo because, given the low number of epochs, I did not need them.
How do I switch from wandb to TensorBoard? I could not find where it is mentioned in the repo's code files. I am now trying to uninstall wandb and run the code to see what I get.
I am using dashcam images of potholes, such as these:
@malekaburaddaha hello,
I'm sorry to hear that you're still encountering issues with your YOLOv5 implementation. Since you've tried adjusting the batch size and increasing the number of epochs without any success, it's possible that the issue may lie elsewhere.
Regarding your question about switching from WandB to TensorBoard, there should be an option to do so in the training script. You can search for any references to WandB in your code and comment those out, then uncomment any references to TensorBoard. Additionally, you can refer to the YOLOv5 documentation for more information on using TensorBoard for tracking your training progress.
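Before editing any code, it may help to check which logging packages are importable in the active environment. The following is a minimal sketch using only the standard library. It assumes that YOLOv5 enables its WandB logger only when the `wandb` package can be imported, so uninstalling the package would leave TensorBoard as the tracker. The helper name `is_installed` is made up for this example.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(package) is not None


# Report which of the two tracking packages are present.
for name in ("wandb", "tensorboard"):
    status = "installed" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

The training log in this thread already prints a `TensorBoard: Start with 'tensorboard --logdir runs\train'` line, which suggests TensorBoard logging is on without further changes.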
Deleting and re-cloning the repository may not solve your issue as it could be related to other factors such as your data or configuration. It's worth checking the training logs for any warnings or errors that might give you insights into the problem.
Regarding the images you're using for pothole detection, they look good, but it's difficult to know for sure if they satisfy your specific use case or if there are any issues with the data without further analysis.
I hope this helps. Please let us know if you need further assistance.
Most likely something is messed up in the environment I am using, because I ran the code on Colab with the same data, batch size, and number of epochs, and it ran smoothly without errors. I think I will create a new environment and try training with that.
Moreover, I looked for wandb and tensorboard in train.py and the other files, and neither is mentioned there.
I have now created a new environment and installed all the libraries and packages. When I run the code, I get the behavior I used to get before the error that I asked about in this thread, which is the following:
The code runs without printing anything, and when it is done it prints only the last one or two epochs. It does not print the first few lines of the output that show all the parameters and features of the model. I will take a screenshot when it has finished training for 100 epochs and show you.
Thank you for being patient with me this far.
@malekaburaddaha hi there,
Thank you for the update! It's great to hear that you were able to solve the issue by creating a new environment and downloading all of the necessary libraries and packages.
Regarding the output you are seeing, it is possible that the code is only printing the last few epochs because the verbosity of the output has been set to a lower level. You can try increasing the verbosity level in the training script to get more detailed output. Alternatively, you can create your own script to print the output you want at specific intervals during training.
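As a rough illustration of what a "verbosity level" means here, the sketch below uses Python's standard `logging` module. It is generic and does not reproduce YOLOv5's own logging configuration. The logger name `demo_train` and the messages are invented for the example.

```python
import io
import logging


def make_logger(name: str, level: int, stream) -> logging.Logger:
    """Create a logger that writes bare messages to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def emitted_lines(level: int) -> list:
    """Log one INFO and one WARNING message and return what got through."""
    buffer = io.StringIO()
    log = make_logger("demo_train", level, buffer)
    log.info("Epoch 0/99 ...")
    log.warning("no labels found")
    return buffer.getvalue().splitlines()


print(emitted_lines(logging.INFO))     # both messages pass the INFO threshold
print(emitted_lines(logging.WARNING))  # the INFO message is filtered out
```

If per-epoch lines are logged at INFO level, a threshold of WARNING would hide them. Whether that is what happens in Spyder is an open question in this thread.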
Please let us know if you have any further questions or issues. We are always happy to help!
Best regards.
Yeah, based on the training code everything should be printed out, and I tested that on Colab as well. The issue with Colab is that it cannot stay connected for a long time, which is why I need to train on my machine.
One last thing you might be able to help me with, if possible: I am using the Spyder IDE. How can I train using my GPU instead of the CPU on my machine? I searched for steps on Google, but nothing I found was accurate.
@malekaburaddaha to train using your GPU in the Spyder IDE, you can achieve that by ensuring that your GPU is properly set up and then configuring your training script to use it.
First, ensure that you have the appropriate GPU drivers and CUDA toolkit installed.
Next, within your training script, you can set the device to use the GPU by specifying 'cuda' as the device. For example, you can add the following lines of code at the beginning of your script:
import torch
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(f"Training on {device}")
This will automatically select the GPU if it's available and use it for training. If you encounter any specific issues with setting up your GPU for training, feel free to refer to the official documentation for your GPU and the libraries you are using for more detailed instructions.
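One detail worth checking first: the training log earlier in this thread reports `torch-1.13.1+cpu`. The sketch below interprets such a version string. It assumes the common wheel naming in which the suffix after `+` is `cpu` for CPU-only builds and starts with `cu` for CUDA builds. The function name `build_flavor` is hypothetical. With a CPU-only build, `torch.cuda.is_available()` returns False regardless of the drivers or toolkits installed, so a CUDA-enabled PyTorch build would be needed.

```python
def build_flavor(torch_version: str) -> str:
    """Classify a PyTorch version string by its local build suffix."""
    _, _, local = torch_version.partition("+")
    if local == "cpu":
        return "cpu"
    if local.startswith("cu"):
        return "cuda"
    return "unknown"  # no suffix or an unrecognized one


print(build_flavor("1.13.1+cpu"))    # the build reported in this thread's log
print(build_flavor("1.13.1+cu117"))  # an example CUDA build string
print(build_flavor("1.13.1"))        # no suffix, so the flavor cannot be inferred
```

In a live session the string comes from `torch.__version__`. YOLOv5's train.py also exposes a `--device` option, which appears as the empty `device=` entry in the log above. With a CUDA build installed, that option is where a GPU index would be passed.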
I hope this helps! Please let me know if you have any other questions.
Hello,
I am trying to run the training, and I have already done so for a small number of epochs on my machine's CPU. Finally, I wanted to run the training for 100 epochs using the GPU on my machine. All I did at that point was install two libraries, "Cudatoolkit" and "Numba", through the Anaconda Navigator environment in order to use the GPU with Spyder. Then I tried to run the code without caching and got only the following:
wandb: WARNING wandb is deprecated and will be removed in a future release. See supported integrations at https://github.com/ultralytics/yolov5#integrations.
train: weights=yolov5s.pt, cfg=, data=pothole.yaml, hyp=data\hyps\hyp.scratch-low.yaml, epochs=100, batch_size=12, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs\train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5
fatal: cannot change to 'D:\Malek': No such file or directory
YOLOv5 2023-5-21 Python-3.7.15 torch-1.13.1+cpu CPU
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs\train', view at http://localhost:6006/
I put in bold the line that I think causes the error, but I cannot understand what it means. I deleted the repo and cloned it again into the same folder, but it gives the same result and does not start the training. Can you help, please?