ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

YOLOv5 training stops in Google Colab after a certain number of epochs due to an error: FileNotFoundError in DataLoader worker process 4 (FileNotFoundError: [Errno 2] No such file or directory) #12370

Closed: marco0913 closed this issue 8 months ago

marco0913 commented 11 months ago

Search before asking

YOLOv5 Component

No response

Bug

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
 81/299      14.1G    0.01582   0.006917   0.006883         31        640: 100% 158/158 [02:21<00:00,  1.11it/s]
             Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 18/18 [00:14<00:00,  1.27it/s]
               all        852        970      0.958      0.943      0.976      0.866

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
 82/299      14.1G    0.01579   0.006871   0.006856         38        640: 100% 158/158 [02:21<00:00,  1.11it/s]
             Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 18/18 [00:13<00:00,  1.29it/s]
               all        852        970      0.957      0.955      0.982      0.874

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
 83/299      14.1G    0.01547   0.006779    0.00638         31        640: 100% 158/158 [02:21<00:00,  1.11it/s]
             Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 18/18 [00:13<00:00,  1.30it/s]
               all        852        970      0.956      0.955      0.978       0.86

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
 84/299      14.1G    0.01537   0.006777   0.006099         25        640: 100% 158/158 [02:21<00:00,  1.11it/s]
             Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 18/18 [00:14<00:00,  1.27it/s]
               all        852        970      0.967      0.939      0.982      0.876

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
 85/299      14.1G    0.01547   0.006634   0.006649         34        640: 100% 158/158 [02:22<00:00,  1.11it/s]
             Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 18/18 [00:14<00:00,  1.28it/s]
               all        852        970       0.95      0.955      0.978      0.871

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
 86/299      14.1G    0.01538   0.006602   0.006286         22        640: 100% 158/158 [02:22<00:00,  1.11it/s]
             Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 18/18 [00:14<00:00,  1.28it/s]
               all        852        970      0.966       0.94      0.979      0.863

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
 87/299      14.1G    0.01532   0.006722   0.006537         59        640:  47% 74/158 [01:06<01:15,  1.11it/s]

Traceback (most recent call last):
  File "/content/yolov5/train.py", line 647, in <module>
    main(opt)
  File "/content/yolov5/train.py", line 536, in main
    train(opt.hyp, opt, device, callbacks)
  File "/content/yolov5/train.py", line 291, in train
    for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/content/yolov5/utils/dataloaders.py", line 172, in __iter__
    yield next(self.iterator)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 694, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 4.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/yolov5/utils/dataloaders.py", line 661, in __getitem__
    img, labels = self.load_mosaic(index)
  File "/content/yolov5/utils/dataloaders.py", line 760, in load_mosaic
    img, _, (h, w) = self.load_image(index)
  File "/content/yolov5/utils/dataloaders.py", line 735, in load_image
    im = cv2.imread(f)  # BGR
  File "/content/yolov5/utils/general.py", line 1100, in imread
    return cv2.imdecode(np.fromfile(filename, np.uint8), flags)
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/dataset_yolo5_m3_2ndTraining/images/train/Cam1_27.10.2022.14.52.41.png'

But the file is definitely there.

Environment

Google Colab

No response

Minimal Reproducible Example

!python /content/yolov5/train.py --data /content/drive/MyDrive/dataset_yolov5/config.yaml --epochs 300 --weights '' --cfg yolov5l.yaml --batch-size 24

Additional

No response

Are you willing to submit a PR?

github-actions[bot] commented 11 months ago

👋 Hello @marco0913, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics
glenn-jocher commented 11 months ago

Hi @marco0913, it looks like your training process is encountering a FileNotFoundError for the file /content/drive/MyDrive/dataset_yolo5_m3_2ndTraining/images/train/Cam1_27.10.2022.14.52.41.png.

Please ensure the file path is correct and that the file exists in the specified location. Also, consider checking for any discrepancies in the configuration file that might be causing the issue.
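One way to act on this advice is to check every file in the training split rather than only the one named in the error. The sketch below is a hypothetical helper (`find_unreadable_images` is not part of YOLOv5) that walks an image directory and tries to open and read each image file. The extension set is an assumed subset of the formats YOLOv5 accepts, and the function uses only the Python standard library.

```python
from pathlib import Path

# Assumed subset of the image formats YOLOv5 accepts; adjust to match your dataset.
IMG_EXTS = {".bmp", ".jpeg", ".jpg", ".png", ".tif", ".tiff", ".webp"}


def find_unreadable_images(image_dir):
    """Return (path, error name) pairs for image files under image_dir that cannot be read.

    Hypothetical helper for illustration; not part of the YOLOv5 codebase.
    """
    root = Path(image_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")
    problems = []
    for path in sorted(root.rglob("*")):
        if path.suffix.lower() not in IMG_EXTS:
            continue  # skip label files, caches and other non-image entries
        try:
            with open(path, "rb") as f:
                f.read(1)  # opening and reading one byte fails if the file is missing or inaccessible
        except OSError as exc:
            problems.append((str(path), type(exc).__name__))
    return problems
```

In a Colab cell you could call it on the split from the traceback, for example `find_unreadable_images('/content/drive/MyDrive/dataset_yolo5_m3_2ndTraining/images/train')`. An empty result only shows that every file was readable at the moment of the scan; it does not rule out a file becoming unavailable later in training.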

Let me know if you need further assistance!

marco0913 commented 11 months ago

Hi Glenn, thank you for your response. The path is definitely correct. I remember encountering this error with YOLOv8m, but I solved it by simply switching to YOLOv8s.

glenn-jocher commented 11 months ago

@marco0913 you're welcome! I'm glad to hear you found a workaround with YOLOv8s. YOLOv5 is built on continuous community contributions and improvements, so I appreciate your patience with the issues you encountered. If you have any more questions or need further assistance, feel free to ask!

marco0913 commented 11 months ago

I didn't really solve the issue, since the larger YOLO models give me better accuracy. I have now tried training with YOLOv5m instead of YOLOv5l and got the same error, but at epoch 120 and with a different image:

  Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
120/299      14.1G    0.01412   0.006402   0.005072         72        640:  76% 72/95 [01:11<00:22,  1.00it/s]

Traceback (most recent call last):
  File "/content/yolov5/train.py", line 647, in <module>
    main(opt)
  File "/content/yolov5/train.py", line 536, in main
    train(opt.hyp, opt, device, callbacks)
  File "/content/yolov5/train.py", line 291, in train
    for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/content/yolov5/utils/dataloaders.py", line 172, in __iter__
    yield next(self.iterator)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 694, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/yolov5/utils/dataloaders.py", line 661, in __getitem__
    img, labels = self.load_mosaic(index)
  File "/content/yolov5/utils/dataloaders.py", line 760, in load_mosaic
    img, _, (h, w) = self.load_image(index)
  File "/content/yolov5/utils/dataloaders.py", line 735, in load_image
    im = cv2.imread(f)  # BGR
  File "/content/yolov5/utils/general.py", line 1100, in imread
    return cv2.imdecode(np.fromfile(filename, np.uint8), flags)
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/dataset_yolo5_m3_2ndTraining/images/train/cam12_02.08.2023_15.24.47.jpg'

github-actions[bot] commented 10 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

glenn-jocher commented 10 months ago

@marco0913 it seems like the FileNotFoundError is persisting despite the change in the YOLOv5 model size. Have you already ensured that the file path and the existence of the specified file /content/drive/MyDrive/dataset_yolo5_m3_2ndTraining/images/train/cam12_02.08.2023_15.24.47.jpg are accurate for the new training run with YOLOv5m?

If the file exists and the path is correct, double-check the permissions and access rights for the file in the Google Colab environment. Additionally, ensure that the file is accessible and not in use by other processes. Let me know if this helps resolve the issue or if you have any other questions.
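Because the failing image changes between runs and the error only appears after many epochs, a single existence check may pass even though reads fail later. A minimal sketch, assuming the goal is to see whether access to one file is consistent over time: the hypothetical `probe_file` helper below (not part of YOLOv5) opens and reads a file several times with a pause between attempts and records the outcome of each attempt. The default attempt count and delay are arbitrary assumptions; a permission problem shows up as `PermissionError` from `open`.

```python
import time


def probe_file(path, attempts=5, delay=2.0):
    """Open and fully read `path` several times, pausing `delay` seconds between attempts.

    Returns one entry per attempt: None on success, otherwise the exception class name.
    Hypothetical helper for illustration; not part of the YOLOv5 codebase.
    """
    results = []
    for attempt in range(attempts):
        try:
            with open(path, "rb") as f:
                f.read()  # read the whole file, as the image loader does
            results.append(None)
        except OSError as exc:  # covers FileNotFoundError, PermissionError, I/O errors
            results.append(type(exc).__name__)
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before retrying; no sleep after the last attempt
    return results
```

The same error on every attempt points to a path or permission problem, while a mix of successes and failures suggests the file is only intermittently accessible from the Colab runtime.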

github-actions[bot] commented 9 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

github-actions[bot] commented 8 months ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐