ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Traceback in seg-Training #11787

Closed abdallah1989203 closed 1 year ago

abdallah1989203 commented 1 year ago

YOLOv5 Component

Training, Multi-GPU

Bug

Epoch GPU_mem box_loss seg_loss obj_loss cls_loss Instances Size
0%| | 0/257 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "segment/train.py", line 667, in <module>
    main(opt)
  File "segment/train.py", line 558, in main
    train(opt.hyp, opt, device, callbacks)
  File "segment/train.py", line 287, in train
    for i, (imgs, targets, paths, _, masks) in pbar: # batch ------------------------------------------------------
  File "/home/mm/.local/lib/python3.7/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/data/Yolo/yolov5/utils/dataloaders.py", line 172, in __iter__
    yield next(self.iterator)
  File "/home/mm/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/mm/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/home/mm/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/home/mm/.local/lib/python3.7/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/mm/.local/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/mm/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/mm/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/Yolo/yolov5/utils/segment/dataloaders.py", line 115, in __getitem__
    img, labels, segments = self.load_mosaic(index)
  File "/data/Yolo/yolov5/utils/segment/dataloaders.py", line 263, in load_mosaic
    border=self.mosaic_border) # border to remove
  File "/data/Yolo/yolov5/utils/segment/augmentations.py", line 102, in random_perspective
    new_segments = np.array(new_segments)[i]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 0 but corresponding boolean dimension is 4

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 57659) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mm/.local/lib/python3.7/site-packages/torch/distributed/run.py", line 766, in <module>
    main()
  File "/home/mm/.local/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/mm/.local/lib/python3.7/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/home/mm/.local/lib/python3.7/site-packages/torch/distributed/run.py", line 756, in run
    )(*cmd_args)
  File "/home/mm/.local/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/mm/.local/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 248, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

segment/train.py FAILED

Failures:
[1]:
  time      : 2023-06-29_07:35:36
  host      : 5bc75d84eb31
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 57660)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
  time      : 2023-06-29_07:35:36
  host      : 5bc75d84eb31
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 57659)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Environment

github: ⚠️ YOLOv5 is out of date by 15 commits. Use 'git pull' or 'git clone https://github.com/ultralytics/yolov5' to update.
YOLOv5 🚀 v7.0-172-gc3c1304 Python-3.7.7 torch-1.13.1+cu117
CUDA:0 (NVIDIA GeForce GTX TITAN X, 12213MiB)
CUDA:1 (NVIDIA GeForce GTX TITAN X, 12213MiB)

- OS: Ubuntu
- Python: 3.7.7

Minimal Reproducible Example

python -m torch.distributed.run --nproc_per_node 2 segment/train.py --device 0,1

Additional

Can anyone please help?

github-actions[bot] commented 1 year ago

👋 Hello @abdallah1989203, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

glenn-jocher commented 1 year ago

@abdallah1989203 based on the traceback you provided, the IndexError is raised inside a DataLoader worker, at line 102 of utils/segment/augmentations.py in random_perspective: new_segments = np.array(new_segments)[i]. The boolean mask i has 4 entries while the new_segments array is empty (dimension 0), so the mask cannot be applied to it.
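For readers unfamiliar with this NumPy error, the following is a minimal sketch that triggers the same kind of failure outside YOLOv5. It assumes only NumPy; the function name and the choice of four boxes are illustrative and not taken from the repository. The exact wording of the message varies between NumPy versions.

```python
import numpy as np


def reproduce_mismatch(num_boxes: int = 4) -> str:
    """Index an empty segment array with a boolean mask of length num_boxes."""
    new_segments = []                      # no polygons were collected
    keep = np.ones(num_boxes, dtype=bool)  # one keep/drop flag per box
    try:
        np.array(new_segments)[keep]       # mask length 4 vs. array length 0
    except IndexError as exc:
        return str(exc)
    return "no error"


print(reproduce_mismatch())
```

The mask and the array must have the same length along the indexed axis, so the failure suggests that the augmentation saw boxes without matching segments.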

One possible solution is to update your YOLOv5 repository to the latest version by using the command git pull or cloning the repository again with git clone https://github.com/ultralytics/yolov5. There have been 15 commits since your version, and updating to the latest version might resolve the issue.

If updating the repository doesn't solve the problem, you can inspect random_perspective in utils/segment/augmentations.py. At the line where new_segments is indexed with the boolean mask i, check that the two have the same length, that is, one segment per target box.
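One way to make such a mismatch easier to diagnose is to check the lengths before filtering. The sketch below is not the upstream code or an official fix: filter_segments is a hypothetical helper, and it returns a list rather than the array that random_perspective builds.

```python
import numpy as np


def filter_segments(new_segments, keep):
    """Keep the polygons whose flag in `keep` is True (hypothetical helper).

    Raises a readable error instead of NumPy's IndexError when the number
    of polygons differs from the number of flags.
    """
    keep = np.asarray(keep, dtype=bool)
    if len(new_segments) != len(keep):
        raise ValueError(
            f"{len(keep)} boxes but {len(new_segments)} segments: "
            "check that every label row carries polygon points"
        )
    return [seg for seg, flag in zip(new_segments, keep) if flag]


# Usage: two polygons, keep only the first one.
polygons = [np.zeros((3, 2)), np.ones((3, 2))]
print(len(filter_segments(polygons, [True, False])))
```

A check like this does not repair the data; it only reports which quantity is off, which helps locate the label or image that causes the crash.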

Please let us know if updating the repository or modifying the code resolves the issue.

abdallah1989203 commented 1 year ago

Thanks @glenn-jocher.

glenn-jocher commented 1 year ago

@abdallah1989203 thanks for reporting this issue!

The error message points to a length mismatch between the boolean mask and the array it indexes during the random_perspective augmentation, which runs inside the DataLoader workers. It may be caused by a bug in the code.
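Besides a code bug, one possibility worth ruling out is the dataset itself. This is an assumption, not something confirmed in this thread: if some label rows contain only a box (class plus 4 values) instead of a polygon (class plus at least three x, y points), the loader may end up with boxes but no segments. The sketch below scans a label folder for such rows; find_box_only_rows and the example path are hypothetical.

```python
from pathlib import Path


def find_box_only_rows(label_dir):
    """Return (file name, line number) pairs for rows too short to be a polygon."""
    suspects = []
    for txt in sorted(Path(label_dir).glob("*.txt")):
        for line_no, line in enumerate(txt.read_text().splitlines(), start=1):
            fields = line.split()
            # class id + at least three (x, y) points = 7 values;
            # a box-only row has just 5 (class, x, y, w, h)
            if fields and len(fields) < 7:
                suspects.append((txt.name, line_no))
    return suspects


# Usage (hypothetical path):
# for name, line_no in find_box_only_rows("datasets/mydata/labels/train"):
#     print(f"{name}:{line_no} has no polygon points")
```

If the scan reports any rows, converting those labels to polygon format or removing them would be the first thing to try before changing the augmentation code.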

To resolve this issue, you can try updating your YOLOv5 repository to the latest version by using git pull or cloning the repository again. There have been several commits since your version, and updating may fix the problem.

If updating doesn't solve the issue, you can inspect random_perspective in utils/segment/augmentations.py. At the line where new_segments is indexed with the boolean mask, check that the mask and the array have the same length.

Please let us know if updating the repository or modifying the code fixes the problem.

Thanks again for bringing this to our attention, and we appreciate your contribution to YOLOv5!