ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

YOLOv5 segmentation np.uint8 bug #9461

Closed · mqbin closed this issue 2 years ago

mqbin commented 2 years ago

Search before asking

YOLOv5 Component

Training

Bug

I want to try the YOLOv5 segmentation task on my own dataset, but I encounter the following error.

Traceback (most recent call last):
  File "/home/mqb/yolov5/segment/train.py", line 676, in <module>
    main(opt)
  File "/home/mqb/yolov5/segment/train.py", line 572, in main
    train(opt.hyp, opt, device, callbacks)
  File "/home/mqb/yolov5/segment/train.py", line 295, in train
    for i, (imgs, targets, paths, _, masks) in pbar:  # batch ------------------------------------------------------
  File "/home/mqb/anaconda3/envs/py39/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/mqb/yolov5/utils/dataloaders.py", line 170, in __iter__
    yield next(self.iterator)
  File "/home/mqb/anaconda3/envs/py39/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/mqb/anaconda3/envs/py39/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/mqb/anaconda3/envs/py39/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/mqb/anaconda3/envs/py39/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/mqb/anaconda3/envs/py39/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/mqb/anaconda3/envs/py39/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/mqb/anaconda3/envs/py39/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/mqb/yolov5/utils/segment/dataloaders.py", line 167, in __getitem__
    masks = (torch.from_numpy(masks) if len(masks) else torch.zeros(1 if self.overlap else nl, img.shape[0] //
TypeError: can't convert np.ndarray of type numpy.uint16. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
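For context, `torch.from_numpy()` rejects `numpy.uint16` arrays, as the message above lists. A minimal sketch (numpy only, variable names are illustrative) of the cast that avoids the error:

```python
import numpy as np

# torch.from_numpy() does not accept numpy.uint16 (see the error above),
# so a uint16 instance mask must first be cast to a supported dtype.
masks = np.arange(6, dtype=np.uint16).reshape(2, 3)  # illustrative mask

# int32 can hold every uint16 value (0..65535) without overflow,
# and is one of the dtypes the error message lists as supported.
masks_i32 = masks.astype(np.int32)
# torch.from_numpy(masks_i32) would now succeed.
```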

My custom dataset images are 256×256 and the annotations are in COCO format. For training I used the JSON2YOLO tool to convert them to the YOLO format, and then I got the error above. How do I fix this?

The same data works fine for the detection task, but fails for the segmentation task. A sample label line is as follows:

1 0.253906 0.982422 0.226562 0.978516 0.216797 0.957031 0.246094 0.951172 0.259766 0.96875 0.253906 0.982422
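For reference, each such line is a class index followed by normalized (x, y) polygon vertices. A minimal parsing sketch (variable names are mine, not YOLOv5's):

```python
# One YOLO segmentation label line: class index, then normalized x y pairs.
line = ("1 0.253906 0.982422 0.226562 0.978516 0.216797 0.957031 "
        "0.246094 0.951172 0.259766 0.96875 0.253906 0.982422")
parts = line.split()

cls = int(parts[0])                              # class index (here: 1)
coords = [float(v) for v in parts[1:]]           # flat list of coordinates
polygon = list(zip(coords[0::2], coords[1::2]))  # (x, y) vertex pairs

# Coordinates are fractions of the image size, so each must lie in [0, 1].
assert all(0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 for x, y in polygon)
```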

Environment

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

github-actions[bot] commented 2 years ago

👋 Hello @mqbin, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

glenn-jocher commented 2 years ago

@mqbin your labels look good. The error seems to imply that your labels are in uint16, but they need to be floats for the fractions.

Are you able to train with COCO128-seg?

python segment/train.py --data coco128-seg.yaml --weights yolov5s-seg.pt --img 640
mqbin commented 2 years ago

@glenn-jocher Yes, I can train with COCO128-seg. Is there something wrong with the format of my dataset?

ghost commented 2 years ago
\yolov5\utils\segment\dataloaders.py

def polygons2masks_overlap(img_size, segments, downsample_ratio=1):
    """Return a (640, 640) overlap mask."""
-   masks = np.zeros((img_size[0] // downsample_ratio, img_size[1] // downsample_ratio), dtype=np.uint8)
+   masks = np.zeros((img_size[0] // downsample_ratio, img_size[1] // downsample_ratio), dtype=np.int32)
glenn-jocher commented 2 years ago

@mqbin yes there's likely an issue with your dataset. You should review the structure and labels vs COCO128-seg to make sure they align.

@DonkeySmall what's the diff you have there?

mqbin commented 2 years ago

It works, thanks a lot!

mqbin commented 2 years ago

@glenn-jocher I solved this problem using the method given by @DonkeySmall, but I haven't looked at the code yet, so I don't know why this problem occurs. Will this change to the code affect the results?

glenn-jocher commented 2 years ago

@mqbin I don't know, but in general masks should be uint8 when used for plotting. You're saying there's no issue with COCO128-seg but on your custom dataset you run into an error and the above change resolves this error?

mqbin commented 2 years ago

@glenn-jocher Yes, the above change resolves this error.

glenn-jocher commented 2 years ago

@mqbin ok thank you! I've opened up #9493 to test the fix.

Laughing-q commented 2 years ago

@DonkeySmall @mqbin We still can't reproduce this issue, could you please provide us more information on this?

mqbin commented 2 years ago

@Laughing-q I am using a cell segmentation dataset where each image is 256×256, and there are many more instances per image than in the COCO128 dataset. I am not sure if this is the cause of the problem.

glenn-jocher commented 2 years ago

@mqbin can you share the dataset to help us reproduce the problem?

ghost commented 2 years ago

Sorry, I can't reproduce this problem anymore; I've cleaned up my dataset.

mqbin commented 2 years ago

@glenn-jocher @Laughing-q Can you give me your email address? I will send you part of the dataset by email so you can reproduce the error.

Laughing-q commented 2 years ago

@mqbin Yeah, sure: laughingq@163.com. I just downloaded the datasets, but it'd be better if you send us yours. :)

Laughing-q commented 2 years ago

@mqbin @DonkeySmall I've reproduced this issue; it is caused by the number of objects. The error occurs when an image contains more than 255 objects. We'll fix it in the PR Glenn mentioned earlier.
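A minimal numpy sketch (not the YOLOv5 source) of why a uint8 overlap mask breaks once instance IDs pass 255:

```python
import numpy as np

# polygons2masks_overlap writes instance IDs 1..N into one mask array.
# With dtype=np.uint8, any ID above 255 wraps modulo 256, so instance 256
# collides with the background value 0 and the labels become corrupt.
instance_ids = np.array([1, 255, 256, 300], dtype=np.int32)

wrapped = instance_ids.astype(np.uint8)  # silent modular wrap-around
safe = instance_ids.astype(np.int32)     # int32 keeps every ID distinct
```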

glenn-jocher commented 2 years ago

@mqbin good news 😃! Your original issue may now be fixed ✅ in PR #9493. To receive this update:

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!