pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

ValueError: All bounding boxes should have positive height and width. Found invalid box [500.728515625, 533.3333129882812, 231.10546875, 255.2083282470703] for target at index 0. #2740

Closed kashf99 closed 3 years ago

kashf99 commented 3 years ago

I am training detecto for custom object detection. Can anyone help me as soon as possible? I would be very grateful. Here is the code:

    from detecto import core, utils, visualize

    dataset = core.Dataset('content/sample_data/newdataset/car/images/')
    model = core.Model(['car'])
    model.fit(dataset)

Here is the output:

ValueError                                Traceback (most recent call last)
in ()
      4 model = core.Model(['car'])
      5 
----> 6 model.fit(dataset)

2 frames
/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
     91                 raise ValueError("All bounding boxes should have positive height and width."
     92                                  " Found invalid box {} for target at index {}."
---> 93                                  .format(degen_bb, target_idx))
     94 
     95         features = self.backbone(images.tensors)

ValueError: All bounding boxes should have positive height and width. Found invalid box [500.728515625, 533.3333129882812, 231.10546875, 255.2083282470703] for target at index 0.
oke-aditya commented 3 years ago

I guess you have a degenerate box case. The boxes should be in (xmin, ymin, xmax, ymax) format for FRCNN to work. Your bounding box is exactly the opposite (a degenerate case).

fmassa commented 3 years ago

Hi,

The answer from @oke-aditya is correct. You are probably passing to the model bounding boxes in the format [xmin, ymin, width, height], while Faster R-CNN expects boxes to be in [xmin, ymin, xmax, ymax] format.

Changing this should fix the issue.

By the way, we have recently added box conversion utilities to torchvision (thanks to @oke-aditya); they can be found in https://github.com/pytorch/vision/blob/a98e17e50146529cdfadb590ba063e6bbee71de2/torchvision/ops/boxes.py#L137-L156
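For example, a minimal sketch of that conversion (assuming your annotations are currently in [xmin, ymin, width, height] format and that your torchvision version ships torchvision.ops.box_convert):

    import torch
    from torchvision.ops import box_convert

    # hypothetical boxes annotated as [xmin, ymin, width, height]
    boxes_xywh = torch.tensor([[500.7, 533.3, 231.1, 255.2]])

    # convert to the [xmin, ymin, xmax, ymax] format that Faster R-CNN expects
    boxes_xyxy = box_convert(boxes_xywh, in_fmt="xywh", out_fmt="xyxy")
    # boxes_xyxy -> tensor([[500.7000, 533.3000, 731.8000, 788.5000]])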

kashf99 commented 3 years ago

So should I change my XML file format?

fmassa commented 3 years ago

@kashf99 this question is better suited to the detecto repo, and this is part of their API. https://github.com/alankbi/detecto

kashf99 commented 3 years ago

Ok thank you

kashf99 commented 3 years ago

I guess you have a degenerate box case. The boxes should be in (xmin, ymin, xmax, ymax) format for FRCNN to work. Your bounding box is exactly the opposite (a degenerate case).

Yeah, thank you. It worked. But it's very slow, and I get the warning "Overload of nonzero is deprecated."

fmassa commented 3 years ago

Overload of nonzero is deprecated.

This has been fixed in torchvision master since https://github.com/pytorch/vision/pull/2705

MALLI7622 commented 3 years ago

Hi @fmassa. I am also getting the same error, but I passed [xmin, ymin, xmax, ymax] to the model. Can someone help me out?

oke-aditya commented 3 years ago

Can you post details so that we can reproduce the issue?

MALLI7622 commented 3 years ago

@oke-aditya what should I share, code or abstract details?

oke-aditya commented 3 years ago

Any code sample that can help people reproduce the error you get.

MALLI7622 commented 3 years ago

These are the box coordinates I'm passing:

    boxes.append([xmin, ymin, xmax, ymax])
    boxes = torch.as_tensor(boxes, dtype=torch.float32)

fmassa commented 3 years ago

@MALLI7622 make sure that xmin < xmax and that ymin < ymax for all boxes
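A minimal sketch of such a check, assuming boxes is an (N, 4) tensor in xyxy format (this is the same condition that GeneralizedRCNN uses to flag degenerate boxes):

    import torch

    def assert_valid_boxes(boxes: torch.Tensor) -> None:
        # a box is degenerate when xmax <= xmin or ymax <= ymin
        degenerate = (boxes[:, 2] <= boxes[:, 0]) | (boxes[:, 3] <= boxes[:, 1])
        if degenerate.any():
            bad_rows = degenerate.nonzero().flatten().tolist()
            raise ValueError(f"Degenerate boxes at rows {bad_rows}: {boxes[degenerate].tolist()}")

    assert_valid_boxes(torch.tensor([[30., 40., 200., 180.]]))    # passes
    # assert_valid_boxes(torch.tensor([[30., 180., 200., 40.]]))  # would raise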

MALLI7622 commented 3 years ago

@fmassa I resolved that issue 4 days back, thanks for your help. Now I am getting another problem with Faster R-CNN: my model evaluates to the values below, and I don't know how to resolve this. I changed the class indices to start from 1 instead of 0 and increased the number of output classes by 1 to account for the background. Can you help me resolve this issue?

    Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
    Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
    Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
    Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
    Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
    Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
    Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000

When I predict with this model, I don't get anything. It predicts this:

    [{'boxes': tensor([], device='cuda:0', size=(0, 4)), 'labels': tensor([], device='cuda:0', dtype=torch.int64), 'scores': tensor([], device='cuda:0')}]

fmassa commented 3 years ago

@MALLI7622 this might be due to many things. I would encourage you to start with the finetuning tutorial at https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html, as maybe you are not training for long enough.
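For reference, the head-replacement step from that tutorial looks roughly like this (num_classes is the number of foreground classes plus one for background, and target labels should start at 1, with 0 reserved for background):

    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    num_classes = 2  # e.g. 1 foreground class + background

    # start from a model pre-trained on COCO and replace the box predictor head
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)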

clothme-io commented 3 years ago

@MALLI7622 how did you resolve the issue? I am having a similar issue with a custom dataset of 39 classes (including background). Any help will do. Thanks.

MALLI7622 commented 3 years ago

@clothme-io Can you share a sample of your dataset and also your custom dataset class? I'll try to help you with it.

clothme-io commented 3 years ago

@MALLI7622 sure, I can share it here as well as email it to you. And thank you for the help.

How I Generated The Dataset:

  1. Annotated the image with labelme (multiple parts in a single image).

  2. Generated a mask image from the annotated image.

  3. Then I used the code here: to generate the segmentation images that I loaded into the model.

Here is my custom dataset class:

    import os

    import numpy as np
    import torch
    from PIL import Image


    class PersonDataset(torch.utils.data.Dataset):
        def __init__(self, root, transforms=None):
            self.root = root
            self.transforms = transforms
            # load all image files, sorting them to
            # ensure that they are aligned
            self.imgs = list(sorted(os.listdir(os.path.join(root, "seg_image_use"))))
            self.masks = list(sorted(os.listdir(os.path.join(root, "seg_mask_use"))))

        def __getitem__(self, idx):
            # load one image and mask using idx
            img_path = os.path.join(self.root, "seg_image_use", self.imgs[idx])
            mask_path = os.path.join(self.root, "seg_mask_use", self.masks[idx])
            img = Image.open(img_path).convert("RGB")
            # note that we haven't converted the mask to RGB,
            # because each color corresponds to a different instance
            # with 0 being background
            mask = Image.open(mask_path)

            mask = np.asarray(mask)
            # instances are encoded as different colors
            obj_ids = np.unique(mask)[1:]  # first id is the background, so remove it
            masks = mask == obj_ids[:, None, None]  # split the color-encoded mask into a set of binary masks
            # get bounding box coordinates for each mask
            num_objs = len(obj_ids)
            boxes = []

            for i in range(num_objs):
                pos = np.where(masks[i])
                xmin = np.min(pos[1])
                xmax = np.max(pos[1])
                ymin = np.min(pos[0])
                ymax = np.max(pos[0])
                boxes.append([xmin, ymin, xmax, ymax])

            # convert everything into torch.Tensor
            boxes = torch.as_tensor(boxes, dtype=torch.float32)
            area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

            target = {}
            target["boxes"] = boxes
            target["labels"] = torch.as_tensor(obj_ids, dtype=torch.int64) - 1
            target["masks"] = torch.as_tensor(masks, dtype=torch.uint8)
            target["image_id"] = torch.tensor([idx])
            target["area"] = area
            target["iscrowd"] = torch.zeros((num_objs,), dtype=torch.int64)  # suppose all instances are not crowd

            if self.transforms is not None:
                img, target = self.transforms(img, target)

            return img, target

        def __len__(self):
            return len(self.imgs)
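A quick sanity check over such a dataset before training (hypothetical root path) can catch degenerate boxes coming out of __getitem__, for example from masks whose objects collapse to a single row or column of pixels:

    dataset = PersonDataset(root="path/to/data", transforms=None)  # hypothetical path
    for idx in range(len(dataset)):
        _, target = dataset[idx]
        boxes = target["boxes"]
        if boxes.numel() == 0:
            print(f"sample {idx}: no objects found in mask")
            continue
        degenerate = (boxes[:, 2] <= boxes[:, 0]) | (boxes[:, 3] <= boxes[:, 1])
        if degenerate.any():
            print(f"sample {idx}: degenerate boxes {boxes[degenerate].tolist()}")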

OrielBanne commented 3 years ago

Hi -

the example in torchvision is:

model22 = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

For training

    images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
    labels = torch.randint(1, 91, (4, 11))
    images = list(image for image in images)
    targets = []
    for i in range(len(images)):
        d = {}
        d['boxes'] = boxes[i]
        d['labels'] = labels[i]
        targets.append(d)
    output = model22(images, targets)

For inference

    model22.eval()
    x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
    predictions = model22(x)

optionally, if you want to export the model to ONNX:

torch.onnx.export(model22, x, "faster_rcnn.onnx", opset_version = 11)

https://pytorch.org/vision/master/models.html#torchvision.models.detection.fasterrcnn_resnet50_fpn

and I get the same error:

ValueError: All bounding boxes should have positive height and width. Found invalid box [0.5358670949935913, 0.6406093239784241, 0.873319149017334, 0.33925700187683105] for target at index 0.

fmassa commented 3 years ago

@OrielBanne one of your bounding boxes has a negative height; I would recommend checking your training data.

mrinath123 commented 2 years ago

@OrielBanne yes, I hit the same error while using this; maybe producing random bboxes (torch.rand(4, 11, 4)) is creating the problem.
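One way to keep that smoke test but guarantee well-formed boxes is to sample two corner points per box and take the element-wise min/max, so that xmin < xmax and ymin < ymax (a sketch; the coordinate scale here is arbitrary):

    import torch

    # sample two random corners per box and order them, so every box has positive size
    p1 = torch.rand(4, 11, 2) * 256
    p2 = torch.rand(4, 11, 2) * 256
    boxes = torch.cat([torch.minimum(p1, p2), torch.maximum(p1, p2) + 1.0], dim=-1)  # [xmin, ymin, xmax, ymax]
    labels = torch.randint(1, 91, (4, 11))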

Esraanageh22 commented 2 years ago

I have the same error (screenshot: ask51) and I have checked the data (screenshots: ask1, ask2).

santhoshnumberone commented 2 years ago

I have a similar issue

Following this tutorial, "Building Your Own Object Detector: Pytorch Vs Tensorflow And How To Even Get Started", to use transfer learning to train on a custom dataset.

Running on Google Colab using CPU
PyTorch version: 1.11.0+cu113
Python version: Python 3.7.13

Cloned the GitHub repo of pytorch/vision as mentioned and copy-pasted the version 0.3.3 files from vision/references/detection into the working directory:

    references/detection/utils.py ../
    references/detection/transforms.py ../
    references/detection/coco_eval.py ../
    references/detection/engine.py ../
    references/detection/coco_utils.py ../

Model I am using:

    # load an object detection model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

I manually checked the csv file to see whether any bounding box values were negative, but I couldn't find any.

I added a print statement inside engine.py, where the error was pointing, to check for negative values in the bounding boxes:

    for images, targets in metric_logger.log_every(data_loader, print_freq, header):
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        print("######################",targets)

        loss_dict = model(images, targets)

Print-statement output of the targets at the point where the error occurs; there is not even a single negative coordinate value:

###################### [{'boxes': tensor([[ 98., 672., 829., 864.]]), 'labels': tensor([1]), 'image_id': tensor([734]), 'area': tensor([140352.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[262.,  85., 463., 275.]]), 'labels': tensor([1]), 'image_id': tensor([110]), 'area': tensor([38190.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 82., 275., 259., 281.]]), 'labels': tensor([1]), 'image_id': tensor([296]), 'area': tensor([1062.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 85.,   0., 357., 238.]]), 'labels': tensor([1]), 'image_id': tensor([68]), 'area': tensor([64736.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[188., 400., 730., 880.]]), 'labels': tensor([1]), 'image_id': tensor([788]), 'area': tensor([260160.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 40., 118., 320., 155.]]), 'labels': tensor([1]), 'image_id': tensor([598]), 'area': tensor([10360.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[  0., 245., 293., 347.]]), 'labels': tensor([1]), 'image_id': tensor([605]), 'area': tensor([29886.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[201., 838., 611., 621.]]), 'labels': tensor([1]), 'image_id': tensor([696]), 'area': tensor([-88970.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[488., 669., 774., 541.]]), 'labels': tensor([1]), 'image_id': tensor([985]), 'area': tensor([-36608.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[129., 242., 138., 119.]]), 'labels': tensor([1]), 'image_id': tensor([813]), 'area': tensor([-1107.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 36.,  77., 258., 247.]]), 'labels': tensor([1]), 'image_id': tensor([1780]), 'area': tensor([37740.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 66.,  49., 308., 283.]]), 'labels': tensor([1]), 'image_id': tensor([868]), 'area': tensor([56628.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 23., 182., 343., 318.]]), 'labels': tensor([1]), 'image_id': tensor([1290]), 'area': tensor([43520.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[215.,   0., 500., 266.]]), 'labels': tensor([1]), 'image_id': tensor([111]), 'area': tensor([75810.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 99., 105., 349., 210.]]), 'labels': tensor([1]), 'image_id': tensor([1350]), 'area': tensor([26250.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[319., 842., 384., 541.]]), 'labels': tensor([1]), 'image_id': tensor([803]), 'area': tensor([-19565.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[  0.,  19., 269., 283.]]), 'labels': tensor([1]), 'image_id': tensor([409]), 'area': tensor([71016.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 31.,   0., 360., 339.]]), 'labels': tensor([1]), 'image_id': tensor([1651]), 'area': tensor([111531.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[  0., 714., 585., 646.]]), 'labels': tensor([1]), 'image_id': tensor([989]), 'area': tensor([-39780.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 51., 170., 314., 317.]]), 'labels': tensor([1]), 'image_id': tensor([1449]), 'area': tensor([38661.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[394.,  66., 640., 294.]]), 'labels': tensor([1]), 'image_id': tensor([177]), 'area': tensor([56088.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[396., 723., 592., 627.]]), 'labels': tensor([1]), 'image_id': tensor([940]), 'area': tensor([-18816.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 95.,  54., 360., 187.]]), 'labels': tensor([1]), 'image_id': tensor([1579]), 'area': tensor([35245.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 67., 112., 293., 307.]]), 'labels': tensor([1]), 'image_id': tensor([1508]), 'area': tensor([44070.]), 'iscrowd': 
tensor([0])}, {'boxes': tensor([[ 11.,   0., 452., 355.]]), 'labels': tensor([1]), 'image_id': tensor([1162]), 'area': tensor([156555.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[268., 515., 698., 746.]]), 'labels': tensor([1]), 'image_id': tensor([741]), 'area': tensor([99330.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[156., 851., 598., 624.]]), 'labels': tensor([1]), 'image_id': tensor([900]), 'area': tensor([-100334.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 44., 123., 341., 305.]]), 'labels': tensor([1]), 'image_id': tensor([680]), 'area': tensor([54054.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[235.,   0., 598., 282.]]), 'labels': tensor([1]), 'image_id': tensor([1163]), 'area': tensor([102366.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 43., 156., 277., 289.]]), 'labels': tensor([1]), 'image_id': tensor([360]), 'area': tensor([31122.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 94.,   0., 266., 250.]]), 'labels': tensor([1]), 'image_id': tensor([1591]), 'area': tensor([43000.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 71.,  38., 343., 322.]]), 'labels': tensor([1]), 'image_id': tensor([1809]), 'area': tensor([77248.]), 'iscrowd': tensor([0])}]

I get this error:

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
###################### [... same targets as printed above ...]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-13-f8100031e21d>](https://localhost:8080/#) in <module>()
      2 for epoch in range(num_epochs):
      3     # train for one epoch, printing every 10 iterations
----> 4     train_one_epoch(model, optimizer, data_loader, device, epoch,print_freq=10)
      5     # update the learning rate
      6     lr_scheduler.step()

2 frames
[/content/drive/MyDrive/PytorchObjectDetector/engine.py](https://localhost:8080/#) in train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq)
     30         print("######################",targets)
     31 
---> 32         loss_dict = model(images, targets)
     33 
     34         losses = sum(loss for loss in loss_dict.values())

[/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/generalized_rcnn.py](https://localhost:8080/#) in forward(self, images, targets)
     89                     degen_bb: List[float] = boxes[bb_idx].tolist()
     90                     raise ValueError(
---> 91                         "All bounding boxes should have positive height and width."
     92                         f" Found invalid box {degen_bb} for target at index {target_idx}."
     93                     )

ValueError: All bounding boxes should have positive height and width. Found invalid box [139.397216796875, 581.7989501953125, 423.73980712890625, 431.1422119140625] for target at index 7.

I am sure the problem has been addressed long back, looking at the responses given here.

But I looked at this post on Stack Overflow suffering from the same error: ValueError: All bounding boxes should have positive height and width.

Could any of you guide me on what exactly should be changed, and where?

I will surely write a Medium blog on PyTorch object detection from custom data using transfer learning after I have sorted out these few minor hiccups.

@fmassa I guess you could help me sort this issue out

abhi-glitchhg commented 2 years ago

Hey @santhoshnumberone, refer to @oke-aditya's comment here: https://github.com/pytorch/vision/issues/2740#issuecomment-702575254. The bounding boxes should be in the form (xmin, ymin, xmax, ymax).

In your bounding box data, there are a few data points which do not fit the above format; some of them are:

tensor([[201., 838., 611., 621.]])
tensor([[488., 669., 774., 541.]])
tensor([[129., 242., 138., 119.]])
tensor([[319., 842., 384., 541.]])
tensor([[  0., 714., 585., 646.]])
tensor([[396., 723., 592., 627.]])
tensor([[156., 851., 598., 624.]])

So first you need to check the format of the bounding boxes that you have, and then convert them to (xmin, ymin, xmax, ymax) format. This function might be helpful for the conversion: https://github.com/pytorch/vision/blob/a98e17e50146529cdfadb590ba063e6bbee71de2/torchvision/ops/boxes.py#L137-L189

I hope this helps.
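If, as the rows above suggest, only the order of the coordinates is wrong (for example ymin and ymax swapped), a small sketch that reorders each axis so the minimum comes first could look like this (assuming an (N, 4) tensor whose values are correct but possibly swapped):

    import torch

    def reorder_box_corners(boxes: torch.Tensor) -> torch.Tensor:
        # sort each coordinate pair so the output is [xmin, ymin, xmax, ymax]
        x = boxes[:, [0, 2]]
        y = boxes[:, [1, 3]]
        fixed = torch.empty_like(boxes)
        fixed[:, 0], fixed[:, 2] = x.min(dim=1).values, x.max(dim=1).values
        fixed[:, 1], fixed[:, 3] = y.min(dim=1).values, y.max(dim=1).values
        return fixed

    print(reorder_box_corners(torch.tensor([[201., 838., 611., 621.]])))
    # tensor([[201., 621., 611., 838.]])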

oke-aditya commented 2 years ago

Also note that if you are trying to train an object detection model you should use

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

since mask_rcnn is an instance segmentation model which will expect segmentation masks during training.
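To illustrate the difference in the training-target contract, a minimal sketch with random data and untrained weights (hypothetical shapes, not the original poster's setup):

    import torch
    import torchvision

    images = [torch.rand(3, 400, 400)]
    target = {
        "boxes": torch.tensor([[50., 60., 200., 220.]]),  # xyxy, positive height/width
        "labels": torch.tensor([1]),
    }

    # Faster R-CNN: boxes + labels are enough for training
    det_model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        pretrained=False, pretrained_backbone=False)
    det_model.train()
    det_losses = det_model(images, [target])

    # Mask R-CNN: additionally needs a per-instance binary mask, otherwise
    # roi_heads fails the assertion `all(["masks" in t for t in targets])`
    mask = torch.zeros((1, 400, 400), dtype=torch.uint8)
    mask[0, 60:220, 50:200] = 1
    target["masks"] = mask
    seg_model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        pretrained=False, pretrained_backbone=False)
    seg_model.train()
    seg_losses = seg_model(images, [target])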

santhoshnumberone commented 2 years ago

Hey @santhoshnumberone, refer to @oke-aditya's comment here: #2740 (comment). The bounding boxes should be in the form (xmin, ymin, xmax, ymax).

In your bounding box data, there are a few data points which do not fit the above format; some of them are:

tensor([[201., 838., 611., 621.]])
tensor([[488., 669., 774., 541.]])
tensor([[129., 242., 138., 119.]])
tensor([[319., 842., 384., 541.]])
tensor([[  0., 714., 585., 646.]])
tensor([[396., 723., 592., 627.]])
tensor([[156., 851., 598., 624.]])

So first you need to check the format of the bounding boxes that you have, and then convert them to (xmin, ymin, xmax, ymax) format. This function might be helpful for the conversion.

https://github.com/pytorch/vision/blob/a98e17e50146529cdfadb590ba063e6bbee71de2/torchvision/ops/boxes.py#L137-L189

I hope this helps.

Thank you for highlighting the issue; I will look into it. I blindly trusted a popular online image labelling tool to annotate my custom data.

santhoshnumberone commented 2 years ago

Also note that if you are trying to train an object detection model you should use

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

since mask_rcnn is an instance segmentation model which will expect segmentation masks during training.


Can't I freeze everything apart from the object detection block using requires_grad = False and train it?

PS

A mask is required to calculate the loss, I guess; I got this error:

  cpuset_checked))
###################### [{'boxes': tensor([[132.,   0., 435., 285.]]), 'labels': tensor([1]), 'image_id': tensor([1889]), 'area': tensor([86355.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[235.,   0., 640., 315.]]), 'labels': tensor([1]), 'image_id': tensor([1210]), 'area': tensor([127575.]), 'iscrowd': tensor([0])}]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
[<ipython-input-16-05e881bbc3b2>](https://localhost:8080/#) in <module>()
      2 for epoch in range(num_epochs):
      3     # train for one epoch, printing every 10 iterations
----> 4     train_one_epoch(model, optimizer, data_loader, device, epoch,print_freq=10)
      5     # update the learning rate
      6     lr_scheduler.step()

6 frames
[/content/drive/MyDrive/PytorchObjectDetector/engine.py](https://localhost:8080/#) in train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq)
     30         print("######################",targets)
     31 
---> 32         loss_dict = model(images, targets)
     33 
     34         losses = sum(loss for loss in loss_dict.values())

[/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/generalized_rcnn.py](https://localhost:8080/#) in forward(self, images, targets)
     97             features = OrderedDict([("0", features)])
     98         proposals, proposal_losses = self.rpn(images, features, targets)
---> 99         detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
    100         detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)  # type: ignore[operator]
    101 

[/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py](https://localhost:8080/#) in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py](https://localhost:8080/#) in forward(self, features, proposals, image_shapes, targets)
    743 
    744         if self.training:
--> 745             proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
    746         else:
    747             labels = None

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py](https://localhost:8080/#) in select_training_samples(self, proposals, targets)
    628     ):
    629         # type: (...) -> Tuple[List[Tensor], List[Tensor], List[Tensor], List[Tensor]]
--> 630         self.check_targets(targets)
    631         assert targets is not None
    632         dtype = proposals[0].dtype

[/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py](https://localhost:8080/#) in check_targets(self, targets)
    620         assert all(["labels" in t for t in targets])
    621         if self.has_mask():
--> 622             assert all(["masks" in t for t in targets])
    623 
    624     def select_training_samples(

AssertionError:
ihebchiha123 commented 4 months ago

I had the same problem; all the images and the masks were fine. For the image augmentation I was using these transforms:

    from torchvision.transforms import v2 as T

    def get_transform(train):
        transforms = []
        if train:
            transforms.append(T.RandomHorizontalFlip(0.2))
            #transforms.append(T.RandomRotation(10))
        transforms.append(T.ToDtype(torch.float, scale=True))
        transforms.append(T.ToPureTensor())
        return T.Compose(transforms)

when "transforms.append(T.RandomRotation(10))" was uncommented, i had an error when i start the training, but when I commented that line the training step was successfully done.