I guess you have a degenerate box case. The boxes should be in (xmin, ymin, xmax, ymax) format for FRCNN to work; your bounding boxes are exactly the opposite (a degenerate case).
Hi,
The answer from @oke-aditya is correct. You are probably passing bounding boxes to the model in the [xmin, ymin, width, height] format, while Faster R-CNN expects boxes to be in [xmin, ymin, xmax, ymax] format.
Changing this should fix the issue.
By the way, we have recently added box conversion utilities to torchvision (thanks to @oke-aditya); they can be found at https://github.com/pytorch/vision/blob/a98e17e50146529cdfadb590ba063e6bbee71de2/torchvision/ops/boxes.py#L137-L156
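For illustration, a minimal sketch of converting boxes with that helper, assuming a torchvision version that exposes `torchvision.ops.box_convert`:

```python
import torch
from torchvision.ops import box_convert

# boxes annotated as [xmin, ymin, width, height] (COCO-style)
boxes_xywh = torch.tensor([[10.0, 20.0, 30.0, 40.0]])

# convert to the [xmin, ymin, xmax, ymax] format Faster R-CNN expects
boxes_xyxy = box_convert(boxes_xywh, in_fmt="xywh", out_fmt="xyxy")
print(boxes_xyxy)  # tensor([[10., 20., 40., 60.]])
```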
So should I change my XML file format?
@kashf99 this question is better suited to the detecto repo, and this is part of their API. https://github.com/alankbi/detecto
Ok thank you
> I guess you have a degenerate box case. The boxes should be in (xmin, ymin, xmax, ymax) format for FRCNN to work; your bounding boxes are exactly the opposite (a degenerate case).
Yeah, thank you. It worked. But it's very slow, and I'm getting the warning: Overload of nonzero is deprecated.
> Overload of nonzero is deprecated.
This has been fixed in torchvision master since https://github.com/pytorch/vision/pull/2705
Hi @fmassa. I am also getting the same error, but I passed [xmin, ymin, xmax, ymax] to the model. Can someone help me out?
Can you post details so that we can reproduce the issue?
@oke-aditya what should I share, code or abstract details?
Any code sample that can help people reproduce the error you get.
These are the box coordinates I'm passing:

```python
boxes.append([xmin, ymin, xmax, ymax])
boxes = torch.as_tensor(boxes, dtype=torch.float32)
```
@MALLI7622 make sure that xmin < xmax and that ymin < ymax for all boxes.
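Not part of the original reply, just a minimal sketch of how one might check this before training, assuming a dataset whose `__getitem__` returns `(image, target)` with `target["boxes"]` as an `(N, 4)` tensor:

```python
# scan the whole dataset for degenerate boxes (xmax <= xmin or ymax <= ymin)
for idx in range(len(dataset)):
    _, target = dataset[idx]
    boxes = target["boxes"]
    bad = (boxes[:, 2] <= boxes[:, 0]) | (boxes[:, 3] <= boxes[:, 1])
    if bad.any():
        print(f"sample {idx}: degenerate boxes {boxes[bad].tolist()}")
```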
@fmassa I resolved the issue 4 days back, thanks for your help. I am now getting another problem with Faster R-CNN: my model was producing these values, and I don't know how to resolve it. I changed the class indices to start from 1 instead of 0 and increased the number of output classes by 1 accordingly. Can you help me resolve this issue?

```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
```
When I predicted with this model, I didn't get anything. It was predicting this:

```
[{'boxes': tensor([], device='cuda:0', size=(0, 4)), 'labels': tensor([], device='cuda:0', dtype=torch.int64), 'scores': tensor([], device='cuda:0')}]
```
@MALLI7622 this might be due to many things. I would encourage you to start with the finetuning tutorial in https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html , as maybe you are not training for long enough.
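For reference, a minimal sketch of the label convention from that tutorial: label 0 is reserved for the background, so a dataset with one foreground class uses `num_classes = 2` and labels starting at 1.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 2  # 1 foreground class + background (label 0)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# replace the pre-trained head with one sized for num_classes
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
```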
@MALLI7622 how did you resolve the issue? I'm having a similar issue with a custom dataset of 39 classes (including background). Any help will do. Thanks
@clothme-io Can you share your sample dataset file and your custom dataset class? I'll try to help you with it.
@MALLI7622 sure I can share it here as well as email it to you. And thank you for the help.
How I Generated The Dataset:
Annotated the image with labelme (multiple parts in a single image)
Generated a mask image (image below) from the annotated image.
Then I used the code here to generate the segmentation images (image below) that I loaded into the model.
Here is my custom dataset class:

```python
import os

import numpy as np
import torch
from PIL import Image


class PersonDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # sort the image and mask file lists to ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "seg_image_use"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "seg_mask_use"))))

    def __getitem__(self, idx):
        # load one image and mask using idx
        img_path = os.path.join(self.root, "seg_image_use", self.imgs[idx])
        mask_path = os.path.join(self.root, "seg_mask_use", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        mask = np.asarray(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)[1:]  # first id is the background, so remove it
        masks = mask == obj_ids[:, None, None]  # split the color-encoded mask into a set of binary masks

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

        target = {}
        target["boxes"] = boxes
        target["labels"] = torch.as_tensor(obj_ids, dtype=torch.int64) - 1
        target["masks"] = torch.as_tensor(masks, dtype=torch.uint8)
        target["image_id"] = torch.tensor([idx])
        target["area"] = area
        target["iscrowd"] = torch.zeros((num_objs,), dtype=torch.int64)  # suppose all instances are not crowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)
```
Hi -
the example in torchvision is:
```python
model22 = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
labels = torch.randint(1, 91, (4, 11))
images = list(image for image in images)
targets = []
for i in range(len(images)):
    d = {}
    d['boxes'] = boxes[i]
    d['labels'] = labels[i]
    targets.append(d)
output = model22(images, targets)

model22.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model22(x)

torch.onnx.export(model22, x, "faster_rcnn.onnx", opset_version=11)
```
https://pytorch.org/vision/master/models.html#torchvision.models.detection.fasterrcnn_resnet50_fpn
and I get the same error:
ValueError: All bounding boxes should have positive height and width. Found invalid box [0.5358670949935913, 0.6406093239784241, 0.873319149017334, 0.33925700187683105] for target at index 0.
@OrielBanne one of your bounding boxes has a negative height; I would recommend checking your training data.
@OrielBanne Yes, I found the same error while using this; maybe producing random bboxes (torch.rand(4, 11, 4)) is creating the problem.
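A hedged aside, not from the original example: if the goal is only a smoke test of the training API, the random boxes can be built so they are guaranteed non-degenerate, for example:

```python
import torch

# 4 images, 11 boxes each, with guaranteed xmin < xmax and ymin < ymax
xy_min = torch.rand(4, 11, 2) * 300.0             # top-left corners
wh = torch.rand(4, 11, 2) * 200.0 + 1.0           # strictly positive widths/heights
boxes = torch.cat([xy_min, xy_min + wh], dim=-1)  # (xmin, ymin, xmax, ymax)
```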
I have the same error and I have checked the data.
I have a similar issue
Following this tutorial, "Building Your Own Object Detector: PyTorch vs TensorFlow and How to Even Get Started", to use transfer learning to train on a custom dataset.
Running on Google Colab, using CPU.
PyTorch version: 1.11.0+cu113
Python version: 3.7.13
I cloned the PyTorch vision GitHub repo as mentioned and copy-pasted the version 0.3.3 files from vision/references/detection into the working directory:

```
references/detection/utils.py ../
references/detection/transforms.py ../
references/detection/coco_eval.py ../
references/detection/engine.py ../
references/detection/coco_utils.py ../
```
The model I am using:

```python
# load an object detection model pre-trained on COCO
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
```
I manually checked the CSV file to see if any bounding box values were negative and couldn't find any. I added a print statement inside the engine.py file, at the spot the error points to, to check the bounding box values:
```python
for images, targets in metric_logger.log_every(data_loader, print_freq, header):
    images = list(image.to(device) for image in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    print("######################", targets)

    loss_dict = model(images, targets)
```
Print statement output of the targets at the point the error is raised; not even a single negative value:
###################### [{'boxes': tensor([[ 98., 672., 829., 864.]]), 'labels': tensor([1]), 'image_id': tensor([734]), 'area': tensor([140352.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[262., 85., 463., 275.]]), 'labels': tensor([1]), 'image_id': tensor([110]), 'area': tensor([38190.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 82., 275., 259., 281.]]), 'labels': tensor([1]), 'image_id': tensor([296]), 'area': tensor([1062.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 85., 0., 357., 238.]]), 'labels': tensor([1]), 'image_id': tensor([68]), 'area': tensor([64736.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[188., 400., 730., 880.]]), 'labels': tensor([1]), 'image_id': tensor([788]), 'area': tensor([260160.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 40., 118., 320., 155.]]), 'labels': tensor([1]), 'image_id': tensor([598]), 'area': tensor([10360.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 0., 245., 293., 347.]]), 'labels': tensor([1]), 'image_id': tensor([605]), 'area': tensor([29886.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[201., 838., 611., 621.]]), 'labels': tensor([1]), 'image_id': tensor([696]), 'area': tensor([-88970.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[488., 669., 774., 541.]]), 'labels': tensor([1]), 'image_id': tensor([985]), 'area': tensor([-36608.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[129., 242., 138., 119.]]), 'labels': tensor([1]), 'image_id': tensor([813]), 'area': tensor([-1107.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 36., 77., 258., 247.]]), 'labels': tensor([1]), 'image_id': tensor([1780]), 'area': tensor([37740.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 66., 49., 308., 283.]]), 'labels': tensor([1]), 'image_id': tensor([868]), 'area': tensor([56628.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 23., 182., 343., 318.]]), 'labels': tensor([1]), 'image_id': tensor([1290]), 'area': tensor([43520.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[215., 0., 500., 266.]]), 'labels': tensor([1]), 'image_id': tensor([111]), 'area': tensor([75810.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 99., 105., 349., 210.]]), 'labels': tensor([1]), 'image_id': tensor([1350]), 'area': tensor([26250.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[319., 842., 384., 541.]]), 'labels': tensor([1]), 'image_id': tensor([803]), 'area': tensor([-19565.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 0., 19., 269., 283.]]), 'labels': tensor([1]), 'image_id': tensor([409]), 'area': tensor([71016.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 31., 0., 360., 339.]]), 'labels': tensor([1]), 'image_id': tensor([1651]), 'area': tensor([111531.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 0., 714., 585., 646.]]), 'labels': tensor([1]), 'image_id': tensor([989]), 'area': tensor([-39780.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 51., 170., 314., 317.]]), 'labels': tensor([1]), 'image_id': tensor([1449]), 'area': tensor([38661.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[394., 66., 640., 294.]]), 'labels': tensor([1]), 'image_id': tensor([177]), 'area': tensor([56088.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[396., 723., 592., 627.]]), 'labels': tensor([1]), 'image_id': tensor([940]), 'area': tensor([-18816.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 95., 54., 360., 187.]]), 'labels': tensor([1]), 'image_id': tensor([1579]), 'area': tensor([35245.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 67., 112., 293., 307.]]), 'labels': tensor([1]), 'image_id': tensor([1508]), 'area': tensor([44070.]), 'iscrowd': tensor([0])}, 
{'boxes': tensor([[ 11., 0., 452., 355.]]), 'labels': tensor([1]), 'image_id': tensor([1162]), 'area': tensor([156555.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[268., 515., 698., 746.]]), 'labels': tensor([1]), 'image_id': tensor([741]), 'area': tensor([99330.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[156., 851., 598., 624.]]), 'labels': tensor([1]), 'image_id': tensor([900]), 'area': tensor([-100334.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 44., 123., 341., 305.]]), 'labels': tensor([1]), 'image_id': tensor([680]), 'area': tensor([54054.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[235., 0., 598., 282.]]), 'labels': tensor([1]), 'image_id': tensor([1163]), 'area': tensor([102366.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 43., 156., 277., 289.]]), 'labels': tensor([1]), 'image_id': tensor([360]), 'area': tensor([31122.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 94., 0., 266., 250.]]), 'labels': tensor([1]), 'image_id': tensor([1591]), 'area': tensor([43000.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[ 71., 38., 343., 322.]]), 'labels': tensor([1]), 'image_id': tensor([1809]), 'area': tensor([77248.]), 'iscrowd': tensor([0])}]
I get this error:
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
###################### [same targets printout as above]
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-f8100031e21d> in <module>()
      2 for epoch in range(num_epochs):
      3     # train for one epoch, printing every 10 iterations
----> 4     train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
      5     # update the learning rate
      6     lr_scheduler.step()

2 frames
/content/drive/MyDrive/PytorchObjectDetector/engine.py in train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq)
     30         print("######################", targets)
     31 
---> 32         loss_dict = model(images, targets)
     33 
     34         losses = sum(loss for loss in loss_dict.values())

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
     89                     degen_bb: List[float] = boxes[bb_idx].tolist()
     90                     raise ValueError(
---> 91                         "All bounding boxes should have positive height and width."
     92                         f" Found invalid box {degen_bb} for target at index {target_idx}."
     93                     )

ValueError: All bounding boxes should have positive height and width. Found invalid box [139.397216796875, 581.7989501953125, 423.73980712890625, 431.1422119140625] for target at index 7.
```
I am sure the problem has been addressed long back, looking at the responses given here. But I also looked at this post on Stack Overflow about the same error: ValueError: All bounding boxes should have positive height and width.
Could any of you guide me on what exactly should be changed, and where?
I will surely write a Medium blog on PyTorch object detection from custom data using transfer learning once I have sorted out these few minor hiccups.
@fmassa I guess you could help me sort this issue out.
Hey @santhoshnumberone, refer to @oke-aditya's comment here: https://github.com/pytorch/vision/issues/2740#issuecomment-702575254. The bounding boxes should be in the form (xmin, ymin, xmax, ymax).
In your bounding box data, there are a few data points that do not fit the above format; some of them are:
tensor([[201., 838., 611., 621.]])
tensor([[488., 669., 774., 541.]])
tensor([[129., 242., 138., 119.]])
tensor([[319., 842., 384., 541.]])
tensor([[ 0., 714., 585., 646.]])
tensor([[396., 723., 592., 627.]])
tensor([[156., 851., 598., 624.]])
So first you need to check the format of the bounding boxes that you have, then convert them to (xmin, ymin, xmax, ymax) format. This function might be helpful for converting the bounding boxes: https://github.com/pytorch/vision/blob/a98e17e50146529cdfadb590ba063e6bbee71de2/torchvision/ops/boxes.py#L137-L189
I hope this helps.
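A hedged sketch, not from the comment above: if the annotations only have coordinates in a scrambled order (for example ymin/ymax swapped), one way to coerce every box into (xmin, ymin, xmax, ymax) is to sort each coordinate pair:

```python
import torch

def to_xyxy(boxes):
    # boxes: (N, 4) tensor whose corner order may be swapped per axis
    x1, y1, x2, y2 = boxes.unbind(dim=1)
    return torch.stack(
        [torch.min(x1, x2), torch.min(y1, y2), torch.max(x1, x2), torch.max(y1, y2)],
        dim=1,
    )
```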
Also note that if you are trying to train an object detection model you should use `model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)`, since Mask R-CNN is an instance segmentation model which will expect segmentation masks during training.
> Hey @santhoshnumberone, refer to @oke-aditya's comment here: #2740 (comment). The bounding boxes should be in the form (xmin, ymin, xmax, ymax).
> In your bounding box data, there are a few data points that do not fit the above format; some of them are:
> tensor([[201., 838., 611., 621.]]) tensor([[488., 669., 774., 541.]]) tensor([[129., 242., 138., 119.]]) tensor([[319., 842., 384., 541.]]) tensor([[ 0., 714., 585., 646.]]) tensor([[396., 723., 592., 627.]]) tensor([[156., 851., 598., 624.]])
> So first you need to check the format of the bounding boxes that you have, then convert them to (xmin, ymin, xmax, ymax) format. This function might be helpful for converting the bounding boxes.
> I hope this helps.
Thank you for highlighting the issue, will look into it. I blindly trusted a popular online image labelling tool to annotate my custom data
> Also note that if you are trying to train an object detection model you should use `model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)`, since Mask R-CNN is an instance segmentation model which will expect segmentation masks during training.
Can't I freeze everything apart from the object detection block using requires_grad = False and train it?
A mask is required to calculate the loss, I guess. I got this error:
```
cpuset_checked))
###################### [{'boxes': tensor([[132., 0., 435., 285.]]), 'labels': tensor([1]), 'image_id': tensor([1889]), 'area': tensor([86355.]), 'iscrowd': tensor([0])}, {'boxes': tensor([[235., 0., 640., 315.]]), 'labels': tensor([1]), 'image_id': tensor([1210]), 'area': tensor([127575.]), 'iscrowd': tensor([0])}]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-16-05e881bbc3b2> in <module>()
      2 for epoch in range(num_epochs):
      3     # train for one epoch, printing every 10 iterations
----> 4     train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
      5     # update the learning rate
      6     lr_scheduler.step()

6 frames
/content/drive/MyDrive/PytorchObjectDetector/engine.py in train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq)
     30         print("######################", targets)
     31 
---> 32         loss_dict = model(images, targets)
     33 
     34         losses = sum(loss for loss in loss_dict.values())

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
     97             features = OrderedDict([("0", features)])
     98         proposals, proposal_losses = self.rpn(images, features, targets)
---> 99         detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
    100         detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)  # type: ignore[operator]
    101 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py in forward(self, features, proposals, image_shapes, targets)
    743 
    744         if self.training:
--> 745             proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
    746         else:
    747             labels = None

/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py in select_training_samples(self, proposals, targets)
    628     ):
    629         # type: (...) -> Tuple[List[Tensor], List[Tensor], List[Tensor], List[Tensor]]
--> 630         self.check_targets(targets)
    631         assert targets is not None
    632         dtype = proposals[0].dtype

/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/roi_heads.py in check_targets(self, targets)
    620         assert all(["labels" in t for t in targets])
    621         if self.has_mask():
--> 622             assert all(["masks" in t for t in targets])
    623 
    624     def select_training_samples(

AssertionError:
```
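A hedged note, not from the thread: the assertion fires in `roi_heads.check_targets` because Mask R-CNN requires a "masks" entry in every target dict during training. Purely for illustration, a target that passes that check looks like the sketch below (the all-zero mask is a placeholder; a real mask must actually cover the object):

```python
import torch

h, w = 480, 640  # hypothetical image size
target = {
    "boxes": torch.tensor([[132.0, 0.0, 435.0, 285.0]]),  # (xmin, ymin, xmax, ymax)
    "labels": torch.tensor([1], dtype=torch.int64),        # 0 is background
    "masks": torch.zeros((1, h, w), dtype=torch.uint8),    # one binary mask per box
    "image_id": torch.tensor([0]),
    "area": torch.tensor([(435.0 - 132.0) * (285.0 - 0.0)]),
    "iscrowd": torch.zeros((1,), dtype=torch.int64),
}
```

If only boxes are available, switching to fasterrcnn_resnet50_fpn (as suggested above) avoids the mask requirement entirely.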
I had the same problem; all the images and the masks were fine. For the image augmentation I was using these transforms:

```python
import torch
from torchvision.transforms import v2 as T

def get_transform(train):
    transforms = []
    if train:
        transforms.append(T.RandomHorizontalFlip(0.2))
        # transforms.append(T.RandomRotation(10))
    transforms.append(T.ToDtype(torch.float, scale=True))
    transforms.append(T.ToPureTensor())
    return T.Compose(transforms)
```
When transforms.append(T.RandomRotation(10)) was uncommented, I got an error when I started training, but when I commented that line out, the training step completed successfully.
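A hedged follow-up, not from the original comment: rotation can turn a valid box into a degenerate one. If your torchvision version provides it, the v2 transform SanitizeBoundingBoxes can drop such boxes (and their labels) after geometric augmentation, roughly like this:

```python
import torch
from torchvision.transforms import v2 as T

def get_transform(train):
    transforms = []
    if train:
        transforms.append(T.RandomHorizontalFlip(0.2))
        transforms.append(T.RandomRotation(10))
        # drop boxes that end up with non-positive width/height after augmentation
        transforms.append(T.SanitizeBoundingBoxes())
    transforms.append(T.ToDtype(torch.float, scale=True))
    transforms.append(T.ToPureTensor())
    return T.Compose(transforms)
```

(This assumes the targets are wrapped as tv_tensors.BoundingBoxes, as in the current torchvision detection tutorial.)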
I am training detecto for custom object detection. Can anyone help me as soon as possible? I will be very grateful. Here is the code:

```python
from detecto import core, utils, visualize

dataset = core.Dataset('content/sample_data/newdataset/car/images/')
model = core.Model(['car'])
model.fit(dataset)
```
Here is the output:
ValueError Traceback (most recent call last)