pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

FastRCNNPredictor doesn't return prediction in evaluation #1952

Closed FrancescoSaverioZuppichini closed 4 years ago

FrancescoSaverioZuppichini commented 4 years ago

🐛 Bug

Dear all,

I am doing object detection in an image with one class. After training, FastRCNNPredictor does not return anything in validation mode. I have followed this official tutorial https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html.

Thanks.

To Reproduce

Steps to reproduce the behavior:

I have created a custom dataset; this is one of its items:

tensor([[[0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1686, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1529],
          ...,
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490]],

         [[0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1294, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1137],
          ...,
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098]],

         [[0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1216, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1059],
          ...,
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059]]]),
 {'boxes': tensor([[315.0003, 213.5002, 626.0004, 329.5002]]),
  'labels': tensor([0]),
  'image_id': tensor([1]),
  'area': tensor([36503.9961]),
  'iscrowd': tensor([0])})

To prove its correctness I have also visualized the bbox on the image:

[image: the sample with its bounding box drawn on it]

Then I create a Dataloader:


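import torch
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# detection targets differ in size per image, so batches are kept as tuples
dl = DataLoader(ds, batch_size=8, num_workers=4, collate_fn=lambda x: tuple(zip(*x)))

model = fasterrcnn_resnet50_fpn(num_classes=1).to(device)

# optimize only the parameters that require gradients
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)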

Training works:

model.train()
for i in range(5):

    for images, targets in dl:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k,v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        print(losses)

Output:

tensor(0.6391, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.6329, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.6139, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5965, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5814, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5468, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.5049, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.4502, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.3787, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.2502, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.1605, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0940, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0558, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0507, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0413, device='cuda:0', grad_fn=<AddBackward0>)

But, when I try to get a prediction I have no output:

model = model.eval()
with torch.no_grad():
    model = model.cuda()
    pred = model([ds[2][0].cuda()])

pred is

[{'boxes': tensor([], size=(0, 4)),
  'labels': tensor([], dtype=torch.int64),
  'scores': tensor([])}]

Thank you in advance

Expected behavior

The model should return a valid prediction.

Environment

PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 430.50
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] efficientnet-pytorch==0.5.1
[pip] msgpack-numpy==0.4.3.2
[pip] numpy==1.17.4
[pip] PytorchStorage==0.0.0
[pip] torch==1.4.0
[pip] torchbearer==0.5.3
[pip] torchlego==0.0.0
[pip] torchsummary==1.5.1
[pip] torchvision==0.5.0
[conda] _pytorch_select           0.2                       gpu_0  
[conda] blas                      1.0                         mkl  
[conda] efficientnet-pytorch      0.5.1                    pypi_0    pypi
[conda] libblas                   3.8.0                    14_mkl    conda-forge
[conda] libcblas                  3.8.0                    14_mkl    conda-forge
[conda] liblapack                 3.8.0                    14_mkl    conda-forge
[conda] liblapacke                3.8.0                    14_mkl    conda-forge
[conda] mkl                       2019.4                      243  
[conda] mkl-service               2.3.0            py37he904b0f_0  
[conda] mkl_fft                   1.0.15           py37ha843d7b_0  
[conda] mkl_random                1.1.0            py37hd6b4f25_0  
[conda] pytorchstorage            0.0.0                    pypi_0    pypi
[conda] torch                     1.4.0                    pypi_0    pypi
[conda] torchbearer               0.5.3                    pypi_0    pypi
[conda] torchlego                 0.0.0                    pypi_0    pypi
[conda] torchsummary              1.5.1                    pypi_0    pypi
[conda] torchvision               0.5.0                    pypi_0    pypi

fmassa commented 4 years ago

Hi,

I believe you haven't trained for enough iterations to see the model converge, especially because you are not using a pre-trained model but are training from scratch, which requires a lot of iterations.

I would recommend following the fine-tuning steps in the tutorial that you pointed out, as you'll probably see better and faster results on limited data.

I'm closing the issue, but let us know if you have further questions.

FrancescoSaverioZuppichini commented 4 years ago

Hi @fmassa,

Thanks :)

The tutorial was followed correctly. The loss decreases during training and I see no problem at all. In this case, no output means no bbox. I will use the pre-trained weights and train the model for longer. In the meantime, could you be so kind as to have a look at the code I have attached? Maybe I missed something. One last question: should I resize the image to the usual ImageNet size (224)?

Thank you

fmassa commented 4 years ago

The code seems correct to me.

You don't need to resize the image to 224; just make sure your images are in the 0-1 range in RGB, and the model will rescale them internally for you.

FrancescoSaverioZuppichini commented 4 years ago

The model is internally resizing the image and the bboxes to (480, 640, C), (COCO format), isn't it?

FrancescoSaverioZuppichini commented 4 years ago

Using a pretrained network as follows:

from torchvision.models.detection.faster_rcnn  import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = fasterrcnn_resnet50_fpn(True).to(device)

num_classes = 1  
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes).to(device)

Training the network as shown in the first post, I get the following output:

tensor(0.0620, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.1114, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0959, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0404, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0653, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0422, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0317, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0355, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0278, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0377, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0372, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0334, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0235, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0251, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0247, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0220, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0195, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0216, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0260, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0247, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0163, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0161, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0149, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0171, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0158, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0155, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0122, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0179, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0129, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0119, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0133, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0140, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0145, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0131, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0117, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0094, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0123, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0126, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0086, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0106, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0117, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0069, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0099, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0119, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0069, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0109, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0124, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0075, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0088, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0132, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0069, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0101, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0099, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0097, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0087, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0101, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0054, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0092, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0095, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0055, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0078, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0098, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0041, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0080, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0118, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0048, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0089, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0085, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0043, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0074, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0105, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0036, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0075, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0080, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.0049, device='cuda:0', grad_fn=<AddBackward0>)

but still,

model = model.eval()
with torch.no_grad():
    model = model.cuda()
    pred = model([ds[2][0].cuda()])

pred is still empty

[{'boxes': tensor([], size=(0, 4)),
  'labels': tensor([], dtype=torch.int64),
  'scores': tensor([])}]

Any idea?

On my side, I have rechecked the type of the inputs and they are correct. An example of one item in the dataset is:

(tensor([[[0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1686, 0.1569, 0.1569],
          [0.0549, 0.0549, 0.0549,  ..., 0.1647, 0.1569, 0.1529],
          ...,
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490],
          [0.0471, 0.0471, 0.0471,  ..., 0.1490, 0.1490, 0.1490]],

         [[0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1294, 0.1176, 0.1176],
          [0.0471, 0.0471, 0.0471,  ..., 0.1255, 0.1176, 0.1137],
          ...,
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098],
          [0.0235, 0.0235, 0.0235,  ..., 0.1098, 0.1098, 0.1098]],

         [[0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1216, 0.1098, 0.1098],
          [0.0510, 0.0510, 0.0510,  ..., 0.1176, 0.1098, 0.1059],
          ...,
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059],
          [0.0314, 0.0314, 0.0314,  ..., 0.1059, 0.1059, 0.1059]]]),
 {'boxes': tensor([[315.0003, 213.5002, 626.0004, 329.5002]]),
  'labels': tensor([0]),
  'image_id': tensor([1]),
  'area': tensor([36503.9961]),
  'iscrowd': tensor([0])})

I am not sure about iscrowd, but in the tutorial, it was set to zero.

Thanks.

fmassa commented 4 years ago

@FrancescoSaverioZuppichini I think I see the issue: the label for your object is 0, but Faster R-CNN treats the value 0 as background. If you set the label to 1, it should work fine.

This is illustrated in the detection tutorial you mentioned, see the dataset line:

# there is only one class
labels = torch.ones((num_objs,), dtype=torch.int64)

But I agree it can be a bit tricky to spot this. I would happily accept a PR that improves the documentation by mentioning that the labels should start at 1 and that 0 is treated as background.
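
A minimal sketch of the labeling convention described above, written in plain Python so it runs without torchvision. The helper names `make_label_map` and `to_detection_labels` are hypothetical and not part of any library; the only fact taken from this thread is that label 0 is reserved for background, so object labels start at 1.

```python
BACKGROUND = 0  # reserved: Faster R-CNN treats label 0 as background


def make_label_map(class_names):
    """Map each object class name to an integer label, starting at 1."""
    return {name: index for index, name in enumerate(class_names, start=1)}


def to_detection_labels(zero_based_labels):
    """Shift zero-based class indices by one so none collides with background."""
    return [label + 1 for label in zero_based_labels]


print(make_label_map(["object"]))      # {'object': 1}
print(to_detection_labels([0, 0, 2]))  # [1, 1, 3]
```

In a dataset like the one in this issue, the shifted list would then be wrapped as `torch.tensor(labels, dtype=torch.int64)` before being stored under the `'labels'` key of the target.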

FrancescoSaverioZuppichini commented 4 years ago

@fmassa Thank you, it works! 🥳🥳

I will definitely create a PR and improve the doc over the weekend

fmassa commented 4 years ago

Cool, looking forward to the PR improving the documentation!

FrancescoSaverioZuppichini commented 4 years ago

Hi @fmassa, I hope you are healthy. Sorry for the late reply but I have been very busy these days. Is there a doc contribution guide that I can follow to be sure I am changing the right files?

fmassa commented 4 years ago

Hi @FrancescoSaverioZuppichini

All good here, hope everything is good for you as well.

You could maybe add some information in https://github.com/pytorch/vision/blob/master/docs/source/models.rst#object-detection-instance-segmentation-and-person-keypoint-detection or in the tutorials, which are hosted in https://github.com/pytorch/tutorials/blob/master/intermediate_source/torchvision_tutorial.rst

FrancescoSaverioZuppichini commented 4 years ago

Hi @fmassa, I hope you are doing well. I have added a couple of sentences and hopefully, it is more understandable now

You can find the PR here https://github.com/pytorch/tutorials/pull/914

fmassa commented 4 years ago

Thanks for the PR @FrancescoSaverioZuppichini !

MALLI7622 commented 3 years ago

Hi @FrancescoSaverioZuppichini @fmassa. I am also getting no predictions from a Faster R-CNN model. How did you resolve that problem? Was it just a matter of starting the label index at 1 instead of 0?

FrancescoSaverioZuppichini commented 3 years ago

By reading the above messages

casper-hansen commented 3 years ago

There is still an error in the documentation. If you have 3 classes in your dataset, and you have no background class in your dataset, you have to specify that num_classes=4 instead of num_classes=3. So, your labels would only contain 1, 2, and 3. However, you need to indicate that there is a non-existent class 0 by specifying the number of classes is equal to four.

If you don't, you will trigger an error: RuntimeError: CUDA error: device-side assert triggered
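
The rule in this comment can be checked on the CPU before training, which gives a clearer message than a device-side assert. The following is only a sketch, assuming the labels are plain integers (for a tensor, `labels.tolist()` would produce them); `check_labels` is a hypothetical helper, not a torchvision function.

```python
def check_labels(labels, num_classes):
    """Raise ValueError unless every label lies in 1 .. num_classes - 1.

    num_classes is expected to be the number of object classes plus one,
    because index 0 is reserved for background.
    """
    for label in labels:
        if label == 0:
            raise ValueError(
                "label 0 is reserved for background; object labels start at 1"
            )
        if not 1 <= label < num_classes:
            raise ValueError(
                f"label {label} is outside 1..{num_classes - 1}; "
                "num_classes should be the number of object classes + 1"
            )


# Three object classes labeled 1, 2, 3 need num_classes=4.
check_labels([1, 2, 3], num_classes=4)  # passes silently
```

Calling `check_labels([1, 2, 3], num_classes=3)` raises a `ValueError`, which is the situation that otherwise surfaces as the CUDA error quoted above.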

jaaabir commented 2 years ago

Hi, can someone help me with this too? I am doing object detection with Faster R-CNN, fine-tuning a pre-trained model on my custom dataset. I've labeled everything correctly, e.g. [1, 2, 3] for 3 classes plus background, which is 0. Whenever I log the summed losses, they are always above 1e+25, and even when I use model.eval() to get detections on the test set, I get no output other than this:

[{'boxes': tensor([], device='cuda:0', size=(0, 4)), 'labels': tensor([], device='cuda:0', dtype=torch.int64), 'scores': tensor([], device='cuda:0')}]

CheremGS commented 1 year ago

Hi @jaaabir, your learning rate is probably too high. I solved that problem by decreasing lr in my optimizer.
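
One way to notice this early is to stop as soon as the summed loss stops being a sensible number, then retry with a smaller learning rate. The following is a sketch only: `loss_has_diverged` is a hypothetical helper, and the default `limit` is an arbitrary threshold chosen for illustration, not a value from this thread or from torchvision.

```python
import math


def loss_has_diverged(loss_value, limit=1e6):
    """Return True when a loss is NaN, infinite, or larger than `limit`."""
    return not math.isfinite(loss_value) or abs(loss_value) > limit


print(loss_has_diverged(0.0413))  # False
print(loss_has_diverged(1e25))    # True
```

In a loop like the one in the first post, the check would go right after `losses` is computed, for example on `losses.item()`, before calling `backward()`.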