Finetuning Object Detection Faster R-CNN model with num_classes=1 gets me 0 predictions.

I am trying to finetune a single-class object detection model without segmentation (predicting only boxes and scores). I was following this tutorial.

So I create the model like this:

    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    num_classes = 1
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

Then, when I test on a random image:

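    import torch
    from PIL import Image
    import torchvision.transforms.functional as F  # assumed meaning of F

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()
    with torch.no_grad():
        img_pil = Image.open("asd.jpg").convert("RGB")
        out = model([F.to_tensor(img_pil).to(device)])[0]
        boxes, scores = out["boxes"], out["scores"]
        print(boxes)
        print(scores)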

I get the following output:

    tensor([], device='cuda:0', size=(0, 4))
    tensor([], device='cuda:0')
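
A likely explanation for the empty tensors: torchvision's detection models treat class index 0 as background and never report it, so a head built with num_classes=1 has no foreground class left to predict. The sketch below is a simplified pure-Python stand-in for that post-processing step, not the library's actual code. The function names and the example logits are made up for illustration, and the 0.05 threshold is assumed to mirror the default score threshold.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def foreground_detections(class_logits, score_thresh=0.05):
    """Return (proposal_index, label, score) for non-background classes only.

    Hypothetical, simplified stand-in for detection post-processing:
    label 0 is treated as background and is never reported.
    """
    detections = []
    for i, logits in enumerate(class_logits):
        scores = softmax(logits)
        for label, score in enumerate(scores):
            if label == 0:
                continue  # background column is dropped
            if score > score_thresh:
                detections.append((i, label, score))
    return detections


# num_classes=1: the single logit per proposal is the background logit.
one_class = [[2.3], [-0.7], [0.1]]
print(foreground_detections(one_class))  # []

# num_classes=2: background plus one object class.
two_class = [[0.2, 3.1], [4.0, -2.0], [0.0, 0.5]]
print(foreground_detections(two_class))
```

With a single logit per proposal, nothing survives the background filter regardless of how the model is trained, which matches the shape-(0, 4) boxes tensor above.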

What should I do? Should I proceed with num_classes=2? If so, how should I write the custom dataset? More specifically, should I set target["labels"] to 0 or 1? (I have actually tried everything I proposed above, but the model did not really train: performance was poor.)
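
For context on the labeling question: in torchvision's detection models num_classes counts the background class, and label 0 is reserved for it, so a single object class would normally mean num_classes=2 with every annotated box labeled 1. A minimal sketch of a dataset returning targets in that form follows. The class name SingleClassBoxes and the in-memory images and boxes are hypothetical placeholders, and real code would load images and annotations from disk.

```python
import torch
from torch.utils.data import Dataset


class SingleClassBoxes(Dataset):
    """Hypothetical in-memory dataset: one object class, background implicit."""

    def __init__(self, images, boxes_per_image):
        # images: list of float tensors shaped [3, H, W] with values in [0, 1]
        # boxes_per_image: per image, a list of [x1, y1, x2, y2] boxes
        self.images = images
        self.boxes_per_image = boxes_per_image

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        boxes = torch.as_tensor(
            self.boxes_per_image[idx], dtype=torch.float32
        ).reshape(-1, 4)
        # Every annotated object gets label 1; label 0 is left for background.
        labels = torch.ones((boxes.shape[0],), dtype=torch.int64)
        return self.images[idx], {"boxes": boxes, "labels": labels}


# Made-up example data: two random images with one and two boxes.
images = [torch.rand(3, 64, 64), torch.rand(3, 64, 64)]
boxes = [
    [[4.0, 8.0, 30.0, 40.0]],
    [[10.0, 10.0, 20.0, 25.0], [32.0, 5.0, 60.0, 50.0]],
]
dataset = SingleClassBoxes(images, boxes)
image, target = dataset[1]
print(target["boxes"].shape, target["labels"])
```

Note that the target key is "labels" (plural) and that it holds one int64 entry per box; setting labels to 0 would mark every box as background.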
