YOLOv5 and ScoreCAM

Harry-Rogers commented 2 years ago

Hi, I have pretrained a YOLOv5 model on a custom dataset and tried to use the tutorial code with ScoreCAM, but I get the error below.

ValueError: only one element tensors can be converted to Python scalars

It points to line 59 in score_cam.py, shown below:

outputs = [target(o).cpu().item() for o in self.model(batch)]

I'm unsure how to fix this, since the batch tensor has the same shape as in my other implementation that uses ScoreCAM with a Faster RCNN network.

Any help would be greatly appreciated.
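The failing line converts each target output with `.cpu().item()`, so every call to the target has to produce a single-element tensor. A minimal sketch of that constraint, assuming a made-up YOLO-style output of shape `[3, 85]` (4 box values, 1 objectness, 80 class scores); `column_target` and `max_class_score_target` are hypothetical helpers, and depending on the PyTorch version the failure surfaces as `ValueError` or `RuntimeError`:

```python
import torch

torch.manual_seed(0)

# Assumed stand-in for one image's YOLO-style output: 3 predictions x 85 columns
# (cx, cy, w, h, objectness, 80 class scores). Real shapes depend on the model.
fake_output = torch.rand(3, 85)
DOG = 16  # COCO class index, used only for illustration


def column_target(output, category):
    # Picks one column; on a 2-D output this still leaves one value per
    # prediction (shape [3]), which .item() cannot turn into a Python float.
    return output[..., category]


def max_class_score_target(output, category):
    # Hypothetical scalar target: best objectness * class score over all predictions.
    return (output[:, 4] * output[:, 5 + category]).max()


def to_python_float(score):
    # The same conversion the failing ScoreCAM line applies to each target output.
    return score.cpu().item()


try:
    to_python_float(column_target(fake_output, 5 + DOG))
except (ValueError, RuntimeError) as err:
    print("column_target fails:", type(err).__name__)

print("max_class_score_target:", to_python_float(max_class_score_target(fake_output, DOG)))
```

Whatever the target computes, reducing it to one number per forward pass is what lets ScoreCAM rank its masked inputs.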

jacobgil commented 2 years ago

Hi, I need more details. What target are you using? Is it exactly like FasterRCNNBoxScoreTarget from the notebook, or something else?

Harry-Rogers commented 2 years ago

I'm using the YOLOv5 model, so I'm just using the code below from the tutorial. I managed to get ScoreCAM working for a Faster RCNN with the same dataset, so I don't think that's the problem.

target_layers = [model.model.model.model[-2]]

jacobgil commented 2 years ago

Thanks, and sorry for the delayed response. Are you using FasterRCNNBoxScoreTarget as the target (not target_layers)? I suspect the problem is there, which is why I'm asking. If you modified the target (the function that outputs a score), can you please paste the code here?

Harry-Rogers commented 2 years ago

Hi, I have been using a YOLOv5s model with the code below, adapted from the YOLOv5 notebook. I still get the same error mentioned above.

import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
import torch    
import cv2
import numpy as np
import requests
import torchvision.transforms as transforms
from pytorch_grad_cam import ScoreCAM
from pytorch_grad_cam.utils.image import show_cam_on_image, scale_cam_image
from PIL import Image

COLORS = np.random.uniform(0, 255, size=(80, 3))

def parse_detections(results):
    detections = results.pandas().xyxy[0]
    detections = detections.to_dict()
    boxes, colors, names = [], [], []

    for i in range(len(detections["xmin"])):
        confidence = detections["confidence"][i]
        if confidence < 0.2:
            continue
        xmin = int(detections["xmin"][i])
        ymin = int(detections["ymin"][i])
        xmax = int(detections["xmax"][i])
        ymax = int(detections["ymax"][i])
        name = detections["name"][i]
        category = int(detections["class"][i])
        color = COLORS[category]

        boxes.append((xmin, ymin, xmax, ymax))
        colors.append(color)
        names.append(name)
    return boxes, colors, names

def draw_detections(boxes, colors, names, img):
    for box, color, name in zip(boxes, colors, names):
        xmin, ymin, xmax, ymax = box
        cv2.rectangle(
            img,
            (xmin, ymin),
            (xmax, ymax),
            color, 
            2)

        cv2.putText(img, name, (xmin, ymin - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, color, 2,
                    lineType=cv2.LINE_AA)
    return img

image_url = "https://upload.wikimedia.org/wikipedia/commons/f/f1/Puppies_%284984818141%29.jpg"
img = np.array(Image.open("Puppies_(4984818141).jpg"))
img = cv2.resize(img, (640, 640))
rgb_img = img.copy()
img = np.float32(img) / 255
transform = transforms.ToTensor()
tensor = transform(img).unsqueeze(0)

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.eval()
model.cpu()
target_layers = [model.model.model.model[-2]]

results = model([rgb_img])
boxes, colors, names = parse_detections(results)
detections = draw_detections(boxes, colors, names, rgb_img.copy())
Image.fromarray(detections)

cam = ScoreCAM(model, target_layers, use_cuda=False)
grayscale_cam = cam(tensor)[0, :, :]
cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)
Image.fromarray(cam_image)

jacobgil commented 2 years ago

Oh, OK, now I get it.

The example in the YOLO notebook uses EigenCAM, a method that doesn't require a "target". The target is a function that outputs a score, and that score is what guides the method in selecting which channels are important. In the FasterRCNN notebook there is a target function, used with AblationCAM, that checks how well the boxes predicted on the modified image overlap with the original boxes (in IoU and category). EigenCAM doesn't need this, but ScoreCAM does.

So FasterRCNNBoxScoreTarget will need to be rewritten for YOLO, since the model outputs the boxes in a different format.
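EigenCAM needs no target because it never scores the model's output: it only looks at the target layer's activations and projects them onto their first principal component. A NumPy sketch of that projection, assuming a single `[C, H, W]` activation map; pytorch-grad-cam's own implementation may differ in details such as sign handling and rescaling:

```python
import numpy as np


def eigen_cam(activations):
    # activations: [C, H, W] feature map taken from the target layer.
    c, h, w = activations.shape
    flat = activations.reshape(c, -1).T      # [H*W, C]: one row per spatial position
    flat = flat - flat.mean(axis=0)          # center each channel
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    projection = flat @ vt[0]                # coordinate along the first principal component
    return projection.reshape(h, w)


rng = np.random.default_rng(0)
acts = rng.random((8, 5, 5))
print(eigen_cam(acts).shape)
```

Nothing in `eigen_cam` depends on a class or a box, which is why the YOLO notebook can run EigenCAM without a target, while ScoreCAM needs a function that turns each forward pass into one score.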
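A possible YOLO counterpart, sketched under assumptions: it expects the raw YOLOv5 prediction tensor for one image, shape `[N, 5 + num_classes]` with columns `(cx, cy, w, h, objectness, class scores...)`, and reference boxes in `xyxy` pixels. `YoloBoxScoreTarget`, `cxcywh_to_xyxy` and `box_iou` are hypothetical names, not part of pytorch-grad-cam:

```python
import torch


def cxcywh_to_xyxy(boxes):
    # Raw YOLOv5 predictions store boxes as (center x, center y, width, height).
    cx, cy, w, h = boxes.unbind(-1)
    return torch.stack((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2), dim=-1)


def box_iou(a, b):
    # Pairwise IoU between a [N, 4] and b [M, 4] (both xyxy) -> [N, M].
    area_a = (a[:, 2] - a[:, 0]).clamp(min=0) * (a[:, 3] - a[:, 1]).clamp(min=0)
    area_b = (b[:, 2] - b[:, 0]).clamp(min=0) * (b[:, 3] - b[:, 1]).clamp(min=0)
    top_left = torch.max(a[:, None, :2], b[None, :, :2])
    bottom_right = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (bottom_right - top_left).clamp(min=0).prod(-1)
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)


class YoloBoxScoreTarget:
    """Hypothetical YOLO analogue of FasterRCNNBoxScoreTarget (a sketch only)."""

    def __init__(self, labels, bounding_boxes, iou_threshold=0.5):
        self.labels = labels                  # reference class ids
        self.bounding_boxes = bounding_boxes  # reference boxes, xyxy pixels
        self.iou_threshold = iou_threshold

    def __call__(self, predictions):
        if predictions.dim() == 3:            # tolerate a leading batch dim of 1
            predictions = predictions[0]
        pred_boxes = cxcywh_to_xyxy(predictions[:, :4])
        class_scores, pred_labels = predictions[:, 5:].max(dim=1)
        pred_scores = predictions[:, 4] * class_scores
        output = predictions.new_zeros(())    # 0-dim tensor, so .item() works
        for box, label in zip(self.bounding_boxes, self.labels):
            ref = torch.as_tensor(box, dtype=predictions.dtype,
                                  device=predictions.device)[None]
            ious = box_iou(ref, pred_boxes)[0]
            # Keep only predictions of the same class that overlap enough.
            mask = (pred_labels == label) & (ious > self.iou_threshold)
            if mask.any():
                output = output + (ious[mask] + pred_scores[mask]).max()
        return output
```

Unlike an argmax-IoU match, this takes the best same-class prediction above the IoU threshold, since raw YOLO outputs contain many overlapping candidates per object. Passing it would look like `cam(tensor, targets=[YoloBoxScoreTarget(labels, boxes)])`, with the class ids collected in `parse_detections` (which computes `category` but does not return it). The hub model may also return a tuple or list rather than a bare prediction tensor, so a thin wrapper that returns only the predictions is probably needed as well.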

jacobgil commented 2 years ago

I can try doing that

Harry-Rogers commented 2 years ago

Oh, OK, thank you for clearing that up.

If that's possible that would be great.

beneon commented 2 years ago

First, I want to thank jacobgil for your brilliant work, especially the tutorials; they are very helpful, even more useful than the tutorials provided at captum.ai.

Anyway, I've been trying to get pytorch-grad-cam to output CAM images for specific labels, and wrote a ScoreTarget class for YOLO. I tried to get AblationCAM working for YOLOv5, but after some tinkering I got stuck.
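AblationCAM weights each channel by how much the score drops when that channel is zeroed out, so it also needs a scalar target. A toy NumPy sketch of that relative-drop weighting, `(original - ablated) / original`; `head` is a hypothetical stand-in for "the rest of the network plus the target", and the real library performs the zeroing inside the model rather than on a copied array:

```python
import numpy as np


def ablation_weights(activations, head):
    # activations: [C, H, W]; head: hypothetical callable mapping activations to a scalar.
    original = head(activations)
    weights = np.empty(activations.shape[0])
    for k in range(activations.shape[0]):
        ablated = activations.copy()
        ablated[k] = 0.0                                      # "ablate" channel k
        weights[k] = (original - head(ablated)) / original    # relative score drop
    return weights


def ablation_cam(activations, head):
    weights = ablation_weights(activations, head)
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over channels -> [H, W]
    return np.maximum(cam, 0)


# Toy head: score = sum over channels of coeff_c * mean(activation_c).
coeffs = np.array([1.0, 2.0, 3.0])


def head(a):
    return float((coeffs * a.mean(axis=(1, 2))).sum())


acts = np.ones((3, 2, 2))
print(ablation_weights(acts, head))
```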

My understanding is that AblationCAM replaces the target layer I provided (like target_layers = [model.model.model.model[-2]]) with the ablation layer. But after this, YOLOv5 reported this error:

AttributeError: 'AblationLayerYolo' object has no attribute 'f'

So my question is: do I need to implement this f attribute myself? From what I saw, an ablation layer should implement .set_next_batch and a call method, and f seems to be something native to YOLOv5, but since the layer replacement occurs, I also need to address it.
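YOLOv5's model does not call its layers in a plain sequence: its forward loop reads each module's `f` attribute (where the input comes from, `-1` meaning the previous layer) and `i` attribute (its index, used to decide which outputs to keep). A module swapped in without those attributes therefore fails on `m.f`. The pure-Python sketch below is a simplified mimic of that loop, not YOLOv5's code; `Layer`, `ReplacementLayer` and `copy_routing_attrs` are hypothetical names, and copying the attributes onto the replacement is one possible workaround, not a verified fix for pytorch-grad-cam:

```python
class Layer:
    """Toy module carrying YOLOv5-style routing attributes `f` and `i`."""

    def __init__(self, fn, f, i):
        self.fn, self.f, self.i = fn, f, i

    def __call__(self, x):
        return self.fn(x)


def forward_once(layers, x, save):
    # Simplified mimic of YOLOv5's routing loop.
    y = []
    for m in layers:
        if m.f != -1:  # input comes from earlier layer(s), not just the previous one
            x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]
        x = m(x)
        y.append(x if m.i in save else None)  # keep outputs later layers refer to
    return x


class ReplacementLayer:
    """Hypothetical stand-in for an ablation layer: callable, but no `f` or `i`."""

    def __call__(self, x):
        return x


def copy_routing_attrs(src, dst, names=("f", "i", "type", "np")):
    # Copy YOLOv5-style bookkeeping attributes so the loop can still route inputs.
    for name in names:
        if hasattr(src, name):
            setattr(dst, name, getattr(src, name))
    return dst


layers = [
    Layer(lambda x: x + 1, f=-1, i=0),
    Layer(lambda x: x * 2, f=-1, i=1),
    Layer(lambda xs: xs[0] + xs[1], f=[-1, 0], i=2),  # also reads layer 0's output
]
print(forward_once(layers, 1, save={0}))
```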

By the way, maybe ScoreCAM can be adapted for YOLOv5 more easily? From what I see, there is no layer replacement there.
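ScoreCAM leaves the model's layers in place: it reads the target layer's activations once, turns each channel into a mask over the input, runs the unmodified model on every masked input, and softmaxes the resulting scores into channel weights. That per-mask score is where a scalar target is required. A NumPy sketch, assuming activations already upsampled to the image size and a hypothetical `score_fn` standing in for "model plus target":

```python
import numpy as np


def score_cam(image, activations, score_fn):
    # image: [H, W, 3] floats in [0, 1]; activations: [C, H, W], same H and W (assumed).
    scores = []
    for channel in activations:
        lo, hi = channel.min(), channel.max()
        mask = (channel - lo) / (hi - lo) if hi > lo else np.zeros_like(channel)
        scores.append(score_fn(image * mask[..., None]))  # one scalar per masked input
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                               # softmax over channels
    cam = np.tensordot(weights, activations, axes=1)       # weighted sum -> [H, W]
    return np.maximum(cam, 0), weights


# Toy example: the "model" only cares about the top-left pixel's brightness.
image = np.ones((2, 2, 3))
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 0.0], [0.0, 1.0]]])
cam, weights = score_cam(image, acts, lambda x: float(x[0, 0].sum()))
print(weights)
```

Because only masked copies of the input pass through the untouched model, YOLOv5's internal `f`/`i` routing is never disturbed; the remaining work is the YOLO-specific target.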

noreenanwar commented 2 years ago

Replying to jacobgil's "I can try doing that": were you able to implement that?

bryanbocao commented 1 year ago

Similar error here!

jacobgil / pytorch-grad-cam

YOLOv5 and ScoreCAM #242