open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0
4.32k stars 747 forks

GRAD CAM For OCR [Feature] #1482

Closed by bely66 1 year ago

bely66 commented 1 year ago

What is the feature?

After training a model, it would be very useful to be able to visualize why it made a given prediction. This would help to:

  1. Spot any overfitting that is happening
  2. Identify shortcomings in the dataset that need to be addressed
  3. Identify specific cases where the model behaves logically

Any other context?

No response

bely66 commented 1 year ago

I'm still exploring this, but is it possible to use PyTorch Grad-CAM directly with your models at this stage?

Harold-lkk commented 1 year ago

We have implemented many tools for visualizing datasets and predictions in dev-1.x. You can refer to https://mmocr.readthedocs.io/en/dev-1.x/user_guides/visualization.html and https://mmocr.readthedocs.io/en/dev-1.x/user_guides/useful_tools.html for more details. Feature visualization is still in progress. For now, you can visualize features with the Visualizer, or apply CAM, by hard-coding it at the point where the feature is generated.
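As a rough illustration of the "CAM at the point where the feature is generated" route, the sketch below implements plain Grad-CAM with a forward hook on a small stand-in PyTorch classifier. `TinyNet`, `grad_cam`, and the choice of target layer are hypothetical names made up for this example and are not part of MMOCR or mmengine. For an OCR model, the score to backpropagate (for instance, one character's logit at one decoding step) is an assumption that would have to be chosen per architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    """Hypothetical stand-in classifier; swap in a real backbone and head."""

    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        feat = self.features(x)          # (N, 16, H, W)
        pooled = feat.mean(dim=(2, 3))   # global average pooling
        return self.head(pooled)


def grad_cam(model, target_layer, x, class_idx=None):
    """Return an (H, W) Grad-CAM heatmap in [0, 1] for a single input."""
    stash = {}

    def save_activation(module, inputs, output):
        stash['act'] = output

        def save_gradient(grad):
            # Called during backward with d(score) / d(activation).
            stash['grad'] = grad

        output.register_hook(save_gradient)

    handle = target_layer.register_forward_hook(save_activation)
    try:
        model.zero_grad()
        logits = model(x)
        if class_idx is None:
            class_idx = int(logits[0].argmax())
        logits[0, class_idx].backward()
    finally:
        handle.remove()

    act, grad = stash['act'][0], stash['grad'][0]   # each (C, h, w)
    weights = grad.mean(dim=(1, 2))                 # one weight per channel
    cam = F.relu((weights[:, None, None] * act).sum(dim=0))
    cam = cam / (cam.max() + 1e-8)                  # scale to [0, 1]
    cam = F.interpolate(cam[None, None], size=x.shape[-2:],
                        mode='bilinear', align_corners=False)[0, 0]
    return cam.detach()


torch.manual_seed(0)
model = TinyNet().eval()
x = torch.rand(1, 3, 32, 32)   # random stand-in for a preprocessed image
cam = grad_cam(model, model.features, x)
```

The resulting `cam` has the spatial size of the input, so it can be overlaid on the image as a heatmap. The sketch relies on the model's parameters requiring gradients; with a fully frozen model the input tensor would need `requires_grad=True` instead.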

bely66 commented 1 year ago

Thanks for your reply. I'd like to know more about using the mmengine Visualizer to visualize network layers, but I can't find any code examples. Can you direct me to something useful?

Harold-lkk commented 1 year ago

For now, we only have a Chinese-language document here.

The API for visualizing feature maps is draw_featmap. The example code is:

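import mmcv
import numpy as np
import torch
from mmengine.visualization import Visualizer
from torchvision.models import resnet18
from torchvision.transforms import Compose, Normalize, ToTensor


def preprocess_image(img, mean, std):
    # Convert an HxWx3 float image in [0, 1] to a normalized 1x3xHxW tensor.
    preprocessing = Compose([
        ToTensor(),
        Normalize(mean=mean, std=std)
    ])
    return preprocessing(img.copy()).unsqueeze(0)


# Newer torchvision releases prefer the `weights=` argument over `pretrained=True`.
model = resnet18(pretrained=True)
model.eval()


def _forward(x):
    # Run only the backbone so the model returns the last feature map
    # instead of classification logits.
    x = model.conv1(x)
    x = model.bn1(x)
    x = model.relu(x)
    x = model.maxpool(x)

    x1 = model.layer1(x)
    x2 = model.layer2(x1)
    x3 = model.layer3(x2)
    x4 = model.layer4(x3)
    return x4


model.forward = _forward

# Placeholder path (an assumption): point this at your own image, loaded as RGB.
image = mmcv.imread('path/to/image.jpg', channel_order='rgb')
image_norm = np.float32(image) / 255
input_tensor = preprocess_image(image_norm,
                                mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225])
with torch.no_grad():
    feat = model(input_tensor)[0]  # drop the batch dimension: (C, H, W)

visualizer = Visualizer()
drawn_img = visualizer.draw_featmap(feat, channel_reduction='select_max')
visualizer.show(drawn_img)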
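To make the `channel_reduction` argument above more concrete, here is a small NumPy sketch of what the two reduction modes are understood to compute on a (C, H, W) feature map: `select_max` keeps the single channel whose activations sum to the largest value, and `squeeze_mean` averages over channels. This is an illustration of the idea only, not mmengine's actual code, and `reduce_channels` is a hypothetical helper name.

```python
import numpy as np


def reduce_channels(featmap, mode='select_max'):
    """Collapse a (C, H, W) feature map into a single (H, W) map."""
    if featmap.ndim != 3:
        raise ValueError('expected a (C, H, W) array')
    if mode == 'select_max':
        # Sum each channel over its spatial dimensions and keep the largest.
        best = int(featmap.sum(axis=(1, 2)).argmax())
        return featmap[best]
    if mode == 'squeeze_mean':
        # Average the channels at every spatial position.
        return featmap.mean(axis=0)
    raise ValueError(f'unsupported mode: {mode}')


# Toy feature map with 3 channels of size 2x2.
featmap = np.zeros((3, 2, 2), dtype=np.float32)
featmap[1] = [[1.0, 2.0], [3.0, 4.0]]   # channel sum 10
featmap[2] = [[0.0, 0.0], [0.0, 9.0]]   # channel sum 9

selected = reduce_channels(featmap, 'select_max')    # picks channel 1
averaged = reduce_channels(featmap, 'squeeze_mean')
```

A single-channel result like this is what ends up being color-mapped for display; a torch tensor such as `feat` could be passed through the same logic after `feat.numpy()`.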

bely66 commented 1 year ago

Thanks, I'll try it and get back to you.