How to effectively use mmpose for computing losses?

a656418zz commented 1 year ago

📚 The doc issue

Hello, thank you very much for your contribution! I have a project that requires two functionalities of mmpose: predicting model outputs and ground-truth keypoints of human faces, and then using AdaptiveWingLoss() to compute losses for my neural network model. As I am new to keypoint-related tasks, I have some conceptual questions.

Here are the questions I encountered:

1. When testing with /mmpose/tests/data/lapa/13609937564_5.jpg, I found that the model predicted three sets of keypoints, each with the same number of keypoints. However, in the saved image only two faces were drawn with keypoints, and the number of keypoints drawn on each face was different. There may be a filtering mechanism in the code, but I have not been able to find it. I only want to save the data for the most recognizable set of keypoints (i.e., the main face). How can I achieve this?
2. If I increase the batch_size, how can I get the corresponding keypoints for every item in the batch at the same time?

Thank you very much for your help.

Here is my test code:

from mmpose.apis import MMPoseInferencer
import torch

batch_size = 1
img_path = '../mmpose/tests/data/lapa/13609937564_5.jpg'
inferencer = MMPoseInferencer(pose2d='face', device="cpu")
result_generator = inferencer(img_path, show=False)
result = next(result_generator)
result1 = torch.tensor(result["predictions"][0][0]["keypoints"])

Suggest a potential alternative/fix

No response

Ben-Louis commented 1 year ago

Hi, thanks for using MMPose. You can simply use result["predictions"][0][0], which is the prediction with the highest bbox score, as the result for the most significant face. At the moment, Inferencer only allows for inference with a batch size of 1.
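A minimal sketch of that selection, written defensively so it does not rely on the instances already being sorted. It assumes each entry of result["predictions"][0] is a dict carrying a "bbox_score" field next to "keypoints" and "keypoint_scores"; that field name, the mock result, and the helper pick_main_instance are assumptions for illustration and not part of the MMPose API.

```python
def pick_main_instance(result):
    """Return the instance dict with the highest bbox score, or None.

    `result` is assumed to look like one item yielded by the inferencer:
    {"predictions": [[instance_dict, ...]]}.
    """
    instances = result["predictions"][0]
    if not instances:
        return None
    # Assumption: every instance exposes a scalar "bbox_score".
    return max(instances, key=lambda inst: float(inst.get("bbox_score", 0.0)))


# Mock result standing in for a real inferencer output (values are made up).
result = {
    "predictions": [[
        {"keypoints": [[10.0, 12.0]], "keypoint_scores": [0.90], "bbox_score": 0.42},
        {"keypoints": [[55.0, 60.0]], "keypoint_scores": [0.95], "bbox_score": 0.97},
        {"keypoints": [[80.0, 20.0]], "keypoint_scores": [0.30], "bbox_score": 0.10},
    ]]
}

main_face = pick_main_instance(result)
print(main_face["keypoints"])
```

With a batch size of 1, several images can still be handled by looping over the image paths and applying the same helper to each result.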

a656418zz commented 1 year ago

Thank you very much! May I ask another question? Using this method I got the prediction with the highest bbox score, which gave me keypoints of shape [98, 2] and keypoint_scores of shape [98]. I understand that these are the coordinates of the keypoints in the image and their confidence scores. However, the input of AdaptiveWingLoss is a torch.Tensor of shape [N, K, H, W], which is not compatible with these shapes. How can I use the keypoints obtained above to calculate the loss?

Ben-Louis commented 1 year ago

The heatmaps in shape [N, K, H, W] are generated in codec https://github.com/open-mmlab/mmpose/blob/537bd8e543ab463fb55120d5caaa1ae22d6aaf06/mmpose/codecs/msra_heatmap.py#L107-L111

The model will generate such a heatmap first, and then the heatmap will be decoded into coordinates using the process described in https://github.com/open-mmlab/mmpose/blob/537bd8e543ab463fb55120d5caaa1ae22d6aaf06/mmpose/codecs/msra_heatmap.py#L117-L150
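To make the two directions concrete, here is a simplified NumPy sketch of the idea: each keypoint is encoded as a Gaussian bump on its own heatmap channel, and decoding takes the location of each channel's maximum. It is not the MSRAHeatmap implementation (which, among other things, rescales coordinates between input and heatmap size and refines the decoded location); the function names and sizes are hypothetical.

```python
import numpy as np


def encode_gaussian_heatmaps(keypoints, heatmap_size, sigma=2.0):
    """Encode (K, 2) keypoints given as (x, y) into heatmaps of shape (K, H, W).

    `heatmap_size` is (W, H); keypoints are already in heatmap pixel units.
    """
    w, h = heatmap_size
    xs = np.arange(w, dtype=np.float32)[None, :]  # (1, W)
    ys = np.arange(h, dtype=np.float32)[:, None]  # (H, 1)
    heatmaps = np.zeros((len(keypoints), h, w), dtype=np.float32)
    for k, (x, y) in enumerate(keypoints):
        # Unnormalized Gaussian centered on the keypoint, peak value 1.
        heatmaps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return heatmaps


def decode_argmax(heatmaps):
    """Decode (K, H, W) heatmaps into (K, 2) coordinates and (K,) scores."""
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    ys, xs = np.unravel_index(flat.argmax(axis=1), (h, w))
    coords = np.stack([xs, ys], axis=1).astype(np.float32)
    scores = flat.max(axis=1)
    return coords, scores


keypoints = np.array([[5.0, 3.0], [10.0, 8.0]], dtype=np.float32)
heatmaps = encode_gaussian_heatmaps(keypoints, heatmap_size=(16, 12))
batch = heatmaps[None]  # add the batch axis: (N, K, H, W) = (1, 2, 12, 16)
coords, scores = decode_argmax(heatmaps)
print(batch.shape, coords, scores)
```

The [98, 2] keypoints returned by the inferencer are the decoded side of this round trip, while a heatmap-based loss operates on the [N, K, H, W] side.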

a656418zz commented 1 year ago

Thank you very much for your guidance! I have read further on the topic and now have a better understanding of the process. Would the following be considered a correct way to use it? Here is my code:

from mmpose.apis import MMPoseInferencer
from mmpose.codecs import MSRAHeatmap
from mmpose.models.losses import AdaptiveWingLoss
import mmcv
import numpy as np
import torch

inferencer = MMPoseInferencer(pose2d='face', device="cpu")

def getheatmap(img_path):
    # Read the image only to get its height and width.
    img = mmcv.imread(img_path)
    h, w, _ = img.shape

    result_generator = inferencer(img_path, show=False, draw_heatmap=True)
    result = next(result_generator)

    # Take the first instance and add a leading instance axis:
    # keypoints -> (1, K, 2), keypoint scores -> (1, K).
    instance = result["predictions"][0][0]
    keypoints = np.array(instance["keypoints"])[np.newaxis, :]
    keypoints_visible = np.array(instance["keypoint_scores"])[np.newaxis, :]

    # MSRAHeatmap takes sizes as (w, h); the heatmap is kept at image resolution.
    heatmap_gen = MSRAHeatmap(input_size=(w, h), heatmap_size=(w, h), sigma=2)
    heatmap = heatmap_gen.encode(keypoints, keypoints_visible)
    return heatmap['heatmaps']  # shape (K, H, W)

# Add a batch axis so each tensor has shape (1, K, H, W).
img_path = '../mmpose/tests/data/lapa/13609937564_5.jpg'
result1 = torch.tensor(getheatmap(img_path)[np.newaxis, :])

img_path = '../mmpose/tests/data/lapa/13609937564_5.jpg'
result2 = torch.tensor(getheatmap(img_path)[np.newaxis, :])

criterion = AdaptiveWingLoss()
loss = criterion(result1, result2)
print(loss)

Ben-Louis commented 1 year ago

In my opinion, the code snippet seems to be functioning properly. However, it is not common to use the codec in this manner. Generally, the target heatmap is produced by using ground truth keypoints, rather than predicted keypoints. Additionally, the model generates the predicted heatmap, not the codec.
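For illustration, the usual arrangement looks roughly like the sketch below: the target heatmap comes from ground-truth keypoints and the predicted heatmap comes from the model, and the loss compares the two. The loss here is a NumPy re-implementation of the published Adaptive Wing formula, not MMPose's AdaptiveWingLoss class; the default parameter values and the toy heatmaps are assumptions made for the example.

```python
import numpy as np


def adaptive_wing_loss(pred, target, alpha=2.1, omega=14.0, epsilon=1.0, theta=0.5):
    """Mean Adaptive Wing loss between heatmaps of identical shape (N, K, H, W).

    Assumes target values lie in [0, 1], so the exponent (alpha - target) is positive.
    """
    delta = np.abs(target - pred)
    power = alpha - target
    ratio = theta / epsilon
    # A and C make the two branches meet continuously at delta == theta.
    a = omega * (1.0 / (1.0 + ratio ** power)) * power * ratio ** (power - 1.0) / epsilon
    c = theta * a - omega * np.log(1.0 + ratio ** power)
    small = omega * np.log(1.0 + (delta / epsilon) ** power)  # used when delta < theta
    large = a * delta - c                                     # used otherwise
    return float(np.where(delta < theta, small, large).mean())


# Toy target heatmap, standing in for one encoded from ground-truth keypoints.
target = np.zeros((1, 1, 4, 4), dtype=np.float32)
target[0, 0, 1, 2] = 1.0

# Toy predictions, standing in for heatmaps produced by the model.
perfect_pred = target.copy()
noisy_pred = target.copy()
noisy_pred[0, 0, 1, 2] = 0.3  # peak underestimated (error above theta)
noisy_pred[0, 0, 0, 0] = 0.2  # spurious response (error below theta)

print(adaptive_wing_loss(perfect_pred, target))
print(adaptive_wing_loss(noisy_pred, target))
```

Comparing two heatmaps encoded from the same predicted keypoints, as in the snippet above, corresponds to the perfect_pred case and gives no training signal.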

open-mmlab / mmpose

How to effectively use mmpose for computing losses? #2657
