How to prepare target heatmaps?

yangsenius / TransPose

PyTorch Implementation for "TransPose: Keypoint localization via Transformer", ICCV 2021.

https://github.com/yangsenius/TransPose/releases/download/paper/transpose.pdf

MIT License

How to prepare target heatmaps? #29

Closed mukeshnarendran7 closed 2 years ago

mukeshnarendran7 commented 2 years ago

I want to fine-tune the pre-trained model for another application, but I cannot find the code that prepares the target heatmaps. Is it similar to converting the (x, y) keypoint coordinates of an image to heatmaps, as in CNN-based pose estimation? The model output is (48, 64), but my input images are (256, 192). A reference would be helpful. Thanks.

yangsenius commented 2 years ago

Please refer to https://github.com/yangsenius/TransPose/blob/dab9007b6f61c9c8dce04d61669a04922bbcd148/lib/dataset/JointsDataset.py#L239
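For readers who want a quick picture before opening the linked file, here is a minimal NumPy sketch of Gaussian target generation. It is not the repository's code: the function name `make_target_heatmaps`, the `sigma` default, and the (width, height) reading of the sizes as (192, 256) and (48, 64) are assumptions made for illustration. The linked `JointsDataset.py` remains the authoritative version.

```python
import numpy as np

def make_target_heatmaps(joints_xy, visible, image_size=(192, 256),
                         heatmap_size=(48, 64), sigma=2.0):
    """Build one Gaussian heatmap per keypoint (illustrative sketch).

    joints_xy: (K, 2) array of (x, y) pixel coordinates in the input image.
    visible:   (K,) array, 1 if the keypoint is annotated, else 0.
    image_size / heatmap_size are (width, height); assumed values.
    Returns heatmaps of shape (K, H, W) and per-joint weights of shape (K,).
    """
    joints_xy = np.asarray(joints_xy, dtype=np.float32)
    weights = np.asarray(visible, dtype=np.float32).copy()
    img_w, img_h = image_size
    hm_w, hm_h = heatmap_size

    # Scale each axis by its own ratio (here 48/192 = 64/256 = 1/4).
    scale = np.array([hm_w / img_w, hm_h / img_h], dtype=np.float32)
    centers = joints_xy * scale

    xs = np.arange(hm_w, dtype=np.float32)[None, :]  # shape (1, W)
    ys = np.arange(hm_h, dtype=np.float32)[:, None]  # shape (H, 1)

    heatmaps = np.zeros((len(joints_xy), hm_h, hm_w), dtype=np.float32)
    for k, (cx, cy) in enumerate(centers):
        # Joints that fall outside the heatmap get zero weight and a blank map.
        if not (0 <= cx < hm_w and 0 <= cy < hm_h):
            weights[k] = 0.0
        if weights[k] == 0:
            continue
        heatmaps[k] = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return heatmaps, weights
```

The returned weights can be passed to a loss that ignores unannotated joints. This sketch evaluates the Gaussian over the whole map for simplicity.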

mukeshnarendran7 commented 2 years ago

Thanks for getting back

So if your input is (256, 192) and the output heatmaps are (48, 64), how do you transform them back to keypoints on (256, 192)? I have not seen this in many pose estimation CNNs, so some insight would be nice.
I have another query about how the final layer is changed when testing on the MPII dataset. The paper mentions that you need to use a d*16 fully connected layer in place of the final layer of a pre-trained network. A code example would be great, because I am a bit confused about the output, which would then not be heatmaps.
How would you compare it to the target?

yangsenius commented 2 years ago

Hi,

We use this code to transform coordinates in the (48, 64) heatmap back into the original (256, 192) coordinate frame. This approach unavoidably introduces quantization error, so we use DARK-based post-processing to reduce it.
The final layer in the model is a 1x1 conv that converts the channel number from d to keypoint_number. A 1x1 conv is equivalent to a linear FC layer, because it is a position-wise linear transformation. So you can also use a 1x1 conv with (d, 16) channels; it has the same effect and still outputs heatmaps.
We just use MSE loss to compute the error against the GT heatmaps.
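The decoding step in the first point can be sketched as follows. This is a simplified illustration, not the repository's routine: the function name `heatmaps_to_image_coords` and the (width, height) size of (192, 256) are assumptions, it maps only to the network input frame, and it omits the DARK refinement, so the quantization error mentioned above is still present.

```python
import numpy as np

def heatmaps_to_image_coords(heatmaps, image_size=(192, 256)):
    """Take the argmax of each heatmap and rescale it to image pixels.

    heatmaps: (K, H, W) array. image_size is (width, height); assumed value.
    Returns a (K, 2) array of (x, y) and a (K,) array of peak scores.
    """
    heatmaps = np.asarray(heatmaps, dtype=np.float32)
    num_joints, hm_h, hm_w = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    idx = flat.argmax(axis=1)
    scores = flat.max(axis=1)

    # Convert flat indices back to (x, y) heatmap cells.
    coords = np.stack([idx % hm_w, idx // hm_w], axis=1).astype(np.float32)

    # Rescale each axis to the input resolution. The argmax is an integer
    # cell, so the result is only accurate to within one stride (4 px here).
    img_w, img_h = image_size
    coords *= np.array([img_w / hm_w, img_h / hm_h], dtype=np.float32)
    return coords, scores
```

With a 48-wide, 64-tall heatmap and a 192-wide, 256-tall input, both axes are multiplied by 4, which is where the quantization error comes from.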

mukeshnarendran7 commented 2 years ago

Hi, thanks once again for clarifying the issues. I have some more questions about processing.

Will this approach also work for generating heatmaps? For example, if I have a (256, 192) image and want to generate 16 heatmaps, can I just transform the coordinates as ((x_n/256)*48, (y_n/256)*64) for n = 0, ..., 15 and make a Gaussian heatmap from these coordinates?
When you mention using MSE loss, will the built-in PyTorch implementation be good enough, or do I need to use the JointsMSELoss(nn.Module) class from loss.py?

yangsenius commented 2 years ago

Yes, you're right.
JointsMSELoss handles more implementation details, such as the visibility of the keypoints. Essentially, they are the same loss function, but I suggest using JointsMSELoss.
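To illustrate what taking visibility into account means, here is a small NumPy sketch of a heatmap MSE with per-joint weights. It only shows the arithmetic and is not the `loss.py` implementation: the function name `weighted_heatmap_mse` is hypothetical, and the exact normalization used in the repository may differ.

```python
import numpy as np

def weighted_heatmap_mse(pred, target, weights):
    """Mean squared error over heatmaps, with per-joint weights.

    pred, target: (N, K, H, W) arrays.
    weights: (N, K) array, 0 for joints that should be ignored.
    """
    pred = np.asarray(pred, dtype=np.float32)
    target = np.asarray(target, dtype=np.float32)
    w = np.asarray(weights, dtype=np.float32)[:, :, None, None]
    # Multiplying both sides by the weight zeroes out ignored joints.
    diff = pred * w - target * w
    return float(np.mean(diff ** 2))
```

When every weight is 1 this reduces to a plain mean squared error, which matches the remark that the two losses are essentially the same.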