hellojialee / Improved-Body-Parts

Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation
https://arxiv.org/abs/1911.10529

About focal l2 loss #11

Closed JyunYuLai closed 4 years ago

JyunYuLai commented 4 years ago

Hi,

Thank you for sharing this great work. I have implemented the focal l2 loss, but unfortunately it did not give better results than the normal l2 loss. Here are some questions about the focal l2 loss.

  1. In the paper, you mention that the network is first trained with the normal l2 loss before the focal l2 loss is applied. Does that mean training has two stages, where the first stage uses the l2 loss and its best checkpoint then serves as the initial weights for a second stage trained with the focal l2 loss? Or can the network be trained directly with the focal l2 loss, without any pretraining stage?
  2. Is the focal l2 loss sensitive to hyper-parameters? I adopted nearly the same hyper-parameters as your implementation, so I wonder whether this is the reason I didn't get better results. I'm looking forward to your suggestions. Thank you in advance.
hellojialee commented 4 years ago

Hi, @JyunYuLai, thank you for your interest.

Answer 1: I continued training the network with the focal l2 loss only to save time, because I wanted to compare the focal l2 loss with the normal l2 loss. In my experience, it is fine to train the network with the focal l2 loss from scratch as long as the training converges, so I use a warm-up learning rate at first (Gaussian weight initialization makes the initial heatmap values equal to 0, i.e. the whole image is regarded as background, and thus the focal term squashes the background loss significantly). Recently, I retrained my system with a pretrained HRNet backbone, and the focal l2 loss still brings an increase of about 3% AP compared with the l2 loss (multi-scale testing is used). Keep in mind that in our work the focal l2 loss is applied simultaneously to the body part and keypoint heatmaps. As mentioned in our paper, we recommend applying data mining to both keypoints and keypoint connections.
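
To make the focal term described above concrete, here is a minimal pure-Python sketch. It is not the repository's implementation: the function names, `gamma=2.0`, and the exact form of the weight are assumptions for illustration, and the version in the paper may differ in its details. Only `thre=0.01` comes from this thread.

```python
def focal_l2_loss(preds, targets, thre=0.01, gamma=2.0):
    """Mean focal-weighted squared error over flat sequences of heatmap values.

    thre is the value quoted in this thread; gamma=2.0 and the exact
    form of the weight are assumptions made for illustration.
    """
    total = 0.0
    for p, t in zip(preds, targets):
        # Foreground pixel (target above thre): confidence is the prediction.
        # Background pixel: confidence is 1 - prediction.
        confidence = p if t > thre else 1.0 - p
        # Confident (easy) pixels get a weight near 0, hard ones a weight near 1.
        weight = (1.0 - confidence) ** gamma
        total += weight * (p - t) ** 2
    return total / len(preds)


def l2_loss(preds, targets):
    """Plain mean squared error, for comparison."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


# Toy "heatmap": three background pixels and one keypoint peak,
# with a network that still predicts a low value everywhere.
targets = [0.0, 0.0, 0.0, 1.0]
preds = [0.1, 0.1, 0.1, 0.1]

print("l2      :", l2_loss(preds, targets))
print("focal l2:", focal_l2_loss(preds, targets))
```

In this toy case the background pixels contribute a much smaller share of the focal l2 loss than of the plain l2 loss. If every prediction is exactly 0, the background weight in this sketch is exactly 0, which matches the remark that the focal term squashes the background loss at initialization.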

Answer 2: It could be sensitive to the hyper-parameter thre (0.01 in our case). Please ensure that the sigma and the area of the Gaussian peak are appropriate with respect to the size of the heatmap, for loss balance.
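
One rough way to check the balance mentioned above is to estimate how much of the heatmap a single peak covers above thre. The sketch below assumes an unnormalized Gaussian with peak value 1 and a single isolated keypoint. The helper names and the 128x128 heatmap size are hypothetical, not taken from the repository.

```python
import math


def peak_radius(sigma, thre=0.01):
    """Radius (in heatmap pixels) within which a unit-height Gaussian exceeds thre."""
    # exp(-d^2 / (2 sigma^2)) > thre  <=>  d < sigma * sqrt(2 * ln(1 / thre))
    return sigma * math.sqrt(2.0 * math.log(1.0 / thre))


def foreground_fraction(sigma, height, width, thre=0.01):
    """Approximate share of heatmap pixels above thre for one isolated peak."""
    area = math.pi * peak_radius(sigma, thre) ** 2
    return min(1.0, area / (height * width))


# Hypothetical settings, not taken from the repository.
for sigma in (1.0, 2.0, 4.0):
    frac = foreground_fraction(sigma, height=128, width=128)
    print(f"sigma={sigma}: radius={peak_radius(sigma):.2f}px, foreground={frac:.4%}")
```

With thre = 0.01 the foreground radius is roughly three sigma, so doubling sigma roughly quadruples the area treated as foreground. That is one reason the same thre can behave differently across heatmap resolutions.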