hellojialee / Improved-Body-Parts

Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation
https://arxiv.org/abs/1911.10529

About pretrained model and focal loss #7

Closed BobDLA closed 4 years ago

BobDLA commented 4 years ago

Hi,

  1. Could you explain the three pretrained models? How do they relate to the methods in the paper?
  2. Could you give more information on the focal loss function? The latest code in loss_model.py seems different from the formula in the paper (alpha = beta = 0, and abs(1. - st) instead of (1. - st) ** gamma). The loss in loss_model_parrel.py is also different from the one in loss_model.py. Which is better?
  3. The model trained with the focal loss seems to produce more false positives and wrong connections, even though the AP is higher. Does that make sense?
  4. Could you comment on the focal loss that CornerNet uses on its heatmaps? Would that also work for this model?

Thanks

hellojialee commented 4 years ago

Hi, your questions are worthy of attention and may interest others, so I will give detailed replies as follows:

  1. If you do not care about the fine details, just go with the default configuration. The default setting uses a 4-stage IMHN with the residual block from the Hourglass network rather than the 3×3 convolutional layers used in Associative Embedding, and its pretrained weights together with the optimizer state are saved as "PoseNet_52_epoch.pth" in the Baidu cloud drive link (a loading sketch is given after this list). Each config.py corresponds to a matching version of the posenet in the "./models" package. These are copies of some of the ablation entries, and the repository has not been fully cleaned up yet; it is difficult and time-consuming to merge the different configurations into a single code path. We can only guarantee that the provided pretrained models are the best ones we fine-tuned under the corresponding configurations. Please also refer to the instructions in the corresponding scripts. For now, I do not have enough time to clean up and upload a polished version of the project. The performance of the default configuration is nearly the same as that of the best model reported in our paper. Given the many ablation experiments and the rough git history (sorry for that), we suggest using the default settings. Please look for details in the code comments and refer to the results in the paper.

  2. Using the alpha and beta values set in the paper when training with train_distributed_SWA.py (after train_distributed.py) brings about a 0.3% AP increase. loss_model_parrel.py is used by train.py and train_parallel.py, while loss_model.py is used by train_distributed.py and train_distributed_SWA.py; please refer to the code for the different choices (a sketch of the focal L2 loss is given after this list). **For distributed training, the real batch size = batch_size_in_config × Num_GPUs; for the others, the real batch size = batch_size_in_config.** When in doubt, just use the default setting. We provide four different training options simply to accommodate different setups.

  3. A lower threshold on the heatmap (see "thre1" and "thre2" in /utils/config) leads to more false positives, and the same holds for the normal L2 loss. SOTA approaches usually use a low threshold to detect as many poses as possible, but note that the COCO keypoint evaluation only considers the top 20 scoring poses. If false positives are frequent, the keypoint grouping tends to be fragile and the evaluation metric does not increase consistently (a toy illustration of the threshold's effect is given after this list). I believe the proposed focal L2 loss makes sense, while the focal loss used in object detection may behave differently.

  4. No. They use the focal loss for a classification task (0 or 1), and thus they also have to infer offset vectors to the nearest keypoints for accurate localization. The focal L2 loss is designed for the Gaussian regression task (values in 0~1), and we do not need the offset feature maps (see the contrast sketch after this list).
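Regarding point 1, here is a minimal, hypothetical sketch of loading a checkpoint that bundles model weights and optimizer state, such as "PoseNet_52_epoch.pth". The dictionary keys and the helper name below are placeholders, so check the training scripts for what this repo actually saves:

```python
import torch

# Load the checkpoint on CPU first; it may contain both weights and optimizer state.
checkpoint = torch.load('PoseNet_52_epoch.pth', map_location='cpu')

# Inspect the stored keys before assuming any names.
print(list(checkpoint.keys()))

# The key names below ('model_state', 'optimizer_state') and the builder function
# are hypothetical examples; replace them with what the training script really uses.
# model = build_posenet_from_config()          # build the 4-stage IMHN per config.py
# optimizer = torch.optim.Adam(model.parameters())
# model.load_state_dict(checkpoint['model_state'])
# optimizer.load_state_dict(checkpoint['optimizer_state'])
```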
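Regarding point 2, a minimal sketch of a focal L2 loss of the general form described in the paper; the function name and the default alpha/beta/gamma values are illustrative, not the exact ones used in loss_model.py:

```python
import torch

def focal_l2_loss(pred, gt, alpha=0.1, beta=0.02, gamma=2.0, fg_thresh=0.01):
    """Illustrative focal L2 loss for Gaussian heatmap regression (values in [0, 1]).

    st measures how "easy" each pixel already is: close to 1 for well-predicted
    foreground and for well-suppressed background. The scaling factor then
    down-weights easy pixels, so hard pixels dominate the squared error.
    """
    st = torch.where(gt > fg_thresh, pred - alpha, 1.0 - pred - beta)
    factor = (1.0 - st) ** gamma          # paper-style scaling factor
    # factor = torch.abs(1.0 - st)        # variant reported in loss_model.py (alpha = beta = 0)
    return ((pred - gt) ** 2 * factor).mean()
```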
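Regarding point 3, a toy illustration (not the repo's actual post-processing) of how lowering the detection threshold yields more candidate keypoints and hence more potential false positives for the grouping step to reject:

```python
import torch
import torch.nn.functional as F

def heatmap_peaks(heatmap, thresh=0.1):
    """Toy candidate extraction: keep local maxima above `thresh`.

    heatmap: tensor of shape (H, W). A lower `thresh` gives higher recall but
    also more false positives.
    """
    hm = heatmap[None, None]                           # (1, 1, H, W) for pooling
    pooled = F.max_pool2d(hm, kernel_size=3, stride=1, padding=1)
    peaks = (hm == pooled) & (hm > thresh)             # local maxima above threshold
    ys, xs = torch.nonzero(peaks[0, 0], as_tuple=True)
    scores = heatmap[ys, xs]
    return list(zip(xs.tolist(), ys.tolist(), scores.tolist()))
```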
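Regarding point 4, for contrast, a sketch of the penalty-reduced, pixel-wise focal loss used on heatmaps in CornerNet-style detectors (my own naming and defaults), which treats every pixel as a soft binary classification target rather than as a Gaussian regression target:

```python
import torch

def cornernet_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Sketch of the penalty-reduced focal loss from CornerNet-style detectors.

    pred: predicted probabilities in (0, 1); gt: Gaussian-smoothed targets where
    only the exact keypoint location equals 1. Negatives near a keypoint are
    further down-weighted by (1 - gt) ** beta.
    """
    pred = pred.clamp(eps, 1.0 - eps)
    pos = gt.eq(1.0).float()
    neg = 1.0 - pos
    pos_loss = -((1.0 - pred) ** alpha) * torch.log(pred) * pos
    neg_loss = -((1.0 - gt) ** beta) * (pred ** alpha) * torch.log(1.0 - pred) * neg
    num_pos = pos.sum().clamp(min=1.0)
    return (pos_loss.sum() + neg_loss.sum()) / num_pos
```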

hellojialee commented 4 years ago

Hi, I have just noticed your issues about gamma: #16 and #11.