hongfz16 / HCMoCo

[CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception

Question about the setting #1

Closed zhihou7 closed 2 years ago

zhihou7 commented 2 years ago

Hi, thanks for your interesting work. I am confused about the setting. Do the downstream tasks use the same training data as HCMoCo's pre-training? I mean, is there any difference between the pre-training modalities and the downstream modalities? Maybe I missed something in the paper, but I could not find an explicit explanation, and I may not be understanding the following description correctly:

To evaluate HCMoCo, we transfer our pre-train model to four human-centric downstream tasks using different modalities,

hongfz16 commented 2 years ago

Thank you for your interest in our work.

HCMoCo uses RGB, depth, and 2D keypoints for pre-training. We transfer the pre-trained RGB backbone to DensePose estimation and RGB human parsing, and the pre-trained depth backbone to depth human parsing and 3D skeleton prediction from depth.
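
To make the transfer step concrete, here is a minimal PyTorch-style sketch of reusing a pre-trained backbone for a downstream task. The checkpoint filename, the `backbone.` key prefix, and the ResNet-50 stand-in are assumptions for illustration, not the repo's actual API:

```python
import torch
import torchvision.models as models

# Hypothetical sketch: initialize a downstream model's encoder from
# pre-trained weights, then fine-tune with a task-specific head.
backbone = models.resnet50()  # stand-in for the pre-trained RGB/depth encoder

# Load the pre-training checkpoint and keep only the backbone weights,
# dropping any projection heads used only during pre-training.
state = torch.load("hcmoco_pretrain.pth", map_location="cpu")  # hypothetical file
backbone_state = {
    k[len("backbone."):]: v
    for k, v in state.items()
    if k.startswith("backbone.")  # assumed key prefix in the checkpoint
}
backbone.load_state_dict(backbone_state, strict=False)

# A task head (e.g. for human parsing or DensePose) is then attached
# and trained on the downstream dataset.
```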

So the modalities are the same. However, for some downstream tasks, such as DensePose estimation and RGB human parsing, the pre-training and downstream datasets differ (MPII and NTURGBD for pre-training versus COCO or Human3.6M for downstream evaluation), which introduces a domain gap.

I hope the above explanation resolves your confusion.

zhihou7 commented 2 years ago

Thanks for your reply. I get it.