Thank you for your interest in our work.
HCMoCo uses RGB images, depth maps, and 2D keypoints for pre-training. We transfer the pre-trained RGB backbone to DensePose estimation and RGB human parsing, and the pre-trained depth backbone to depth human parsing and 3D skeleton prediction from depth.
The modalities are the same in pre-training and in the down-stream tasks. However, for some down-stream tasks, such as DensePose estimation and RGB human parsing, we use different datasets (MPII and NTURGBD for pre-training, versus COCO or Human3.6M for down-stream evaluation), which introduces a domain gap.
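The transfer setup described above can be summarized as a small lookup table. This is a hypothetical sketch, not code from the HCMoCo repository; the names `PRETRAIN_MODALITIES`, `PRETRAIN_DATASETS`, `TRANSFER`, and `backbone_for` are illustrative only. It records just what this reply states: which pre-trained backbone feeds which down-stream task.

```python
# Hypothetical summary of the HCMoCo transfer setup as described in this thread.
# Not taken from the HCMoCo codebase; all names here are illustrative.

# Modalities used jointly during pre-training.
PRETRAIN_MODALITIES = {"rgb", "depth", "keypoints_2d"}

# Datasets used for pre-training, per the reply above.
PRETRAIN_DATASETS = {"MPII", "NTURGBD"}

# Which pre-trained backbone is transferred to which down-stream task.
TRANSFER = {
    "densepose": "rgb",
    "rgb_human_parsing": "rgb",
    "depth_human_parsing": "depth",
    "depth_3d_skeleton": "depth",
}


def backbone_for(task: str) -> str:
    """Return the modality of the pre-trained backbone used for a down-stream task."""
    return TRANSFER[task]


if __name__ == "__main__":
    for task in TRANSFER:
        print(f"{task}: {backbone_for(task)} backbone")
```

Every down-stream task reuses a backbone whose modality was seen during pre-training, so the modality is unchanged; the domain gap comes from the down-stream datasets differing from `PRETRAIN_DATASETS`.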
I hope the above explanation clears up your confusion.
Thanks for your reply. I get it.
Hi, thanks for your interesting work. I am confused about the setting. Does it use the same training data as HCMoCo for the down-stream tasks? In other words, is there any difference between the pre-training modalities and the down-stream modalities? Maybe I missed something in the paper, but I could not find an explicit explanation, or I may have misunderstood the description.