● Recent works that leverage large-scale image-text pair pre-training, such as CLIP, show promising performance in classification, segmentation, and depth estimation.
● How to transfer this pre-trained knowledge to 3D understanding tasks, such as referring point cloud segmentation, has barely been explored.
CLIP: https://arxiv.org/abs/2104.04687 https://www.youtube.com/watch?v=OZF1t_Hieq8
DenseCLIP: https://arxiv.org/abs/2112.01518