guanfuchen / semseg

常用的语义分割架构结构综述以及代码复现 华为媒体研究院 图文Caption、OCR识别、图视文多模态理解与生成相关方向工作或实习欢迎咨询 15757172165 https://guanfuchen.github.io/media/hw_zhaopin_20220724_tiny.jpg
768 stars 164 forks source link

3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation #37

Open guanfuchen opened 5 years ago

guanfuchen commented 5 years ago

related paper

摘要
We present 3DMV, a novel method for 3D semantic scene segmentation of RGB-D scans in indoor environments using a joint 3D-multi-view prediction network. In contrast to existing methods that either use geometry or RGB data as input for this task, we combine both data modalities in a joint, end-to-end network architecture. Rather than simply projecting color data into a volumetric grid and operating solely in 3D – which would result in insufficient detail – we first extract feature maps from associated RGB images. These features are then mapped into the volumetric feature grid of a 3D network using a differentiable back-projection layer. Since our target is 3D scanning scenarios with possibly many frames, we use a multi-view pooling approach in order to handle a varying number of RGB input views. This learned combination of RGB and geometric features with our joint 2D-3D architecture achieves significantly better results than existing baselines. For instance, our final result on the ScanNet 3D segmentation benchmark [1] increases from 52.8% to 75% accuracy compared to existing volumetric architectures.