
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding (CVPR 2023)

By Yanmin Wu, Xinhua Cheng, Renrui Zhang, Zesen Cheng, Jian Zhang*
This repository is the official implementation of "EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding". CVPR 2023 | arXiv | Code

Figure 1

0. Installation

1. [TODO] Quick visualization demo

2. Data preparation

The final required files are as follows:

├── [DATA_ROOT]
│   ├── [1] train_v3scans.pkl # Packaged ScanNet training set
│   ├── [2] val_v3scans.pkl   # Packaged ScanNet validation set
│   ├── [3] ScanRefer/        # ScanRefer utterance data
│   │   │   ├── ScanRefer_filtered_train.json
│   │   │   ├── ScanRefer_filtered_val.json
│   │   │   └── ...
│   ├── [4] ReferIt3D/        # NR3D/SR3D utterance data
│   │   │   ├── nr3d.csv
│   │   │   ├── sr3d.csv
│   │   │   └── ...
│   ├── [5] group_free_pred_bboxes/  # detected boxes (optional)
│   ├── [6] gf_detector_l6o256.pth   # pointnet++ checkpoint (optional)
│   ├── [7] roberta-base/     # roberta pretrained language model
│   ├── [8] checkpoints/      # EDA pretrained models

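Before moving on, it can help to confirm the layout above is complete. The following is a minimal sanity-check sketch (the `DATA_ROOT` value is a placeholder and the internal structure of the packaged `.pkl` files is an assumption, not documented behavior of this repository):

```python
import os
import pickle

# Hypothetical location; point this at your own [DATA_ROOT].
DATA_ROOT = "./data"

# File and folder names taken from the layout listed above.
EXPECTED = [
    "train_v3scans.pkl",
    "val_v3scans.pkl",
    "ScanRefer/ScanRefer_filtered_train.json",
    "ScanRefer/ScanRefer_filtered_val.json",
    "ReferIt3D/nr3d.csv",
    "ReferIt3D/sr3d.csv",
    "roberta-base",
    "checkpoints",
]

# Report anything missing from the expected layout.
missing = [p for p in EXPECTED if not os.path.exists(os.path.join(DATA_ROOT, p))]
print("missing:", missing if missing else "nothing")

# Peek at the packaged ScanNet validation split; the container type
# (dict or list of scans) is an assumption, adjust if it differs.
if "val_v3scans.pkl" not in missing:
    with open(os.path.join(DATA_ROOT, "val_v3scans.pkl"), "rb") as f:
        scans = pickle.load(f)
    print("val_v3scans.pkl entries:", len(scans))
```
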
3. Models

| Dataset | mAP@0.25 | mAP@0.5 | Model | Log (train) | Log (test) |
|---|---|---|---|---|---|
| ScanRefer | 54.59 | 42.26 | OneDrive* | 54_59.txt [1] / 54_44.txt [2] | log.txt |
| ScanRefer (Single-Stage) | 53.83 | 41.70 | OneDrive | 53_83.txt [1] / 53_47.txt [2] | log.txt |
| SR3D | 68.1 | - | OneDrive | 68_1.txt [1] / 67_6.txt [2] | log.txt |
| NR3D | 52.1 | - | OneDrive | 52_1.txt [1] / 54_7.txt [2] | log.txt |

*: This model is also used to evaluate the new task of grounding without object names, achieving 26.5% acc@0.25 and 21.6% acc@0.5.
[1]: Log of the performance reported in the paper.
[2]: Log of the performance obtained when retraining the model with this open-sourced repository.
Note: For the overall performance, please refer to issue #3.
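As a rough sketch of how a downloaded checkpoint can be inspected before plugging it into the training or evaluation scripts: the file name below is hypothetical (use whatever you downloaded from OneDrive into `checkpoints/`), and the internal key layout is an assumption rather than a documented format.

```python
import torch

# Hypothetical file name under [DATA_ROOT]/checkpoints/; replace it with
# the actual name of the checkpoint downloaded from OneDrive.
ckpt_path = "./data/checkpoints/eda_scanrefer.pth"

# Load on CPU so the inspection works on machines without a GPU.
ckpt = torch.load(ckpt_path, map_location="cpu")

# Common PyTorch training loops save a dict with keys such as "model" or
# "state_dict"; the exact keys here are an assumption.
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys()))
else:
    print("checkpoint object:", type(ckpt))
```
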

4. Training

5. Evaluation

6. Acknowledgements

We are grateful to BUTD-DETR, GroupFree, ScanRefer, and SceneGraphParser.

7. Citation

If you find our work useful in your research, please consider citing:

@inproceedings{wu2022eda,
  title={EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding},
  author={Wu, Yanmin and Cheng, Xinhua and Zhang, Renrui and Cheng, Zesen and Zhang, Jian},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

8. Contact

If you have any questions about this project, please feel free to contact Yanmin Wu: wuyanminmax[AT]gmail.com