:tada::tada::tada: This is a PyTorch implementation of MCLN, proposed in our ECCV 2024 paper ["Multi-branch Collaborative Learning Network for 3D Visual Grounding"](https://arxiv.org/abs/2407.05363).
conda create -n mcln python=3.7
conda activate mcln
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install numpy ipython psutil traitlets transformers termcolor ipdb scipy tensorboardX h5py wandb plyfile tabulate einops
pip install spacy
# 3.3.0
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0.tar.gz
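A quick, optional check (a minimal sketch, not part of the repo) to confirm the environment before building the project:

```python
# optional sanity check for the dependencies installed above
import torch
import spacy
import transformers

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers", transformers.__version__)

# en_core_web_sm 3.3.0 was installed from the release tarball above
nlp = spacy.load("en_core_web_sm")
print("spaCy pipeline:", nlp.pipe_names)
```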
cd ~/MCLN
sh init.sh
We show visualizations via wandb for superpoints, kps points, bad-case analysis, and predicted/ground-truth masks and boxes. The following flags control what is logged:
self.visualization_superpoint = False
self.visualization_pred = False
self.visualization_gt = False
self.bad_case_visualization = False
self.kps_points_visualization = False
self.bad_case_threshold = 0.15
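As an illustration only (the real options live in the repo's config; `VisConfig` and the wandb project name below are placeholders), enabling the wandb logging might look like:

```python
import wandb

class VisConfig:  # placeholder; the actual options sit in the repo's config
    def __init__(self):
        self.visualization_superpoint = True   # color points by superpoint id
        self.visualization_pred = True         # predicted masks / boxes
        self.visualization_gt = True           # ground-truth masks / boxes
        self.bad_case_visualization = True     # log low-quality predictions
        self.kps_points_visualization = True   # sampled key points
        self.bad_case_threshold = 0.15         # cases below this score (presumably IoU) count as bad

cfg = VisConfig()
wandb.init(project="mcln-vis")  # hypothetical project name
```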
The final required files are as follows:
├── [DATA_ROOT]
│ ├── [1] train_v3scans.pkl # Packaged ScanNet training set
│ ├── [2] val_v3scans.pkl # Packaged ScanNet validation set
│ ├── [3] ScanRefer/ # ScanRefer utterance data
│ │ │ ├── ScanRefer_filtered_train.json
│ │ │ ├── ScanRefer_filtered_val.json
│ │ │ └── ...
│ ├── [4] ReferIt3D/ # NR3D/SR3D utterance data
│ │ │ ├── nr3d.csv
│ │ │ ├── sr3d.csv
│ │ │ └── ...
│ ├── [5] group_free_pred_bboxes/ # detected boxes (optional)
│ ├── [6] gf_detector_l6o256.pth # pointnet++ checkpoint (optional)
│ ├── [7] roberta-base/ # roberta pretrained language model
│ ├── [8] checkpoints/ # mcln pretrained models
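A small helper (a sketch, not part of the repo) to verify that [DATA_ROOT] matches the layout above before training:

```python
# sketch: check that DATA_ROOT contains the files listed above
import os

DATA_ROOT = "/path/to/DATA_ROOT"  # replace with your own path

required = [
    "train_v3scans.pkl",
    "val_v3scans.pkl",
    "ScanRefer/ScanRefer_filtered_train.json",
    "ScanRefer/ScanRefer_filtered_val.json",
    "ReferIt3D/nr3d.csv",
    "ReferIt3D/sr3d.csv",
    "roberta-base",
    "checkpoints",
]
optional = ["group_free_pred_bboxes", "gf_detector_l6o256.pth"]

for rel in required + optional:
    tag = "optional" if rel in optional else "required"
    status = "ok" if os.path.exists(os.path.join(DATA_ROOT, rel)) else "MISSING"
    print(f"[{tag}] {rel}: {status}")
```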
Download the ScanNet v2 dataset. After agreeing to the ScanNet terms of use you will receive the official download script `download-scannet.py`. Then use the following commands to download the necessary files:
python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.ply
python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.labels.ply
python2 download-scannet.py -o [SCANNET_PATH] --type .aggregation.json
python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.0.010000.segs.json
python2 download-scannet.py -o [SCANNET_PATH] --type .txt
where [SCANNET_PATH] is the output folder. The ScanNet dataset structure should then look like this:
├── [SCANNET_PATH]
│ ├── scans
│ │ ├── scene0000_00
│ │ │ ├── scene0000_00.txt
│ │ │ ├── scene0000_00.aggregation.json
│ │ │ ├── scene0000_00_vh_clean_2.ply
│ │ │ ├── scene0000_00_vh_clean_2.labels.ply
│ │ │ ├── scene0000_00_vh_clean_2.0.010000.segs.json
│ │ ├── scene.......
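Optionally, verify the download with a quick check like the following sketch (the file suffixes are the ones requested by the commands above):

```python
# sketch: make sure every scan folder contains the five downloaded file types
import os

SCANNET_PATH = "/path/to/SCANNET_PATH"  # replace with your own path
suffixes = [
    ".txt",
    ".aggregation.json",
    "_vh_clean_2.ply",
    "_vh_clean_2.labels.ply",
    "_vh_clean_2.0.010000.segs.json",
]

scans_dir = os.path.join(SCANNET_PATH, "scans")
for scene in sorted(os.listdir(scans_dir)):
    missing = [s for s in suffixes
               if not os.path.isfile(os.path.join(scans_dir, scene, scene + s))]
    if missing:
        print(scene, "is missing:", missing)
```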
Pack the downloaded scans into two pickle files (`train_v3scans.pkl` and `val_v3scans.pkl`):
python Pack_scan_files.py --scannet_data [SCANNET_PATH] --data_root [DATA_ROOT]
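The exact schema of the two pickles is defined by `Pack_scan_files.py`; a hedged sanity check that only assumes they can be unpickled:

```python
# sketch: peek at the packaged ScanNet pickles without assuming their exact schema
import os
import pickle

DATA_ROOT = "/path/to/DATA_ROOT"  # replace with your own path

for name in ["train_v3scans.pkl", "val_v3scans.pkl"]:
    with open(os.path.join(DATA_ROOT, name), "rb") as f:
        scans = pickle.load(f)
    size = len(scans) if hasattr(scans, "__len__") else "unknown"
    print(name, type(scans).__name__, "entries:", size)
```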
[3] ScanRefer: download the ScanRefer annotations and unzip them under [DATA_ROOT].
[4] ReferIt3D: download the NR3D/SR3D annotations and put them under [DATA_ROOT].
[5] group_free_pred_bboxes: download the pre-detected boxes and unzip them under [DATA_ROOT] (not used in the single-stage method).
[6] gf_detector_l6o256.pth: download the pretrained PointNet++ detector checkpoint and put it under [DATA_ROOT].
[7] roberta-base: download the RoBERTa pretrained language model:
cd [DATA_ROOT]
git clone https://huggingface.co/roberta-base
cd roberta-base
rm -rf pytorch_model.bin
wget https://huggingface.co/roberta-base/resolve/main/pytorch_model.bin
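After the clone, the folder should load offline through the standard transformers API; a minimal check (not repo code):

```python
# sketch: confirm the local roberta-base folder loads offline
import torch
from transformers import RobertaModel, RobertaTokenizerFast

ROBERTA_DIR = "/path/to/DATA_ROOT/roberta-base"  # replace with your own path

tokenizer = RobertaTokenizerFast.from_pretrained(ROBERTA_DIR)
model = RobertaModel.from_pretrained(ROBERTA_DIR)

tokens = tokenizer("the chair next to the desk", return_tensors="pt")
with torch.no_grad():
    out = model(**tokens)
print(out.last_hidden_state.shape)  # (1, sequence_length, 768)
```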
To generate superpoints, organize the ScanNet v2 data in the following layout:
├── data
│ ├── scannetv2
│ │ ├── scans
│ │ ├── scans_test
│ │ ├── train
│ │ ├── val
│ │ ├── test
│ │ ├── val_gt
cd [DATA_ROOT]
python superpoint_maker.py # modify data_root & split
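For intuition, a superpoint groups neighbouring points into one segment, and per-point features or predictions can be pooled over each segment. A toy NumPy illustration (not the repo's implementation):

```python
# toy example of superpoint average pooling (illustrative only)
import numpy as np

n_points, n_superpoints, feat_dim = 8, 3, 4
point_feats = np.random.rand(n_points, feat_dim).astype(np.float32)
superpoint_ids = np.array([0, 0, 1, 1, 1, 2, 2, 0])  # per-point segment id

# average the features of all points belonging to the same superpoint
pooled = np.zeros((n_superpoints, feat_dim), dtype=np.float32)
np.add.at(pooled, superpoint_ids, point_feats)
pooled /= np.bincount(superpoint_ids, minlength=n_superpoints)[:, None]

# broadcasting pooled values back gives segment-consistent per-point outputs
point_level = pooled[superpoint_ids]
print(pooled.shape, point_level.shape)  # (3, 4) (8, 4)
```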
| Dataset/Model | REC mAP@0.25 | REC mAP@0.5 | RES mIoU | Model |
|---|---|---|---|---|
| ScanRefer/mcln | 57.17 | 45.53 | 44.72 | GoogleDrive |
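For reference, the REC numbers are IoU-thresholded on the predicted box (0.25 / 0.5), while RES mIoU averages the mask IoU; a small sketch of both quantities, assuming axis-aligned boxes and binary point masks:

```python
# sketch: the IoU quantities behind the REC / RES metrics above
import numpy as np

def box3d_iou(box_a, box_b):
    """IoU of two axis-aligned 3D boxes (xmin, ymin, zmin, xmax, ymax, zmax)."""
    lo = np.maximum(box_a[:3], box_b[:3])
    hi = np.minimum(box_a[3:], box_b[3:])
    inter = np.prod(np.clip(hi - lo, 0, None))
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)

def mask_iou(pred, gt):
    """IoU of two binary point masks (the quantity averaged for RES mIoU)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return (pred & gt).sum() / max((pred | gt).sum(), 1)

iou = box3d_iou(np.array([0, 0, 0, 1, 1, 1.0]), np.array([0.5, 0, 0, 1.5, 1, 1.0]))
print(f"box IoU: {iou:.3f} | hit@0.25: {iou >= 0.25} | hit@0.5: {iou >= 0.5}")
```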
Please specify `--data_root`, `--log_dir`, and `--pp_checkpoint` in the `train_*.sh` scripts first, then run:
sh scripts/train_scanrefer_mcln_sp.sh
sh scripts/train_scanrefer_mcln_sp_single.sh
sh scripts/train_sr3d_mcln_sp.sh
sh scripts/train_nr3d_mcln_sp.sh
Please specify `--data_root`, `--log_dir`, and `--checkpoint_path` in the `test_*.sh` scripts first, then run:
sh scripts/test_scanrefer_mcln_sp.sh
sh scripts/test_scanrefer_mcln_sp_single.sh
sh scripts/test_sr3d_mcln_sp.sh
sh scripts/test_nr3d_mcln_sp.sh
This repository builds on the code of EDA and 3DRefTR. We recommend using their repositories in your research and reading the related papers. We are also grateful to SPFormer, BUTD-DETR, GroupFree, ScanRefer, and SceneGraphParser.
If you find our work useful in your research, please consider citing:
@misc{qian2024multibranchcollaborativelearningnetwork,
title={Multi-branch Collaborative Learning Network for 3D Visual Grounding},
author={Zhipeng Qian and Yiwei Ma and Zhekai Lin and Jiayi Ji and Xiawu Zheng and Xiaoshuai Sun and Rongrong Ji},
year={2024},
eprint={2407.05363},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.05363}}