This is a PyTorch implementation of the NeurIPS 2022 paper ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings: https://arxiv.org/abs/2206.12403
Arjun Majumdar, Gunjan Aggarwal, Bhavika Devnani, Judy Hoffman and Dhruv Batra
Georgia Institute of Technology, Meta AI
We present a scalable approach for learning open-world object-goal navigation (ObjectNav) – the task of asking a virtual robot (agent) to find any instance of an object in an unexplored environment (e.g., “find a sink”). Our approach is entirely zero-shot – i.e., it does not require ObjectNav rewards or demonstrations of any kind.
Model Architecture for ZSON.
All the required data can be downloaded from here.
Create a conda environment:
conda create -n zson python=3.7 cmake=3.14.0
conda activate zson
Install PyTorch 1.10.2:
conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=11.3 -c pytorch -c conda-forge
Install habitat-sim:
conda install habitat-sim-challenge-2022 headless -c conda-forge -c aihabitat
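A quick sanity check that PyTorch and the simulator both import and CUDA is visible (optional; this check is ours, not part of the original instructions):
python -c "import torch, habitat_sim; print(torch.__version__, torch.cuda.is_available())"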
Install habitat-lab:
git clone --branch challenge-2022 https://github.com/facebookresearch/habitat-lab.git habitat-lab-challenge-2022
cd habitat-lab-challenge-2022
pip install -r requirements.txt
python setup.py develop --all # install habitat and habitat_baselines
cd ..
Clone this repository and install the zson package:
git clone git@github.com:gunagg/zson.git
cd zson
pip install -r requirements.txt
python setup.py develop
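To confirm the editable installs worked (optional; assumes the inner zson/ directory shown in the tree below is the Python package):
python -c "import habitat, zson"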
Follow the instructions here to set up the data/scene_datasets/ directory. Gibson scenes can be found here.
Download the HM3D ImageNav training dataset:
wget https://huggingface.co/gunjan050/ZSON/resolve/main/imagenav_hm3d.zip
unzip imagenav_hm3d.zip
rm imagenav_hm3d.zip # clean-up
Download the MP3D ObjectNav dataset:
wget https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/m3d/v1/objectnav_mp3d_v1.zip
mkdir -p data/datasets/objectnav/mp3d/v1
unzip objectnav_mp3d_v1.zip -d data/datasets/objectnav/mp3d/v1
rm objectnav_mp3d_v1.zip # clean-up
Download the HM3D ObjectNav dataset:
wget https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/hm3d/v1/objectnav_hm3d_v1.zip
unzip objectnav_hm3d_v1.zip -d data/datasets/objectnav/
rm objectnav_hm3d_v1.zip # clean-up
Download the trained checkpoints zson_conf_A.pth and zson_conf_B.pth, and move them to data/checkpoints/.
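For example (a minimal sketch; adjust the paths to wherever you downloaded the files):
mkdir -p data/checkpoints
mv zson_conf_A.pth zson_conf_B.pth data/checkpoints/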
To train policies using the OVRL-pretrained RGB encoder, download the model weights from here and move them to data/models/.
More details on the encoder can be found here.
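As above, something like the following works (the weights filename here is hypothetical; use whatever the download provides):
mkdir -p data/models
mv ovrl_rn50_weights.pth data/models/  # hypothetical filename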
Set up data/goal_datasets using the script tools/extract-goal-features.py. This caches CLIP goal embeddings for faster training.
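For example (a sketch; the exact flags are defined in the script itself, and this assumes a standard argparse interface):
python tools/extract-goal-features.py --help  # lists the required arguments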
Your directory structure should now look like this:
.
+-- habitat-lab-challenge-2022/
| ...
+-- zson/
| +-- data/
| | +-- datasets/
| | | +-- objectnav/
| | | +-- imagenav/
| | +-- scene_datasets/
| | | +-- hm3d/
| | | +-- mp3d/
| | +-- goal_datasets/
| | | +-- imagenav/
| | | | +-- hm3d/
| | +-- models/
| | +-- checkpoints/
| +-- zson/
| ...
To train the ImageNav policies, run:
sbatch scripts/imagenav-v1-hm3d-ovrl-rn50.sh
sbatch scripts/imagenav-v2-hm3d-ovrl-rn50.sh
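If you are not on a SLURM cluster, the same scripts can typically be launched directly, since #SBATCH directives are plain comments to a regular shell (this is an assumption about the scripts, not something stated here):
bash scripts/imagenav-v1-hm3d-ovrl-rn50.sh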
To evaluate a trained ZSON checkpoint, use the following command:
sbatch scripts/objnav-eval-$DESIRED-CONFIGURATION$-$DATASET$.sh
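For example, to evaluate configuration A on HM3D you would run something like the following (this exact script name is hypothetical; pick the matching file under scripts/):
sbatch scripts/objnav-eval-conf-A-hm3d.sh  # hypothetical name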
If you use this code in your research, please consider citing:
@inproceedings{majumdar2022zson,
title={ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings},
author={Majumdar, Arjun and Aggarwal, Gunjan and Devnani, Bhavika and Hoffman, Judy and Batra, Dhruv},
booktitle={Neural Information Processing Systems (NeurIPS)},
year={2022}
}