ruhyadi/yolo3d-lightning

# YOLO3D: 3D Object Detection with YOLO

⚠️ Cautions

This repository is currently under development.

📼 Demo

![demo](./docs/assets/demo.gif)

📌 Introduction

YOLO3D is inspired by Mousavian et al. and their paper 3D Bounding Box Estimation Using Deep Learning and Geometry. YOLO3D takes a different approach: the detector is YOLOv5 instead of Faster-RCNN, and the regressor is ResNet18/VGG11 instead of VGG19.

🚀 Quickstart

YOLO3D uses Hydra as the config manager; for details, see the official Hydra website or ashleve/lightning-hydra-template.

🍿 Inference

You can use the pretrained weights from Release. Download them with the script get_weights.py:

# download pretrained model
python script/get_weights.py \
  --tag v0.1 \
  --dir ./weights

Inference with inference.py:

python inference.py \
  source_dir="./data/demo/videos/2011_09_26/image_02/data" \
  detector.model_path="./weights/detector_yolov5s.pt" \
  regressor_weights="./weights/mobilenetv3-best.pt"
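
The `key=value` arguments in the command above are Hydra-style overrides, where a dotted key such as `detector.model_path` addresses a nested config entry. The sketch below illustrates only that mapping, using plain dictionaries. It is not Hydra's implementation (Hydra also handles typing, config groups, and more), and the `apply_overrides` helper and the default config are hypothetical.

```python
def apply_overrides(config, overrides):
    """Apply 'dotted.key=value' strings to a nested dict, in place."""
    for item in overrides:
        key, value = item.split("=", 1)
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            # Walk (or create) the nested section for each dotted prefix.
            node = node.setdefault(part, {})
        node[leaf] = value.strip('"')
    return config


# Hypothetical defaults; the real ones live in the repository's config files.
config = {"source_dir": None, "detector": {"model_path": None}, "regressor_weights": None}
apply_overrides(config, [
    'source_dir="./data/demo/videos/2011_09_26/image_02/data"',
    'detector.model_path="./weights/detector_yolov5s.pt"',
    'regressor_weights="./weights/mobilenetv3-best.pt"',
])
print(config["detector"]["model_path"])
```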

⚔️ Training

Two models are trained here: the detector and the regressor. For now, YOLOv5 is the only supported detector, while the regressor can use any model supported by Torchvision.

💽 Dataset Preparation

For now, YOLO3D only supports the KITTI dataset. Going forward, we will try to add support for the Lyft and nuScenes datasets.

1. Download KITTI Dataset

You can download the KITTI dataset from the official website. After that, extract the dataset to data/KITTI. Since we will be using two models, it is highly recommended to rename images_2 to images.

.
├── data
│   └── KITTI
│       ├── calib
│       ├── images # original images_2
│       └── labels_2

2. Generate YOLO Labels

The KITTI label format differs from the format required by the YOLO model, so the KITTI labels must be converted to YOLO format. The script script/kitti_to_yolo.py is provided for this.

python script/kitti_to_yolo.py \
  --dataset_path ./data/KITTI \
  --classes car, van, truck, pedestrian, cyclist \
  --img_width 1224 \
  --img_height 370

The script will generate a labels folder containing the labels for each image in YOLO format.

.
├── data
│   └── KITTI
│       ├── calib
│       ├── images    # original images_2
│       ├── labels_2  # kitti labels
│       └── labels    # yolo labels
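
To make the conversion concrete, here is a minimal sketch of how a single KITTI label line could be mapped to a YOLO label line. It is not the code of script/kitti_to_yolo.py: the function name `kitti_line_to_yolo`, the lowercase class matching, and the choice to skip unlisted classes are assumptions. The class list and image size mirror the command above, and the sample line is made up for illustration.

```python
# Sketch only: class list and image size mirror the conversion command;
# the function name and the handling of unlisted classes are assumptions.
CLASSES = ["car", "van", "truck", "pedestrian", "cyclist"]


def kitti_line_to_yolo(line, img_width=1224, img_height=370, classes=CLASSES):
    """Convert one KITTI label line to a YOLO label line, or None if skipped."""
    fields = line.split()
    name = fields[0].lower()
    if name not in classes:
        return None  # e.g. DontCare, Misc, Tram
    # KITTI fields 4..7 hold the 2D box as left, top, right, bottom in pixels.
    left, top, right, bottom = (float(v) for v in fields[4:8])
    # YOLO expects the box center and size, normalized by the image size.
    x_center = (left + right) / 2.0 / img_width
    y_center = (top + bottom) / 2.0 / img_height
    width = (right - left) / img_width
    height = (bottom - top) / img_height
    class_id = classes.index(name)
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Made-up label line in KITTI field order (type, truncated, occluded, alpha,
# 2D box, 3D dimensions, 3D location, rotation_y).
sample = "Car 0.00 0 -1.57 612.00 185.00 734.40 259.00 1.50 1.60 3.70 1.00 1.50 20.00 -1.50"
print(kitti_line_to_yolo(sample))
```

Each YOLO line holds the class index followed by the box center and size as fractions of the image width and height, which is why the image dimensions are passed to the script.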

Next, generate the training and validation sets of images/labels; these sets also serve as the partitions that divide the dataset. The script script/generate_sets.py is provided for this.

python script/generate_sets.py \
  --images_path ./data/KITTI/images \
  --dump_dir ./data/KITTI \
  --postfix _yolo \
  --train_size 0.8 \
  --is_yolo
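
The idea behind `--train_size 0.8` can be sketched as a shuffled 80/20 split of the image list. This is an illustration under assumptions, not the logic of script/generate_sets.py: the `split_sets` helper, the fixed seed, and the file names are hypothetical, and the real script may order or write the sets differently.

```python
import random


def split_sets(image_names, train_size=0.8, seed=0):
    """Shuffle image names and split them into training and validation lists."""
    names = sorted(image_names)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(names)
    n_train = int(len(names) * train_size)
    return names[:n_train], names[n_train:]


# Hypothetical file names standing in for the contents of data/KITTI/images.
images = [f"{i:06d}.png" for i in range(10)]
train, val = split_sets(images)
print(len(train), len(val))
```

Fixing the seed keeps the partition stable between runs, so the detector and the regressor can be trained and validated on the same images.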

🚀 Training Detector Model

Currently, only the YOLOv5 model is used.

For YOLOv5 training on a single GPU, you can use the command below:

cd yolov5
python train.py \
    --data ../configs/detector/yolov5_kitti.yaml \
    --weights yolov5s.pt \
    --img 640

As for training on multiple GPUs, you can use the command below:

cd yolov5
python -m torch.distributed.launch \
    --nproc_per_node 4 train.py \
    --epochs 10 \
    --batch 64 \
    --data ../configs/detector/yolov5_kitti.yaml \
    --weights yolov5s.pt \
    --device 0,1,2,3

🪀 Training Regressor Model

⚠️ Under development

You can use any model available in Torchvision by adding some configuration to src/models/components/base.py. ResNet18 and VGG11 are already provided and can be used directly.

python src/train.py \
  experiment=sample

❤️ Acknowledgement

YOLOv5 by Ultralytics
skhadem/3D-BoundingBox

Mousavian et al.

@misc{mousavian20173d,
  title={3D Bounding Box Estimation Using Deep Learning and Geometry}, 
  author={Arsalan Mousavian and Dragomir Anguelov and John Flynn and Jana Kosecka},
  year={2017},
  eprint={1612.00496},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}