This repository provides core support for performing object detection on navigation datasets. Support for 3D object detection and domain adaptation is experimental and will be added later. This project provides support for training, evaluation, inference, and visualization.
If you use the code in any way, please consider citing:
@InProceedings{Bhargava_2019_ICCV,
author = {Bhargava, Prajjwal},
title = {On Generalizing Detection Models for Unconstrained Environments},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2019}
}
This work provides support for the following datasets (related to object detection for autonomous navigation): IDD (India Driving Dataset), BDD100k, and Cityscapes.
Directory structure:
+-- data
| +-- bdd100k
| +-- IDD_Detection
| +-- cityscapes
+-- autonomous-object-detection
.......
cfg.py
By default, all paths and hyperparameters are loaded from cfg.py. Users are required to specify the dataset paths and hyperparameters only once; these defaults can also be overridden by the user.
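As a rough sketch, cfg.py might look like the following. Apart from ds and model_name, which this README references, the variable names and values below are illustrative assumptions:

```python
# cfg.py -- illustrative sketch. `ds` and `model_name` are referenced
# elsewhere in this README; the remaining names and values are assumptions.
ds = "idd"                         # dataset to use: e.g. "idd", "bdd100k", "cityscapes"
data_root = "data/"                # directory holding the datasets
model_name = "frcnn_baseline.pth"  # checkpoint name used by evaluation_baseline.py

# Hyperparameters (illustrative values)
batch_size = 8
lr = 1e-3
epochs = 10
```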
We use something called datalists. Datalists are lists that contain paths to images and their labels. They exist because some images don't have proper labels; datalists ensure that only structured, usable data is kept, so the dataloader works seamlessly. Data cleaning happens in the process.
You need to set a proper path and the ds variable in cfg.py to specify the dataset you want to use, then run:
$ python3 get_datalists.py
The rest of the pipeline assumes that the datalists have been created; this step ensures that you won't get bad samples while the dataloader iterates. Create a dir named data and put all the datasets inside it. A conceptual sketch of how a datalist is built follows.
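This is not the actual get_datalists.py code; the function name and annotation extension are illustrative, but the idea is the same:

```python
# Conceptual sketch: a datalist pairs each image with a usable label file
# and silently drops images without one. Names and the annotation
# extension are illustrative, not the actual get_datalists.py code.
import os

def build_datalists(img_dir, anno_dir, anno_ext=".xml"):
    image_paths, anno_paths = [], []
    for fname in sorted(os.listdir(img_dir)):
        stem, _ = os.path.splitext(fname)
        anno = os.path.join(anno_dir, stem + anno_ext)
        if os.path.exists(anno):  # keep only images with labels
            image_paths.append(os.path.join(img_dir, fname))
            anno_paths.append(anno)
    return image_paths, anno_paths
```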
This library uses a common API (similar to torchvision). All dataset classes expect the same inputs:

Input:
- a list of image paths (e.g. idd_image_path_list)
- a list of annotation paths (e.g. idd_anno_path_list)
- get_transform: a transformation function

Output:
- a dict containing boxes, labels, image_id, area, and iscrowd, each as a torch.tensor
dset = IDD(idd_image_path_list, idd_anno_path_list, transforms=None)
dset = BDD(bdd_img_path_list, train_anno_json_path, transforms=None)
BDD100k doesn't provide individual ground-truth files; a single JSON file covers the whole split, so dataset creation takes a little longer than usual while the JSON is parsed.
dset = Cityscapes(image_path_list, target_path_list, split='train', transforms=None)
This was tested with CityPersons (ground truths for the person class). You can extract ground truths from the segmentation masks as well, but then the user has to manage the datalists.
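For illustration, here is how a sample can be inspected and a dataset wrapped in a torch DataLoader. Detection targets vary in size per image, so a collate function that keeps samples in tuples is needed (the standard torchvision detection pattern); get_transforms is described in the next section:

```python
# Sketch: inspect one sample and build a DataLoader. The IDD class and
# the datalists come from this repo; the collate_fn is the standard
# torchvision detection pattern (targets differ in size per image).
import torch

dset = IDD(idd_image_path_list, idd_anno_path_list, transforms=get_transforms(True))
img, target = dset[0]
print(target.keys())  # boxes, labels, image_id, area, iscrowd

loader = torch.utils.data.DataLoader(
    dset, batch_size=2, shuffle=True,
    collate_fn=lambda batch: tuple(zip(*batch)),
)
```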
get_transforms(train: bool)
converts images into tensors and applies random horizontal flipping to the training data.
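A minimal sketch of such a function, written in the style of torchvision's detection references (the actual implementation may differ). Note that for detection the flip must be applied to the boxes as well as the image:

```python
# Paired image/target transforms in the style of torchvision's
# references/detection/transforms.py. Plain torchvision transforms
# would flip the image but not the boxes.
import random
import torchvision.transforms.functional as F

class Compose:
    def __init__(self, transforms):
        self.transforms = transforms
    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target

class ToTensor:
    def __call__(self, image, target):
        return F.to_tensor(image), target

class RandomHorizontalFlip:
    def __init__(self, prob=0.5):
        self.prob = prob
    def __call__(self, image, target):
        if random.random() < self.prob:
            image = F.hflip(image)
            w = image.shape[-1]
            boxes = target["boxes"]
            boxes[:, [0, 2]] = w - boxes[:, [2, 0]]  # mirror x-coordinates
            target["boxes"] = boxes
        return image, target

def get_transforms(train):
    transforms = [ToTensor()]
    if train:
        transforms.append(RandomHorizontalFlip(0.5))
    return Compose(transforms)
```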
Any detection model can be used (YOLO, Faster R-CNN, SSD). Currently we provide support through torchvision.
from train_baseline import get_model
model = get_model(len(classes))  # Returns a Faster R-CNN with a ResNet-50 backbone, pretrained on COCO
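get_model most likely follows the standard torchvision fine-tuning recipe; a hedged sketch (the actual implementation may differ):

```python
# Standard torchvision fine-tuning recipe for Faster R-CNN.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def get_model(num_classes):
    # Faster R-CNN with a ResNet-50 FPN backbone, pretrained on COCO
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    # Replace the COCO classification head with one sized for our classes
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model
```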
Support for the baseline has been added; domain-adaptive features will be added later. Users need to specify the path and the dataset in the script (in the user-defined settings section), then run:
$ python train_baseline.py
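Under the hood, training presumably follows the standard torchvision detection loop. A minimal sketch, reusing the model and loader from the earlier sketches (optimizer settings are illustrative):

```python
# One epoch of the standard torchvision detection training pattern.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).train()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for images, targets in loader:
    images = [img.to(device) for img in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    loss_dict = model(images, targets)  # dict of RPN/ROI losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```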
Evaluation is performed in COCO format. Users need to specify the saved model_name in cfg.py on which evaluation is supposed to run.
The COCO API needs to be compiled first; download it from here, then build it:
$ cd cocoapi/PythonAPI
$ python setup.py build_ext install
Now evaluation can be performed.
$ python3 evaluation_baseline.py
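For reference, COCO-style evaluation with pycocotools boils down to the following. The JSON file names are illustrative and assume ground truths and detections have been exported in COCO format:

```python
# COCO-style bounding-box evaluation with pycocotools.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ground_truth.json")          # ground truths (illustrative name)
coco_dt = coco_gt.loadRes("detections.json") # model detections (illustrative name)

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP/AR, including mAP at IoU=0.50:0.95
```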
Pretrained models for IDD and BDD100k are available here. For BDD100k, you can use the model straight away. This model was also used to perform incremental learning on IDD, as described in the paper: the base network (the BDD100k model) was reused with new task-specific layers and trained on IDD.
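That incremental setup can be sketched as follows; the checkpoint path and the class lists are illustrative assumptions:

```python
# Load the BDD100k base network, then attach a fresh task-specific
# head sized for IDD. Checkpoint path and class lists are illustrative.
import torch
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from train_baseline import get_model

model = get_model(len(bdd_classes))                     # BDD100k architecture
model.load_state_dict(torch.load("bdd100k_model.pth"))  # pretrained base network

# Replace the box predictor with a new head for the IDD classes
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, len(idd_classes))
```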
Please refer to the exp directory; the Jupyter notebooks are self-explanatory. Here are the results from the paper.
| S and T | Epochs | Active Components (with LR) | LR Range | mAP (%) at the specified epochs |
|---|---|---|---|---|
| BDD -> IDD | 5 | +ROI head (1e-3) | 1e-3, 6e-3 | 24.3 |
| IDD -> BDD | Eval | | - | 45.7 |
| BDD -> IDD | 5, 9 | +RPN (1e-4), +ROI head (1e-3) | 1e-4, 6e-4 | 24.7, 24.9 |
| IDD -> BDD | Eval | | - | 45.3, 45.0 |
| BDD -> IDD | 1, 5, 6, 7 | +RPN (1e-4), +ROI head (1e-3) | 1e-4, 6e-3 | 24.3, 24.9, 24.9, 25.0 |
| IDD -> BDD | Eval | | - | 45.7, 44.8, 44.7, 44.7 |
| BDD -> IDD | 1, 5, 10 | +ROI head (1e-3), +RPN (4e-4), +FPN (2e-4) | 1e-4, 6e-3 | 24.9, 25.4, 25.9 |
| IDD -> BDD | Eval | | - | 45.2, 43.9, 43.3 |
Refer to inference.ipynb for plotting images with the model's predictions.
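A minimal sketch of what such an inference-and-plot step looks like (see the notebook for the full version; the image path and score threshold are illustrative):

```python
# Run the trained model on one image and draw its predicted boxes.
import torch
from PIL import Image
import torchvision.transforms.functional as F
import matplotlib.pyplot as plt
import matplotlib.patches as patches

model.eval()  # model from get_model, with trained weights loaded
img = F.to_tensor(Image.open("data/IDD_Detection/sample.jpg").convert("RGB"))
with torch.no_grad():
    pred = model([img])[0]  # dict with 'boxes', 'labels', 'scores'

fig, ax = plt.subplots()
ax.imshow(img.permute(1, 2, 0))
for box, score in zip(pred["boxes"], pred["scores"]):
    if score > 0.5:  # keep confident detections only
        x1, y1, x2, y2 = box.tolist()
        ax.add_patch(patches.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                       fill=False, edgecolor="red"))
plt.show()
```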
By default, TensorBoard logging of loss and learning_rate is performed in engine.py. You can start TensorBoard with:
$ tensorboard --logdir /path/ --port=8888
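For reference, a minimal sketch of the kind of scalar logging engine.py performs (the tag names match the quantities above; the loop values here are placeholders, not the repo's actual code):

```python
# Scalar logging with PyTorch's TensorBoard writer.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/baseline")  # log_dir is illustrative
for step in range(100):
    loss_value = 1.0 / (step + 1)  # stand-in for the real training loss
    lr = 1e-3                      # stand-in for optimizer.param_groups[0]["lr"]
    writer.add_scalar("loss", loss_value, step)
    writer.add_scalar("learning_rate", lr, step)
writer.close()
```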