This is the official website for "TJU-DHD: A Diverse High-Resolution Dataset for Object Detection (TIP2020)", a newly built high-resolution dataset for object detection and pedestrian detection.
Vehicles, pedestrians, and riders are the most important and interesting objects for the perception modules of self-driving vehicles and video surveillance. However, the state-of-the-art performance in detecting such important objects (especially small objects) is far from satisfying the demands of practical systems. Large-scale, richly diverse, high-resolution vehicle and pedestrian datasets play an important role in developing better object detection methods to meet this demand. Existing public large-scale datasets such as MS COCO, collected from websites, do not focus on these specific scenarios. Moreover, the popular datasets collected from these scenarios (e.g., KITTI and CityPersons) are limited in the number of images and instances, in resolution, and in diversity of seasons, weather, and illumination. To address this problem, in this paper, we build a diverse high-resolution dataset called TJU-DHD. The dataset contains 115,354 high-resolution images (52% with a resolution of 1,624x1,200 pixels and 48% with a resolution of at least 2,560x1,440 pixels) and 709,330 labeled objects in total, with a large variance in scale and appearance. Meanwhile, the dataset has rich diversity in season, illumination, and weather. Based on this object detection dataset, a new diverse pedestrian dataset is further built. Using four different detectors (i.e., the one-stage RetinaNet, the anchor-free FCOS, the two-stage FPN, and Cascade R-CNN), experiments on object detection and pedestrian detection are conducted. We hope that the newly built dataset can help promote research on object detection and pedestrian detection in these two scenes.
| name | DHD-traffic (#images) | DHD-traffic (#instances) | DHD-campus (#images) | DHD-campus (#instances) |
| --- | --- | --- | --- | --- |
| training | 45,266 | 239,980 | 39,727 | 267,445 |
| validation | 5,000 | 30,679 | 5,204 | 41,620 |
| test | 10,000 | 60,963 | 10,157 | 68,643 |
| total | 60,266 | 331,622 | 55,088 | 377,708 |
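Assuming the annotations are distributed in COCO-style JSON (common for benchmarks evaluated with mmdetection-family tools), the split statistics above can be reproduced with pycocotools; the file path below is a placeholder, not an official file name:

```python
# Sketch of inspecting a split with pycocotools, assuming COCO-style JSON
# annotations; the annotation path is a hypothetical placeholder.
from pycocotools.coco import COCO

coco = COCO("annotations/dhd_traffic_train.json")  # hypothetical path
print("images:   ", len(coco.getImgIds()))
print("instances:", len(coco.getAnnIds()))
for cat in coco.loadCats(coco.getCatIds()):
    n = len(coco.getAnnIds(catIds=[cat["id"]]))
    print(f"  {cat['name']}: {n} instances")
```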
The training image set is too large, so it is zipped as a 4-part archive. After downloading all four parts, you can open the `.zip.001` file using your favorite zip file extractor. On Linux, the multi-part archive can also be unzipped by:

```bash
cat dhd_campus_train_images.zip.* > dhd_campus_train_images.zip
unzip dhd_campus_train_images.zip -d /path/to/your/folder
```
| name | Ped-traffic (#images) | Ped-traffic (#instances) | Ped-campus (#images) | Ped-campus (#instances) |
| --- | --- | --- | --- | --- |
| training | 13,858 | 27,650 | 39,727 | 234,455 |
| validation | 2,136 | 5,244 | 5,204 | 36,161 |
| test | 4,344 | 10,724 | 10,157 | 59,007 |
| total | 20,338 | 43,618 | 55,088 | 329,623 |
(Note that the Ped-traffic images are the same as those in TJU-DHD-traffic, and the Ped-campus images are the same as those in TJU-DHD-campus.)
Results on the TJU-DHD-traffic validation set
| method | backbone | input size | AP | AP@0.5 | AP@0.75 | AP_s | AP_m | AP_l |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RetinaNet | ResNet50 | 1333x800 | 53.5 | 80.9 | 60.0 | 24.0 | 50.5 | 68.0 |
| FCOS | ResNet50 | 1333x800 | 53.8 | 80.0 | 60.1 | 24.6 | 50.6 | 68.8 |
| FPN | ResNet50 | 1333x800 | 55.4 | 83.4 | 63.0 | 30.4 | 52.2 | 68.2 |
| Cascade RCNN | ResNet50 | 1333x800 | 57.9 | 82.7 | 66.6 | 32.6 | 54.4 | 71.4 |
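These numbers follow the standard COCO evaluation protocol (AP averaged over IoU thresholds 0.50:0.95, with small/medium/large area breakdowns). A minimal evaluation sketch using pycocotools, with placeholder file names:

```python
# Minimal COCO-protocol evaluation sketch; both file names are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO("annotations/dhd_traffic_val.json")          # hypothetical path
dt = gt.loadRes("results/retinanet_val_results.json")  # hypothetical path

ev = COCOeval(gt, dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()  # prints AP, AP@0.5, AP@0.75, AP_s, AP_m, AP_l
```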
Results on the TJU-DHD-campus validation set
| method | backbone | input size | AP | AP@0.5 | AP@0.75 | AP_t | AP_s | AP_m | AP_l |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RetinaNet | ResNet50 | 1333x800 | 48.4 | 79.3 | 52.4 | 4.7 | 27.3 | 56.2 | 73.8 |
| FCOS | ResNet50 | 1333x800 | 49.3 | 73.8 | 53.8 | 5.6 | 29.6 | 55.9 | 74.3 |
| FPN | ResNet50 | 1333x800 | 52.4 | 77.5 | 58.4 | 8.5 | 37.4 | 58.6 | 74.9 |
| Cascade RCNN | ResNet50 | 1333x800 | 55.1 | 77.6 | 60.9 | 10.8 | 40.1 | 61.2 | 78.8 |
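AP_t adds a tiny-object bucket on top of the usual small/medium/large split. Continuing the sketch above, such a bucket can be expressed through `COCOeval.params.areaRng`; the pixel-area thresholds below are illustrative placeholders, not the paper's exact definition:

```python
# Sketch of adding a "tiny" area bucket to COCOeval (gt/dt as in the
# previous sketch); the area thresholds are illustrative placeholders only.
ev = COCOeval(gt, dt, iouType="bbox")
ev.params.areaRng = [
    [0 ** 2, 1e5 ** 2],   # all
    [0 ** 2, 16 ** 2],    # tiny   (placeholder threshold)
    [16 ** 2, 32 ** 2],   # small  (placeholder threshold)
    [32 ** 2, 96 ** 2],   # medium
    [96 ** 2, 1e5 ** 2],  # large
]
ev.params.areaRngLbl = ["all", "tiny", "small", "medium", "large"]
ev.evaluate()
ev.accumulate()
# ev.summarize() only reports the default buckets, so read the tiny-bucket
# AP directly from ev.eval["precision"] instead.
```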
Pedestrian detection on TJU-Ped-campus

All numbers are miss rates (MR^-2, lower is better) on the Reasonable (R), Reasonable-small (RS), Heavy-occlusion (HO), R+HO, and All (A) subsets.

| method | publication | R | RS | HO | R+HO | A | link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RetinaNet | ICCV2017 | 34.73 | 82.99 | 71.31 | 42.26 | 44.34 | Paper |
| FCOS | ICCV2019 | 31.89 | 69.04 | 81.28 | 39.38 | 41.62 | Paper |
| FPN | CVPR2017 | 27.92 | 67.52 | 73.14 | 35.67 | 38.08 | Paper |
| CrowdDet | CVPR2020 | 25.73 | - | 66.38 | 33.63 | 35.90 | Paper |
| EGCL | IEEE TIP2023 | 24.84 | - | 65.27 | 32.39 | 34.87 | Paper |
| DeFCN | CVPR2021 | 32.1 | 62.7 | 72.7 | 39.9 | 42.1 | Paper |
| OPL | CVPR2023 | 31.5 | 61.7 | 72.4 | 39.3 | 41.5 | Paper |
| MTOM | WACV2023 | 21.8 | 37.04 | 57.08 | - | - | Paper |
Pedestrian detection on TJU-Ped-traffic

| method | publication | R | RS | HO | R+HO | A | link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RetinaNet | ICCV2017 | 23.89 | 37.92 | 61.60 | 28.45 | 41.40 | Paper |
| FCOS | ICCV2019 | 24.35 | 37.40 | 63.73 | 28.86 | 40.02 | Paper |
| FPN | CVPR2017 | 22.30 | 35.19 | 60.30 | 26.71 | 37.78 | Paper |
| CrowdDet | CVPR2020 | 20.82 | - | 61.22 | 25.28 | 36.94 | Paper |
| EGCL | IEEE TIP2023 | 19.73 | - | 60.05 | 24.19 | 35.76 | Paper |
| DeFCN | CVPR2021 | 24.2 | 29.1 | 62.8 | 29.0 | 39.7 | Paper |
| Pedestron | CVPR2021 | 18.9 | 24.0 | 56.3 | - | - | Paper |
| OPL | CVPR2023 | 23.4 | 28.8 | 62.7 | 28.0 | 38.7 | Paper |
| LSFM | CVPR2023 | 18.7 | 24.9 | 56.2 | - | - | Paper |
| MTOM | WACV2023 | 17.4 | 24.7 | 52.68 | - | - | Paper |
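Concretely, MR^-2 is the log-average miss rate: the geometric mean of the miss rate sampled at nine FPPI (false positives per image) points spaced evenly in log space over [10^-2, 10^0], following Dollár et al. A small numpy sketch of the metric, assuming you already have a matched miss-rate/FPPI curve from your own detection-to-ground-truth matching code:

```python
# Log-average miss rate (MR^-2); a sketch of the standard pedestrian
# metric, assuming `fppi` and `miss_rate` are arrays sorted by
# increasing FPPI, produced by your own matching code.
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    refs = np.logspace(-2, 0, 9)  # nine reference points in [1e-2, 1e0]
    samples = []
    for ref in refs:
        idx = np.where(fppi <= ref)[0]
        # miss rate at the largest FPPI not exceeding the reference point;
        # fall back to 1.0 if the curve never reaches this FPPI
        samples.append(miss_rate[idx[-1]] if len(idx) else 1.0)
    # geometric mean, guarded against log(0)
    return float(np.exp(np.mean(np.log(np.maximum(samples, 1e-10)))))
```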
Cross-scene evaluation
| method | R / R+HO (TJU-Ped-campus -> traffic) | R / R+HO (TJU-Ped-traffic -> campus) |
| --- | --- | --- |
| FPN | 30.62 / 33.89 | 42.08 / 50.55 |
If this project helps your research, please consider citing our works:
```
@article{Pang_DHD_TIP_2020,
  author  = {Yanwei Pang and Jiale Cao and Yazhao Li and Jin Xie and Hanqing Sun and Jinfeng Gong},
  title   = {TJU-DHD: A Diverse High-Resolution Dataset for Object Detection},
  journal = {IEEE Transactions on Image Processing},
  year    = {2021}
}

@article{Cao_PDR_TPAMI_2020,
  author  = {Jiale Cao and Yanwei Pang and Jin Xie and Fahad Shahbaz Khan and Ling Shao},
  title   = {From Handcrafted to Deep Features for Pedestrian Detection: A Survey},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2022}
}
```
Ablation studies can be conducted on the validation set. If you would like to evaluate your model on the test set, you can send us (connor#tju.edu.cn, replace # with @) your detection results in the JSON format.
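For reference, a minimal sketch of writing results in the COCO-style result format (a single JSON array of per-detection records); the ids, values, and file name below are illustrative placeholders, and the exact expected format should be confirmed with the authors:

```python
# Sketch of writing detection results in the COCO result format;
# image_id / category_id values and the file name are placeholders.
import json

results = [
    {
        "image_id": 1,
        "category_id": 1,
        "bbox": [100.0, 200.0, 50.0, 120.0],  # [x, y, width, height]
        "score": 0.92,
    },
]
with open("my_results.json", "w") as f:
    json.dump(results, f)
```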
If you have any questions or want to add your results, please feel free to contact us.