wangkunyu241 / UAV-Frequency

This is the code for the paper "Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement", which has been submitted to IJCV. It is an extension of our CVPR 2023 paper "Generalized UAV Object Detection via Frequency Domain Disentanglement".

CVPR paper dataset problem #2

Closed · starkLJ closed this issue 8 months ago

starkLJ commented 8 months ago

I am glad to see your work, and I hope the IJCV submission is accepted smoothly. I have some questions about the CVPR paper 'Generalized UAV Object Detection via Frequency Domain Disentanglement'.

In the CVPR paper, “UAVDT can be separated into three sections based on the weather annotations: 23,741 daylight images, 11,489 nighttime images, and 2,492 foggy images. We choose the nighttime portion to replicate the diverse illumination scenario, the foggy portion to simulate the adverse weather scenario, and 2,850 daylight images with different scene structures compared with the remaining daylight images.”

Could you please confirm whether this dataset configuration differs from the one provided with this project's open-source dataset? In the dataset I downloaded, the number of foggy images exceeds 2,492. Additionally, the results reported in the IJCV version are much higher than those in the CVPR paper. Could this be due to changes in the dataset? If possible, I would appreciate the complete dataset configuration used for the CVPR paper, as it would be very helpful to me.

wangkunyu241 commented 8 months ago

Yes, you are right. I need to update the dataset settings to match our IJCV version. If you wish, please follow our IJCV dataset settings:

We evaluate the proposed method on two popular UAV-OD benchmarks: UAVDT and VisDrone2019-DET.

UAVDT consists of 41k frames with 840k bounding boxes, divided into three classes: cars, trucks, and buses. Because the class distribution in UAVDT is heavily skewed, with the last two classes accounting for less than 5% of the bounding boxes, we consolidate them into a single class, following the authors' convention in \cite{du2018unmanned}. To separate the source and target domains, we handpick 20,891 daylight images, 11,489 nighttime images, and 5,179 foggy images from UAVDT based on the weather tags.
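
For concreteness, here is a minimal Python sketch of this preprocessing. The helper names, class id, and the source of the weather tags are assumptions for illustration, not taken from our released code; verify them against your local UAVDT annotations.

```python
# Hypothetical sketch of the UAVDT preprocessing described above:
# merge car/truck/bus boxes into one class and bucket images by weather tag.
MERGED_CLASS = 0  # single merged class id (assumption)

def merge_classes(boxes):
    """boxes: iterable of (x, y, w, h, class_id) tuples."""
    return [(x, y, w, h, MERGED_CLASS) for (x, y, w, h, _) in boxes]

def split_by_weather(image_tags):
    """image_tags: dict mapping image path -> 'daylight' | 'night' | 'fog'."""
    splits = {"daylight": [], "night": [], "fog": []}
    for path, tag in image_tags.items():
        splits[tag].append(path)
    return splits
```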

VisDrone2019-DET, on the other hand, contains 8,629 static images across its training, validation, and testing sets. These images were captured by various drone platforms at different locations and altitudes, and each is annotated with bounding boxes for ten predefined classes, such as pedestrian, person, bicycle, car, and tricycle. To facilitate cross-dataset generalization experiments, we adopt the category settings from UAVDT: we use only the labels for the car, van, bus, and truck classes in VisDrone2019-DET and treat them as a single class. Since VisDrone2019-DET lacks weather labels, we manually selected dimly lit segments to compile a split of 5,709 daylight and 1,698 nighttime images.
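
As an illustration, the VisDrone2019-DET filtering could look like the sketch below. The category ids follow the common VisDrone-DET annotation convention (4 = car, 5 = van, 6 = truck, 9 = bus); this is an assumption, so please double-check them against the official devkit.

```python
# Sketch: keep only car/van/truck/bus boxes from a VisDrone2019-DET
# annotation file and collapse them into one class. Lines are assumed to be
# 'x,y,w,h,score,category,truncation,occlusion' per the common convention.
KEEP_IDS = {4, 5, 6, 9}  # car, van, truck, bus (assumed ids)

def filter_annotation_file(path):
    """Return single-class (x, y, w, h, 0) boxes for the kept categories."""
    boxes = []
    with open(path) as f:
        for line in f:
            fields = line.strip().rstrip(",").split(",")
            x, y, w, h, _score, cat = map(int, fields[:6])
            if cat in KEEP_IDS:
                boxes.append((x, y, w, h, 0))  # 0 = merged class id
    return boxes
```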

Ultimately, we treat the daylight portions of UAVDT and VisDrone2019-DET as the respective source domains, and we use the nighttime and foggy portions of UAVDT, together with the nighttime portion of VisDrone2019-DET, as three distinct unseen target domains for both intra-dataset and cross-dataset generalization experiments.
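
Putting it together, the protocol can be summarized as a small configuration; the split names below are illustrative, but the image counts mirror the numbers above:

```python
# Illustrative encoding of the intra-/cross-dataset generalization protocol.
DOMAINS = {
    "source": {
        "uavdt": "uavdt_daylight",        # 20,891 images
        "visdrone": "visdrone_daylight",  # 5,709 images
    },
    "unseen_targets": [
        "uavdt_night",     # 11,489 images
        "uavdt_fog",       # 5,179 images
        "visdrone_night",  # 1,698 images
    ],
}
```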

starkLJ commented 8 months ago

Thank you very much for your reply. The question has been resolved.