wangkunyu241 / UAV-Frequency

This is the code for the paper "Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement", which has been submitted to IJCV. It is an extension of our CVPR 2023 paper "Generalized UAV Object Detection via Frequency Domain Disentanglement".

Questions about the IJCV work #1

Closed: Clearlangw closed this issue 6 months ago

Clearlangw commented 6 months ago

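Hello, I am a senior undergraduate in computer science at Beihang University. After reading your CVPR 2023 paper last year I found it very interesting, but when I tried to reproduce it I realized my programming skills fell short. I was very happy to find your open-source code on GitHub recently. However, I could not find your IJCV paper on Google Scholar or on the IJCV page at Springer (https://link.springer.com/journal/11263/articles). Could you tell me what the notable differences are between the IJCV work (this repository) and the CVPR work? Thank you very much for open-sourcing the code. I wish you success in your work and a happy life.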

Clearlangw commented 6 months ago

What I can tell so far is that YOLOv5 has been replaced with Faster R-CNN and that an image-level contrastive loss has been added.

wangkunyu241 commented 6 months ago

Thanks for your interest in our work. Our IJCV paper is still under submission, and the revision may take several more months. In the journal version, our main contributions are as follows:

(1) Through exploratory experiments, we have gained a crucial insight into UAV Object Detection (UAV-OD): the contributions of different frequencies to generalization differ more markedly in UAV-OD, which deals with smaller objects, than in general object detection, which involves larger objects.

(2) Based on these findings, we make the earliest effort to improve UAV-OD generalization through frequency domain disentanglement. This method is a more direct and efficient approach and provides a novel perspective for the field.

(3) We propose a novel frequency domain disentanglement framework that uses two learnable filters to extract domain-invariant and domain-specific spectra. We design two novel contrastive losses, at the image level and the instance level, to guide the disentangling process.
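For readers following the thread, the idea in (3) of filtering one spectrum with two masks can be sketched as below. This is only an illustrative NumPy sketch, not the repository's code: the function name `disentangle`, the array shapes, and the choice of making the two masks complementary are assumptions made for the example, and in the actual framework the filters would be learnable parameters trained with the contrastive losses.

```python
import numpy as np

def disentangle(image, inv_filter, spec_filter):
    """Split a real 2-D array into two parts with two frequency-domain masks.

    Hypothetical helper for illustration. Both masks are real-valued and have
    the shape of the half spectrum returned by np.fft.rfft2: (H, W // 2 + 1).
    """
    spectrum = np.fft.rfft2(image)  # complex half spectrum
    invariant = np.fft.irfft2(spectrum * inv_filter, s=image.shape)
    specific = np.fft.irfft2(spectrum * spec_filter, s=image.shape)
    return invariant, specific

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))

# Stand-ins for the two learnable filters (fixed random values here).
inv_filter = rng.uniform(0.0, 1.0, size=(8, 8 // 2 + 1))
# Assumption for this example only: the second mask is the complement of the
# first, so the two components add back up to the input.
spec_filter = 1.0 - inv_filter

invariant, specific = disentangle(image, inv_filter, spec_filter)
```

With complementary masks the two outputs sum to the original array, which is a convenient sanity check; two independently learned filters would not generally have that property.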

The main differences between the journal version and the conference version are as follows:

(1) Firstly, we clarify the motivation behind our approach and provide exploratory experiments to justify the suitability of frequency domain disentanglement for enhancing UAV-OD generalization.

(2) Secondly, we introduce a more efficient frequency domain disentanglement structure and a new image-level contrastive loss at the methodological level. Through ablation experiments, we demonstrate the effectiveness of these enhancements, further improving the generalization of UAV-OD networks.

(3) Lastly, we conduct a comprehensive suite of experiments, providing extensive ablation studies, visualizations, and a thorough analysis based on empirical findings.

Clearlangw commented 6 months ago

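Thank you very much for your answer. After reading your extra.py, I now understand that you may actually be using only a single-sided filter, which seems like an efficient approach. I also noticed that you appear to exploit the symmetry of the frequency domain (w/2+1) to reduce the number of parameters. As I recall, the preliminary experiments in your CVPR 2023 paper used fixed-size parameters. Following that idea, have you since tried training the filter with one of the better-performing 0-1 matrices as its initial parameters? (I personally think this might be a worthwhile optimization, since your code currently appears to use random initialization between 0 and 1.) Also, would it be convenient for you to open-source your CVPR 2023 work?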
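The (w/2+1) symmetry mentioned here is a general property of the Fourier transform of real-valued input, and it can be checked with a small NumPy sketch. The sizes `H` and `W` are arbitrary values chosen for illustration and say nothing about the shapes used in extra.py.

```python
import numpy as np

H, W = 6, 8  # arbitrary example sizes
x = np.arange(H * W, dtype=float).reshape(H, W)

full = np.fft.fft2(x)    # full spectrum, shape (H, W)
half = np.fft.rfft2(x)   # half spectrum, shape (H, W // 2 + 1)

# For real input the remaining columns are complex conjugates of the kept
# ones, so the half spectrum is enough to recover the input exactly.
restored = np.fft.irfft2(half, s=x.shape)

# A filter defined on the half spectrum needs fewer entries than one
# defined on the full spectrum.
full_params = H * W
half_params = H * (W // 2 + 1)
```

In this example a mask over the half spectrum has 30 entries instead of 48, which is the kind of parameter saving the comment refers to.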

wangkunyu241 commented 6 months ago

From the perspective of the framework structure, the IJCV version hasn't actually undergone many changes. Through experiments, we found that a single-sided learnable filter is a more efficient and effective structure. If further modifications are desired, I would recommend starting with this structure. Regarding your mention of the initialization selection for learnable filters, we haven't tried the method you suggested, but I believe it's a direction worth exploring.
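As a rough illustration of the two initialization options discussed above, the sketch below builds a filter over the half spectrum either from uniform random values or from a fixed 0-1 band mask. Everything here is hypothetical: the function names, the low-pass shape of the mask, and the `radius` value are placeholders, since the thread does not say which 0-1 matrix performed best, and neither function is taken from the repository.

```python
import numpy as np

def random_init(h, w, seed=0):
    """Uniform random values in [0, 1) over the half spectrum (h, w // 2 + 1)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(h, w // 2 + 1))

def band_mask_init(h, w, radius):
    """Binary mask over the half spectrum.

    Entries are 1 where the frequency magnitude (in cycles per image) is at
    most `radius`, and 0 elsewhere. The low-pass shape is only an example.
    """
    fy = np.fft.fftfreq(h) * h    # row frequencies: 0, 1, ..., -1
    fx = np.fft.rfftfreq(w) * w   # column frequencies: 0, ..., w // 2
    dist = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return (dist <= radius).astype(np.float64)

mask = band_mask_init(8, 8, radius=2)   # placeholder radius
noise = random_init(8, 8)
```

Either array could then serve as the starting value of a learnable filter in a deep learning framework; whether the band-mask start actually helps is, as noted above, untested.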