zhangbin-ai / APL

APL for AVQA task
2 stars · 1 fork

Extract visual features and bounding box features #2

Status: Open · nanacoco419 opened this issue 2 months ago

nanacoco419 commented 2 months ago

Hello, I thoroughly enjoyed reading your paper, "Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering."

I am writing to ask about the code released with the paper. I am trying to reproduce the reported results, but the visual features and bounding box features extracted with DETR are not provided. I extracted these features myself using a pre-trained DETR model and integrated them into your code. However, I observe a performance gap of approximately 4% compared with the results reported in the paper.
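One place where independently extracted features can diverge is the post-processing of DETR's raw outputs into per-object features. The sketch below assumes the standard DETR output format: per-query class logits whose last entry is the "no object" class, and boxes given as normalized (cx, cy, w, h). It keeps the top-k queries by object confidence and converts their boxes to pixel (x0, y0, x1, y1). The function names, the `top_k` value, and the use of per-query decoder embeddings as object features (`query_feats`) are hypothetical choices for illustration, not the APL authors' confirmed settings.

```python
import math
from typing import Any, List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one query's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box: Sequence[float], img_w: float, img_h: float) -> Box:
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1) pixels."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def select_objects(
    pred_logits: Sequence[Sequence[float]],
    pred_boxes: Sequence[Sequence[float]],
    query_feats: Sequence[Any],
    img_w: float,
    img_h: float,
    top_k: int = 10,  # hypothetical; the paper does not state this value
) -> List[Tuple[float, Box, Any]]:
    """Return (score, pixel_box, feature) for the top_k most confident queries.

    pred_logits: per-query class logits; the last entry is the "no object" class.
    pred_boxes: per-query normalized (cx, cy, w, h) boxes.
    query_feats: per-query feature vectors (hypothetical input, e.g. decoder
        embeddings captured separately); passed through unchanged.
    """
    scored = []
    for logits, box, feat in zip(pred_logits, pred_boxes, query_feats):
        probs = softmax(logits)[:-1]  # drop the "no object" probability
        scored.append((max(probs), cxcywh_to_xyxy(box, img_w, img_h), feat))
    # Sort by confidence only, highest first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

Choices such as `top_k` versus a confidence threshold, or which layer the features come from, change what the downstream model sees, so they are natural things to confirm with the authors alongside the backbone.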

Could you share the visual features and bounding box features you extracted for the experiments in your paper? Additionally, the paper does not specify which backbone DETR uses. Could you clarify whether it is ResNet-50, ResNet-101, or another backbone?

shinever22 commented 2 weeks ago

Hello, I have the same problem. Have you received any help?