wjun0830 / QD-DETR

Official pytorch repository for "QD-DETR : Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)
https://arxiv.org/abs/2303.13874

a very unreasonable phenomenon #19

Closed yaokunZ closed 1 year ago

yaokunZ commented 1 year ago

Hi, authors! I found a very unreasonable phenomenon. In the test phase, I add several encoder layers with default (random) initialization after the model's encoder, and the saliency accuracy (e.g., HL-min-VeryGood-mAP, Hit1) remains basically unchanged. The model I used is the checkpoint provided by the authors on GitHub.

hse1032 commented 1 year ago

Hi, thank you for your interest in our paper.

Could you give a more detailed explanation of the phenomenon you found, such as the names of the added parameters or the code snippet you used to check them?

Thanks,

yaokunZ commented 1 year ago

I run:

bash qd_detr/scripts/inference.sh results/video_checkpoint/model_best.ckpt 'val'

As shown in the attached screenshot, we add an encoder layer with default initialization parameters in inference.py.
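In code form, the change is roughly the following (a minimal sketch of what the screenshot shows; names such as perturb_memory and the d_model/nhead values are illustrative, not taken from the repository):

```python
import torch
import torch.nn as nn

# Illustrative sketch: stack a few randomly initialized encoder layers on top
# of the trained encoder's output at test time. d_model=256 / nhead=8 follow
# common DETR-style defaults; adjust to the actual model config.
num_extra_layers = 2
extra_layers = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=256, nhead=8, dim_feedforward=1024)
    for _ in range(num_extra_layers)
])
extra_layers.eval()  # disable dropout; weights stay at default initialization

@torch.no_grad()
def perturb_memory(memory):
    # memory: (seq_len, batch, d_model) output of the trained encoder
    for layer in extra_layers:
        memory = layer(memory)
    return memory
```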

hse1032 commented 1 year ago

Hi,

As I understand it, you added transformer encoder layers using the code snippet you provided, but the results are unchanged.

If so, you should check that the layers you initialize actually participate in the forward pass.

If you want to attach additional transformer encoder layers, I recommend changing the Transformer class in "qd_detr/transformer.py" or the arguments of the build_model function in "qd_detr/model.py".
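For example, something along these lines (a sketch only; the option name enc_layers is an assumption based on DETR-style configs and should be checked against the repository's option parser):

```python
from qd_detr.model import build_model

# Sketch: grow the encoder at build time so every added layer is guaranteed
# to sit in the forward path. "enc_layers" is assumed to be the encoder-depth
# option, following DETR-style configs; verify the exact name in the repo.
args = parse_args()      # hypothetical: however the repo builds its options
args.enc_layers = 4      # e.g. bump the default depth before building
model, criterion = build_model(args)
```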

I hope this resolves your issue.

Thanks,

yaokunZ commented 1 year ago

I made sure the layers I initialize participate in the forward pass. When the number of added encoder layers is about 2, the results remain basically unchanged. As the number of encoder layers increases, the results are significantly reduced. (See the attached screenshot.)

wjun0830 commented 1 year ago

Okay, as far as I understand:

  1. You tried adding untrained layers during inference to check whether the performance degrades.

  2. And you found that when the number of randomly initialized layers is 2, there is no performance decrease, but the results are significantly reduced when more random layers are added.

Is that right?

yaokunZ commented 1 year ago

Yes! Your understanding is absolutely correct.

hse1032 commented 1 year ago

Sorry for my confusion,

I checked your scenario, and I also found that adding only 1 or 2 transformer layers does not hurt the performance, or causes only a subtle degradation.

One possible hypothesis is that the transformer layer does not transform the feature space much, since it has a residual connection between its input and output. This phenomenon is not something we intended; personally, I find it interesting, and I'm curious whether similar phenomena also occur in other transformer-based frameworks.
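A quick way to probe this hypothesis is to measure how much a freshly initialized encoder layer actually moves its input (a standalone sketch with made-up tensor sizes, unrelated to the QD-DETR checkpoints):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# One randomly initialized transformer encoder layer in eval mode (no dropout)
layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, dim_feedforward=1024)
layer.eval()

# Fake "encoder memory": 75 clips, batch of 4, 256-dim features
x = torch.randn(75, 4, 256)

with torch.no_grad():
    y = layer(x)

# If the residual path dominates, the output stays close to the input and the
# downstream saliency head sees roughly the same feature space.
rel_change = (y - x).norm() / x.norm()
cos_sim = F.cosine_similarity(y.flatten(), x.flatten(), dim=0)
print(f"relative change: {rel_change:.3f}, cosine similarity: {cos_sim:.3f}")
```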

Thanks,