Hi, authors! I found a very puzzling phenomenon. In the test phase, when several encoder layers with default initialization parameters are added after the model's encoder, the saliency accuracy (e.g., HL-min-VeryGood-mAP, Hit1) remains basically unchanged. The model used is the one provided by the authors on GitHub.
Hi, thank you for your interest in our paper.
Could you give a more detailed explanation of the phenomenon you found, such as the names of the added parameters or the code snippet you used to check them?
Thanks,
We run:
bash qd_detr/scripts/inference.sh results/video_checkpoint/model_best.ckpt 'val'
As shown below, we add an encoder layer with default initialization parameters in inference.py.
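(The original screenshot is not preserved; the following is a hypothetical reconstruction, assuming a DETR-style model whose encoder iterates over an nn.ModuleList of layers. The attribute path model.transformer.encoder.layers and the init scheme are assumptions, not taken verbatim from the repo.)

```python
# Hypothetical sketch: append untrained encoder layers before inference.
# Assumes a DETR-style encoder whose forward pass loops over an
# nn.ModuleList (`encoder.layers`); names are illustrative.
import copy
import torch.nn as nn

def append_random_encoder_layers(model, num_extra=1):
    encoder = model.transformer.encoder  # assumed attribute path
    for _ in range(num_extra):
        # Clone the last layer's architecture, then re-initialize the
        # weight matrices so the new layer carries untrained parameters
        # (DETR-style xavier init on all >1-dim parameters).
        new_layer = copy.deepcopy(encoder.layers[-1])
        for p in new_layer.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p)
        encoder.layers.append(new_layer)
    # Because the encoder's forward loops over self.layers, the appended
    # layers automatically participate in inference.
    return model
```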
Hi,
As I understand it, you added transformer encoder layers using the code snippet you provided, but the results were unchanged.
If so, you should check that the layers you initialize actually participate in the forward pass.
If you want to attach additional transformer encoder layers, I recommend changing the Transformer class in "qd-detr/transformer.py" or the arguments of the build_model function in "qd-detr/model.py"; a minimal stand-in sketch is shown below.
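For illustration, here is a minimal, self-contained sketch of controlling the encoder depth through a constructor argument, using torch.nn stand-ins for the repo's classes (the argument names d_model, nhead, num_encoder_layers mirror the DETR convention but are assumptions here):

```python
import torch
import torch.nn as nn

def build_encoder(d_model=256, nhead=8, num_encoder_layers=2, dim_feedforward=1024):
    # Changing num_encoder_layers here changes the depth everywhere the
    # encoder is used, so the extra layers are guaranteed to sit on the
    # forward path -- unlike layers that are instantiated but never called.
    layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward)
    return nn.TransformerEncoder(layer, num_encoder_layers)

# Usage: build a deeper encoder and run a dummy batch through it.
encoder = build_encoder(num_encoder_layers=4).eval()
with torch.no_grad():
    out = encoder(torch.randn(75, 2, 256))  # (seq_len, batch, d_model)
print(out.shape)  # torch.Size([75, 2, 256])
```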
I hope this resolves your issue.
Thanks,
I made sure the added layers participate in the forward pass. With about 2 extra encoder layers, the results remain basically unchanged; as the number of extra encoder layers increases, the results drop significantly.
Okay, let me check my understanding:
You tried adding untrained layers during inference to check whether the performance is degraded, right?
And you found that when the number of randomly initialized layers is 2, there is no performance decrease, but the results are significantly reduced when the number of random layers is increased.
Is that right?
Yes! Your understanding is absolutely correct.
Sorry for my confusion,
I checked your scenario, and I also found that 1 or 2 additional transformer layers do not hurt the performance, or cause only a subtle degradation.
One possible hypothesis is that a transformer layer does not transform the feature space much, since there is a residual connection between the input and output of the layer, so an untrained layer behaves as a small perturbation around the identity mapping. This phenomenon is not what we intended; personally, I find it interesting, and I'm curious whether similar phenomena also occur in other transformer-based frameworks. A quick probe of this hypothesis is sketched below.
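One way to probe this is to measure how close the output of randomly initialized encoder layers stays to the input; the sketch below uses torch.nn stand-ins, and the shapes and depths are illustrative assumptions:

```python
# Quick probe of the residual-connection hypothesis above.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(75, 2, 256)  # (seq_len, batch, d_model)

for depth in (1, 2, 4, 8):
    layer = nn.TransformerEncoderLayer(d_model=256, nhead=8)
    # nn.TransformerEncoder deep-copies `layer`, so each stack reuses one
    # random sample; that is good enough for a rough probe.
    stack = nn.TransformerEncoder(layer, num_layers=depth).eval()
    with torch.no_grad():
        y = stack(x)
    # If each layer is only a small perturbation around the residual path,
    # the similarity should stay high for 1-2 layers and decay with depth.
    sim = F.cosine_similarity(x.flatten(1), y.flatten(1), dim=-1).mean()
    print(f"{depth} random layer(s): mean cosine(input, output) = {sim.item():.3f}")
```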
Thanks,