wjun0830 / QD-DETR

Official pytorch repository for "QD-DETR : Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)
https://arxiv.org/abs/2303.13874

With the same seed, the setting of eval_epoch can really influence the performance of the model! Why? #40

Closed: snailma0229 closed this issue 5 months ago

snailma0229 commented 6 months ago

Thanks for your excellent work. I found an interesting thing: with the same seed, the setting of eval_epoch can really influence the performance of the model! (I have tested eval_epoch set to 1 and to 5.) I don't know why!

snailma0229 commented 6 months ago

Also, all of these runs were on the same machine.

snailma0229 commented 6 months ago

I think that even if the eval epoch settings are different (such as 1 or 5), with the same seed the results at the 5th epoch (and at multiples of 5) should be consistent, but they aren't.
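
One way to probe this (a hypothetical diagnostic, not something from the repo) is to check whether the evaluation pass itself consumes random numbers; if it does, the training steps after an eval run would see different RNG streams depending on the eval interval, even with an identical seed. The `eval_epoch(model, val_loader)` call below is only a stand-in for the project's evaluation routine.

```python
# Hypothetical diagnostic: does running evaluation advance the global RNG state?
import torch

def rng_snapshot():
    # Capture the CPU RNG state and, if available, all CUDA device RNG states.
    state = {"cpu": torch.get_rng_state()}
    if torch.cuda.is_available():
        state["cuda"] = torch.cuda.get_rng_state_all()
    return state

def rng_changed(before, after):
    # Compare two snapshots element-wise.
    if not torch.equal(before["cpu"], after["cpu"]):
        return True
    return any(
        not torch.equal(b, a)
        for b, a in zip(before.get("cuda", []), after.get("cuda", []))
    )

# Usage sketch (eval_epoch / val_loader are placeholders for the real routine):
# before = rng_snapshot()
# eval_epoch(model, val_loader)
# after = rng_snapshot()
# print("eval consumed random numbers:", rng_changed(before, after))
```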

wjun0830 commented 6 months ago

Hello. Thanks for your interest in our work.

First of all, we fully agree that it would be ideal if the results were consistent. Regarding the source of the inconsistency, please note that model training is not perfectly deterministic, and the best checkpoint may fall on an epoch that is not a multiple of 5. Still, as long as everyone follows the evaluation convention established by the original work (Moment-DETR), we believe fairness is preserved.

Thus, we strongly believe that keeping the evaluation convention, i.e., using the same evaluation interval, is the best way to ensure a fair comparison.

Furthermore, if you train the model to the end, we expect much less inconsistency.
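
For reference, below is a minimal sketch of the usual PyTorch reproducibility switches (general practice, not code from this repository). Even with all of them enabled, GPU training is typically not bit-identical across runs or machines, which is consistent with the inconsistency described above.

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 2018):
    # Seed every RNG the training loop might touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic cuDNN kernels and disable autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Needed by some cuBLAS ops (CUDA >= 10.2) in deterministic mode;
    # must be set before the first cuBLAS call.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # May raise a RuntimeError for ops without a deterministic implementation.
    torch.use_deterministic_algorithms(True)
```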

snailma0229 commented 6 months ago

Thanks for your reply. I agree with what you said about fairness; I'm just curious why the performance difference occurs. In fact, if the model is trained to the end (200 epochs), the inconsistency grows much larger, and the mAP difference can be up to 2 points. What's more, I found the results of this code to be very volatile: I have run it on three machines, and even with the same torch version ('1.9.0+cu111'), the mAP difference can be up to 1.5 points. Have you ever had this problem, and did you solve it?

wjun0830 commented 6 months ago

We didn't notice that, as we haven't changed the evaluation protocol. But as I said, I suspect the problem is the inherent nondeterminism of training in deep learning frameworks. In hindsight, it might have been fairer if the eval interval had been set to 1.

We also became aware of this issue from feedback after the code release, since we had always used the same machine. Although we have tried hard to find the underlying reason, we are sorry to say that we were not able to identify the root cause.
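
A common mitigation for variance like this (general practice, not something discussed above) is to report the mean and standard deviation of the metric over several runs with different seeds or machines, rather than a single number. The mAP values below are placeholders, not measured results.

```python
# Aggregate metrics from repeated runs (placeholder numbers, for illustration only).
from statistics import mean, stdev

map_runs = [39.8, 38.5, 40.1]  # e.g., mAP from three independent runs
print(f"mAP: {mean(map_runs):.2f} +/- {stdev(map_runs):.2f} over {len(map_runs)} runs")
```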