justinkay / aldi

Official implementation of "Align and Distill: Unifying and Improving Domain Adaptive Object Detection"
https://aldi-daod.github.io/

Reduced accuracy in the second phase compared to the first phase #39

Open helia-mohamadi opened 2 weeks ago

helia-mohamadi commented 2 weeks ago

Hello again @justinkay. I have trained the model several times, but the test accuracy of the second phase is lower than that of the first phase. With lr = 0.01 my accuracy was 62.77 in the first phase and 43.63 in the second; with lr = 0.001 it was 65.5 in the first phase and 59.7 in the second. Do you know why this happens, and why the accuracy drops in the second phase even though domain adaptation is applied?

justinkay commented 2 weeks ago

Hi @helia-mohamadi, is this on a custom dataset? If so, it's not really possible for me to give a diagnosis, since there could be many reasons. Have you inspected the training curves? What do they look like?

helia-mohamadi commented 1 week ago

Hello @justinkay. Yes, the dataset is custom. The first time I trained my dataset on your model this problem did not exist, but after your recent code change (updating the EMA checkpoints) the problem appeared. I also checked the training process: during the second phase, the accuracy gradually decreased from the final accuracy of the first phase, or stayed in the same range. I used about 30,000 images for source training, 7,500 images for validation (synthetic, and fairly similar to the source training set), and 43,000 images for target training. I ran 3,170 iterations for the first phase and 9,000 for the second, with lr = 0.01. Would sending the training log or config help you diagnose this?
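For context, ALDI is built on Detectron2, so hyperparameters like the ones above live in a Detectron2-style config. A minimal hypothetical sketch of where they would be set; the keys shown are standard Detectron2 solver options, not the exact config files from this thread:

```python
# Hypothetical sketch: where the phase-2 hyperparameters reported above would
# be set in a Detectron2-style config (ALDI builds on Detectron2). The actual
# ALDI config files contain additional, method-specific keys.
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.SOLVER.BASE_LR = 0.01    # learning rate used in this run
cfg.SOLVER.MAX_ITER = 9000   # phase-2 iterations (phase 1 used 3170)
cfg.TEST.EVAL_PERIOD = 500   # evaluate every 500 iterations
```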

justinkay commented 1 week ago

Sorry to hear that change caused you issues. Yes, the more information you can provide, the better chance I have of helping to diagnose it. What would be helpful: logs, configs, training curves (e.g. screenshots from TensorBoard), and the commands you ran from the command line. Ideally, if you have these from both the past runs and the current runs, we can compare and see what has changed.
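If it is easier than screenshots, scalar curves can also be pulled directly out of the TensorBoard event files and posted as plain numbers. A minimal sketch, assuming the event files are in your training output directory; the `bbox/AP50` tag name is an assumption (Detectron2's COCO evaluator typically logs it under that name), so print the available tags to check:

```python
# Minimal sketch: extract a scalar curve from TensorBoard event files so the
# raw numbers can be shared. Assumes the `tensorboard` package is installed
# and that the run's event files live under OUTPUT_DIR.
from tensorboard.backend.event_processing import event_accumulator

OUTPUT_DIR = "output/my_run"  # hypothetical path; use your training output dir

ea = event_accumulator.EventAccumulator(OUTPUT_DIR)
ea.Reload()  # parse the event files on disk

# List the available scalar tags; the tag queried below is an assumption.
print(ea.Tags()["scalars"])

for event in ea.Scalars("bbox/AP50"):
    print(event.step, event.value)
```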

helia-mohamadi commented 1 week ago

Thank you. Can I have your email address to send you this information?

justinkay commented 1 week ago

Are you able to post them here? That way it can be informative for others experiencing similar problems.

helia-mohamadi commented 1 week ago

Of course, here you are: test.zip

justinkay commented 1 week ago

Are you able to post the validation AP50 curves as well? Thanks.

helia-mohamadi commented 6 days ago

Yes. This is the list of validation AP50 values on my test data (37,000 images), evaluated every 500 training iterations with lr = 0.001:

map_values = [65.097, 65.072, 65.101, 64.925, 64.921, 64.906, 64.597, 64.520, 64.139, 64.006, 63.790, 63.230, 62.603, 62.308, 61.692, 61.356, 60.607, 59.800]
iterations = [499, 999, 1499, 1999, 2499, 2999, 3499, 3999, 4499, 4999, 5499, 5999, 6499, 6999, 7499, 7999, 8499, 8999]

![image](https://github.com/user-attachments/assets/a26434ab-7421-4a88-a3db-93add44e8f2d)
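For reference, the plot above can be reproduced from the posted values with a short matplotlib script; a minimal sketch, not part of the original attachment:

```python
# Minimal sketch: plot the validation AP50 curve from the values posted above.
# Assumes matplotlib is installed.
import matplotlib.pyplot as plt

map_values = [65.097, 65.072, 65.101, 64.925, 64.921, 64.906, 64.597, 64.520,
              64.139, 64.006, 63.790, 63.230, 62.603, 62.308, 61.692, 61.356,
              60.607, 59.800]
iterations = [499, 999, 1499, 1999, 2499, 2999, 3499, 3999, 4499, 4999,
              5499, 5999, 6499, 6999, 7499, 7999, 8499, 8999]

plt.plot(iterations, map_values, marker="o")
plt.xlabel("Iteration")
plt.ylabel("Validation AP50")
plt.title("Phase-2 validation AP50 vs. training iteration")
plt.savefig("ap50_curve.png")
```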

Also, when I trained the model, the best checkpoint was at iteration 4999, with accuracies of 80.738 and 81.977 during training (on the validation data) and 64.006 on the test data.

justinkay commented 6 days ago

And how about from the previous run, when you were satisfied with the results? It would be helpful to see a comparison.