oscillations and non-convergence during adaptation

RunlinZou commented 6 months ago

Hi @saltoricristiano, thank you for such awesome work! I am very interested in it and am trying to reproduce the UDA results from the paper. During the adaptation phase, however, the wandb curves show serious oscillation in the training of both the teacher model and the student model, with no obvious convergence. During the evaluation phase, the model's results did not change significantly and stayed around 26. Have you ever encountered this situation? I used 4 NVIDIA 4090s and set the batch size and the number of epochs to 2 and 20 respectively, but I don't think this would cause such severe oscillation and non-convergence. Best wishes!

saltoricristiano commented 6 months ago

Hi @RunlinZou,

Thanks for the interest in our work! Can you give me a bit more context? For example, in which adaptation direction are you experiencing this issue? Setting the batch size to a lower value may also require lowering the learning rate to achieve more stable behavior.

Thx!
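One common heuristic for the batch size and learning rate trade-off mentioned above is the linear scaling rule, which keeps the ratio of learning rate to batch size constant. The sketch below only illustrates that rule; the function name and the reference values are hypothetical and are not taken from the repository's configs, and the rule is a starting point for tuning rather than a guarantee of stable training.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling rule: keep lr / batch_size constant."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Hypothetical reference setting, NOT the repository defaults.
reference_lr = 0.001
reference_batch_size = 8

# Learning rate suggested by the rule when the batch size drops to 2.
suggested_lr = scale_learning_rate(reference_lr, reference_batch_size, 2)
print(f"suggested learning rate: {suggested_lr:g}")
```

If the rule suggests a value well below the learning rates already tried, that lower value may be worth one more run before looking for other causes.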

RunlinZou commented 6 months ago

Hi @saltoricristiano, thanks for your reply. I used the command "python adapt_cosmix.py --config_file configs/adaptation/synlidar2semantickitti_cosmix.yaml" from the README to start the adaptation phase, and then "python eval.py --config_file configs/adaptation/synlidar2semantickitti_cosmix.yaml --resume_path PATH-TO-EXPERIMENT --is_student --eval_traget" for evaluation. To account for the smaller batch size, I tried three learning rates (0.001, 0.0005 and 0.0003) and also increased the number of epochs. Even so, after running the adapt command, the target_validation and student/XX_target_iou curves in wandb oscillate violently. After adjusting the smoothing, the student model's curve shows almost no change and is nearly a straight line. In addition, the test method of eval.py calls get_dataset with config.dataset, but I did not find a dataset entry in the yaml file, so I replaced it with config.source_dataset. Is this an existing problem in the code? Best wishes!
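The config.dataset workaround described above can be made explicit with a small fallback lookup. This is a minimal sketch, assuming the loaded config exposes its sections as attributes; the helper name resolve_dataset_config and the example values are hypothetical, and whether the source or the target section is the right one for a target evaluation is not settled in this thread.

```python
from types import SimpleNamespace


def resolve_dataset_config(config, keys=("dataset", "source_dataset")):
    """Return the first dataset section present on the config, in key order."""
    for key in keys:
        section = getattr(config, key, None)
        if section is not None:
            return section
    raise AttributeError(f"none of {keys} found in config")


# Hypothetical stand-in for a loaded adaptation config without a `dataset` entry.
config = SimpleNamespace(source_dataset=SimpleNamespace(name="SynLiDAR"))

dataset_config = resolve_dataset_config(config)
print(dataset_config.name)
```

Failing loudly when no key matches avoids silently evaluating on an unintended dataset.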

saltoricristiano / cosmix-uda

oscillations and non-convergence during adaptation #18