wasidennis / AdaptSegNet

Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR 2018 (spotlight)

Couldn't reproduce the result reported in the paper #19

Closed YifeiAI closed 6 years ago

YifeiAI commented 6 years ago

Hi, thanks for the code! I am trying to reproduce your result using your code, but I only get a mIoU of 40 after 45000 iterations. The result gets even worse after 65000 iterations. I am wondering whether the result is very sensitive to the number of training iterations? Also, are there other things to keep in mind when using your code?

wasidennis commented 6 years ago

Since we do not fix the random seed for the data loader, the result may differ each time during training. We have trained our model several times, and it can reach an IoU of around 41.x - 42.x after 100k~130k iterations.
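
For anyone who wants more repeatable runs, here is a minimal sketch of fixing the seeds before building the data loaders; the helper name `set_seed` is not part of the released training script, it is just an illustration:

```python
import random
import numpy as np
import torch

def set_seed(seed=1234):
    # Seed Python, NumPy, and PyTorch so data-loader shuffling and
    # weight initialization are repeatable across runs.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(1234)
# build the source/target data loaders after seeding
```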

For the domain adaptation problem using adversarial alignment, it is common that training may not be stable, e.g., the IoU becomes lower when training longer. I would suggest saving the model every 10k iterations, and you should find a better one.
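
A rough sketch of the suggested checkpointing inside the training loop; `model`, `num_steps`, and `snapshot_dir` are assumed to exist and stand in for whatever names the training script uses:

```python
import os
import torch

SAVE_EVERY = 10000  # save a snapshot every 10k iterations, as suggested above

for i_iter in range(num_steps):
    # ... one training step of the segmentation network and the discriminators ...

    if (i_iter + 1) % SAVE_EVERY == 0:
        # keep every snapshot so the best-performing one can be picked afterwards
        torch.save(model.state_dict(),
                   os.path.join(snapshot_dir, 'GTA5_%d.pth' % (i_iter + 1)))
```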

YifeiAI commented 6 years ago

@wasidennis Thanks for the quick reply!

Another question: I find there is a cliff-like drop in performance after 65k iterations. At 65k iterations the mIoU is around 40, but at 70k it drops to 32. Do you have any idea why this might happen? It has happened twice, and both times the drop came after 65k iterations.

wasidennis commented 6 years ago

It sounds a bit weird to me. In all our experiments, the IoU might drop a bit (1~3%) and will go back up when training longer. In your case, does the IoU go back to >40 at some point? Btw, have you adjusted any hyperparameters in your experiment?

YifeiAI commented 6 years ago

It manages to go back to around 38 and then fluctuates. I didn't change any parameters (all of them are the defaults). I use PyTorch 0.2. I have now trained the network to 100k iterations and the result is 36.54.

YifeiAI commented 6 years ago

@wasidennis is it possible to post one of your training logs or test curves if you still have them? I am also quite confused by the unstable performance. Also, I tested the model (GTA2Cityscapes_multi-ed35151c) you posted, and I got the same result as you reported.

John1231983 commented 6 years ago

I think the problem is training with a GAN. You have to use early stopping; otherwise, the model degrades and accuracy goes down. Am I correct? @wasidennis

wasidennis commented 6 years ago

@John1231983 thanks for your explanation.

Yes, the GAN training is unstable, and we observed that the accuracy will go down and not come back after training for too long. However, across our experiments, we usually see numbers in the 38.x - 42.x range between 60k and 150k iterations.

@YifeiAI thanks for letting us know your results. I do not have the training log now. Could you post your log so that we can check whether the result is reasonable? After training for 100k iterations, you can also check whether the accuracy goes back above 40.

woozch commented 6 years ago

@YifeiAI I also have the same problem you described. I executed the code three times with the same configuration and got 38.73%, 38.38%, and 36.73% mIoU at iteration 120k. I checked the early-stage performance (<100k), but the results are almost the same. The training performance does not match the reported one. Although the GAN loss is quite unstable, I could not achieve the same result so far. Any updates?

wasidennis commented 6 years ago

@wgchang have you tested the models at different iterations? We usually test from 80k to 130k at every 10k iterations (of course you can do it at even finer intervals). You should be able to find some good models around 40% - 42%. Please let us know if you have any updates.
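
A small sketch of sweeping the saved snapshots in that range, assuming an `evaluate(model)` helper that runs the repo's evaluation (evaluate_cityscapes.py + compute_iou.py) and returns mIoU; the helper name and paths are placeholders:

```python
import os
import torch

best_iter, best_miou = None, 0.0
for i_iter in range(80000, 130001, 10000):
    ckpt = os.path.join(snapshot_dir, 'GTA5_%d.pth' % i_iter)
    model.load_state_dict(torch.load(ckpt))
    miou = evaluate(model)  # placeholder: run the repo's evaluation + IoU computation
    print('iter %d: mIoU %.2f' % (i_iter, miou))
    if miou > best_miou:
        best_iter, best_miou = i_iter, miou
print('best snapshot: iter %s with mIoU %.2f' % (best_iter, best_miou))
```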

woozch commented 6 years ago

@wasidennis Thanks for replying :) I mentioned iteration 120k because NUM_STEPS_STOP is set to 120k in your code. I also checked 30k, 50k, and 80k, but the results are almost the same (around 38%). I will check from 80k to 130k.

woozch commented 6 years ago

@wasidennis I trained the model again and tested at 30k, 50k, 80k, 100k, and 120k. The results (mIoU) are 35.76, 34.51, 36.04, 37.4, and 36.84. I haven't been able to reach your reported result so far.

wasidennis commented 6 years ago

@wgchang that's pretty weird. In our experiments, it is never lower than 37% after 30k. Could you give us more details about your setup or any modifications when training the model, e.g., pre-trained weights, hyperparameters?

In another thread, you mentioned that you are using pytorch 0.4. Have you modified anything?

woozch commented 6 years ago

@wasidennis There were no modifications. I just downloaded your code and datasets and executed the code. The memory issue was solved, but I trained with both PyTorch 0.2 and PyTorch 0.4, and the results are almost the same. Also, your pretrained model gives around 40% mIoU, not the reported 43%. I also downgraded CUDA 9.2 to CUDA 9.0.

wasidennis commented 6 years ago

If you cannot obtain 42.x% using our provided model, it is likely an evaluation problem. There was an issue during evaluation as described here: https://github.com/wasidennis/AdaptSegNet/issues/11#issue-343738037

But we should have fixed it in the evaluation code for PyTorch 0.4, so I am not sure what's wrong. Btw, we used CUDA 8.0.

John1231983 commented 6 years ago

Note that the pretrained model is an early-stopped one. I guess the authors checked accuracy on the validation set and decided when to stop. It may be overfitting. To reproduce the result, you may not need to run all iterations; you should check validation accuracy and use early stopping.
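
A rough sketch of the early-stopping idea, assuming a `validate(model)` function that returns mIoU on a held-out validation split (not part of the released code), and reusing `model` / `num_steps` from the training script:

```python
import torch

EVAL_EVERY = 10000   # validate every 10k iterations
PATIENCE = 3         # stop after 3 consecutive evaluations without improvement

best_miou, bad_evals = 0.0, 0
for i_iter in range(num_steps):
    # ... one training step ...
    if (i_iter + 1) % EVAL_EVERY == 0:
        miou = validate(model)  # hypothetical helper returning validation mIoU
        if miou > best_miou:
            best_miou, bad_evals = miou, 0
            torch.save(model.state_dict(), 'best_model.pth')
        else:
            bad_evals += 1
            if bad_evals >= PATIENCE:
                print('early stop at iter %d, best mIoU %.2f' % (i_iter + 1, best_miou))
                break
```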

woozch commented 6 years ago

Thanks for replying. I just downgraded from CUDA 9.0 to CUDA 8.0 with PyTorch 0.2 and got 42.35% from the pretrained model. The behavior seems very weird, though. I think I need to reinstall CUDA 9.0.

woozch commented 6 years ago

It seems that my CUDA 9.0 installation was corrupted. :( Reinstalling CUDA 9.0 gives the same result. :)

wasidennis commented 6 years ago

Good to know :)

YifeiAI commented 6 years ago

@wgchang I could get the same result using their posted model weights. However, training from scratch only gives me a highest mIoU of 40.11, and the result is very unstable.

zqwhu commented 6 years ago

I have the same problem. Here is my training log from a Titan X with PyTorch 0.4, Python 3.6, and CUDA 9.0, for reference only. As noted in the comment linked above, these results are probably 1-1.5 percent lower. It seems that more training iterations may lead to decreased accuracy...

===> iter: 5000, mIoU: 29.15
===> iter: 10000, mIoU: 35.73
===> iter: 15000, mIoU: 31.92
===> iter: 20000, mIoU: 35.47
===> iter: 25000, mIoU: 37.49
===> iter: 30000, mIoU: 37.77
===> iter: 35000, mIoU: 39.27
===> iter: 40000, mIoU: 39.88
===> iter: 45000, mIoU: 40.61
===> iter: 50000, mIoU: 37.5
===> iter: 55000, mIoU: 40.03
===> iter: 60000, mIoU: 39.18
===> iter: 65000, mIoU: 39.47
===> iter: 70000, mIoU: 37.91
===> iter: 75000, mIoU: 37.01
===> iter: 80000, mIoU: 37.17
===> iter: 85000, mIoU: 37.94
===> iter: 90000, mIoU: 39.09
===> iter: 95000, mIoU: 38.33
===> iter: 100000, mIoU: 37.85
===> iter: 105000, mIoU: 40.62
===> iter: 110000, mIoU: 35.79
===> iter: 115000, mIoU: 35.29
===> iter: 120000, mIoU: 35.35
===> iter: 125000, mIoU: 36.17
===> iter: 130000, mIoU: 35.63
===> iter: 135000, mIoU: 36.3
===> iter: 140000, mIoU: 35.12
===> iter: 145000, mIoU: 34.87
===> iter: 150000, mIoU: 34.86
===> iter: 155000, mIoU: 35.65
===> iter: 160000, mIoU: 34.34
===> iter: 165000, mIoU: 33.92
===> iter: 170000, mIoU: 34.56
===> iter: 175000, mIoU: 33.12
===> iter: 180000, mIoU: 33.38
===> iter: 185000, mIoU: 33.18
===> iter: 190000, mIoU: 33.21

woozch commented 6 years ago

I downloaded the repo and executed the code, and got the reported result (around 40% acc). However, it seems that the bug here was not fixed in the current repo (ref: deeplab-pytorch). If I fix this bug, the result gets worse (around 38% acc).

wasidennis commented 6 years ago

Yes, please do not fix this bug for now, as the current hyper-parameters are tuned based on the version with this bug. If the bug is fixed, we will need to tune a new set of hyper-parameters to achieve a reasonable result.

EthanZhangYi commented 6 years ago

I tried to run the code with PyTorch 0.4.0, and the result is quite unstable.

iteration 5000      0.3098
iteration 10000     0.3488
iteration 15000     0.3575
iteration 20000     0.3531
iteration 25000     0.3715
iteration 30000     0.3851
iteration 35000     0.4022
iteration 40000     0.3675
iteration 45000     0.3738
iteration 50000     0.3930
iteration 55000     0.3799
iteration 60000     0.3602
iteration 65000     0.3745
iteration 70000     0.3816
iteration 75000     0.3752
iteration 80000     0.3582
iteration 85000     0.4025
iteration 90000     0.3873
iteration 95000     0.3761
iteration 100000    0.3729
iteration 105000    0.4246
iteration 110000    0.3850
iteration 115000    0.4084
iteration 120000    0.3983

Only at iteration 105000 is the result from the paper reproduced.