postBG / DTA.pytorch

Official implementation of Drop to Adapt: Learning Discriminative Features for Unsupervised Domain Adaptation presented at ICCV 2019.

Advice for hyper-parameter tuning #9

Closed gweiying closed 5 years ago

gweiying commented 5 years ago

Hi,

Thanks for this repository, it is very well-structured and really easy to follow. I have a question about general advice for hyper-parameter tuning:

I've modified the code to run a 1D Resnet34 on time series data. Right now the model performs very well on the source distribution, reaching a max accuracy of 95.6, but reaches only a max accuracy of 4.3 on the target distribution. I've noticed that as it fits the source distribution, accuracy on the target distribution tends to decrease.

Would you have any advice for tuning the hyperparameters? It seems like it is overfitting on the source distribution and not learning any shared features.

Thanks again!

gweiying commented 5 years ago

My bad, I realized there was an error in the way I set up my target dataset. Resolving, thanks!

postBG commented 5 years ago

We've checked this issue now. Good luck to you :)

gweiying commented 5 years ago

Thanks for the prompt reply! Would you happen to have general guidelines on hyperparameter tuning? I still seem to be overfitting to the source data.

At epoch 1, my accuracy on the source data is 79% and accuracy on the target is 58%. Performance on the target data just decreases with training: by epoch 4, accuracy on the source data is 96%, and accuracy on the target drops to 25%.

For reference, just classification on the target data can reach about 83% accuracy (I'm using this as an upper bound).

Any advice on how not to overfit and improve target performance will be really appreciated!

postBG commented 5 years ago

Actually, we don't know the exact problem setting you are dealing with, so our comments may not be helpful to you. If you are tackling a time-series problem, I recommend starting with the parameters described here. Because that paper is about RNNs, you may need to change some of your project's code.

When we conducted our experiments, we followed this process: first, we started from the hyperparameters reported in Adv drop, and then we focused on selecting a proper ramp-up length. In our experiments, the deltas were not critical, but the ramp-up length helped stabilize learning.

If overfitting is the main problem, it could be caused by too little regularization (i.e., a small consistency loss). A larger delta or a shorter ramp-up length might therefore help, but I'm not sure about it. Please note that the proper range of hyperparameters depends on the details of the problem you are tackling. I just hope my comment can help you.
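For concreteness, a sigmoid ramp-up schedule in the style of Laine & Aila's temporal ensembling is one common way to implement this; the function names and exact curve below are a sketch and are not copied from DTA.pytorch. A shorter `rampup_length` makes the consistency weight reach its final delta sooner, i.e., stronger regularization earlier in training:

```python
import math

def sigmoid_rampup(step, rampup_length):
    """Ramp from exp(-5) up to 1.0 over `rampup_length` steps
    (the exponential sigmoid shape from Laine & Aila, 2017)."""
    if rampup_length == 0:
        return 1.0
    t = min(max(step, 0), rampup_length) / rampup_length
    return math.exp(-5.0 * (1.0 - t) ** 2)

def consistency_weight(step, final_delta, rampup_length):
    # final_delta is the target weight for the consistency loss;
    # the weight saturates at final_delta once step >= rampup_length.
    return final_delta * sigmoid_rampup(step, rampup_length)
```

With this shape, the consistency loss is nearly switched off at step 0 and fully active after the ramp-up, which is one way to get the "stable learning" effect mentioned above.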

gweiying commented 5 years ago

Thanks so much, the comments are helpful. I'll try out some of the suggestions above and update this thread if I see significant improvements.

I think there's also a small bug: you're saving the feature extractor's state dictionary as 'feature_extractor_state_dict' but loading it from 'encoder_state_dict'. https://github.com/postBG/DTA.pytorch/blob/ca33b3f118c3fcb1a2b9f6bfaf57681180e206b2/trainers/dta_trainer.py#L285 https://github.com/postBG/DTA.pytorch/blob/ca33b3f118c3fcb1a2b9f6bfaf57681180e206b2/models/__init__.py#L26
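The mismatch can be shown with a minimal sketch (plain dicts stand in for real PyTorch state dicts; only the two key names come from the thread):

```python
# Stand-in for a model state dict produced by torch.save-style checkpointing.
state_dict = {'conv1.weight': [0.1, 0.2]}

# Saving side writes under one key...
checkpoint = {'feature_extractor_state_dict': state_dict}

# ...but the loading side looks up a different key, so it fails.
try:
    loaded = checkpoint['encoder_state_dict']
except KeyError:
    loaded = None  # this branch is taken: the checkpoint never loads

# Fix: use the same key on both the save and load sides.
loaded = checkpoint['feature_extractor_state_dict']
assert loaded == state_dict
```

Either renaming the key on the save side or on the load side fixes it, as long as both sides agree.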

postBG commented 5 years ago

Thanks for finding the error. I will check it as soon as possible :)