msmbuilder / vde

Variational Autoencoder for Dimensionality Reduction of Time-Series
MIT License

Default learning rate and other hyper params. #6

Open msultan opened 6 years ago

msultan commented 6 years ago

Based on some testing, I am starting to think that the default learning rate of 1e-4 is too low for our applications, and that it might be better to bump it up to 5e-3 or even 1e-2. This is mostly based on the empirical observation that higher learning rates tend to give "similar"-looking models even with differing architectures, batch sizes, and numbers of epochs. It also helps that we use the Adam optimizer, which adapts the effective step size as training goes forward.

brookehus commented 6 years ago

We could also look at an adaptive learning rate as a function of epoch, i.e. a learning rate schedule.
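
For illustration, here is a minimal sketch of what a learning rate that depends on the epoch could look like. It is not code from this repo: the function name `step_decay_lr` and the `drop` and `epochs_per_drop` values are made up for the example. It starts at the proposed 5e-3 and halves every 10 epochs, never going below the current default of 1e-4.

```python
def step_decay_lr(base_lr, epoch, drop=0.5, epochs_per_drop=10, min_lr=1e-4):
    """Hypothetical helper: learning rate for a given epoch under step decay.

    The rate is multiplied by `drop` once every `epochs_per_drop` epochs
    and is never allowed to fall below `min_lr`.
    """
    n_drops = epoch // epochs_per_drop
    return max(base_lr * drop ** n_drops, min_lr)


# Illustrative values only: start at 5e-3 and print the rate every 10 epochs.
for epoch in range(0, 50, 10):
    print(epoch, step_decay_lr(5e-3, epoch))
```

If the model is trained with PyTorch, something like this could probably be wired in through `torch.optim.lr_scheduler.LambdaLR`, which expects a function returning a multiplicative factor on the initial rate rather than the rate itself, so the helper would need to return `drop ** n_drops` instead.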