mansimov opened 7 years ago
Hey Elman, thank you very much. I trained the model back in July, and the environment got some updates along the way. To parallelize the training I sometimes manually stopped it, updated the weights with some of the best-performing ones, and then restarted it, because at first I had problems with pickle and wanted to hand-manage the training. Also, I should update the code to delete the w_first parts: they were used to develop a different controller that took two steps and then stopped the model in an equilibrium so it could be iterated, but if you plan to train from scratch they aren't useful. Sadly, the training is quite stochastic: you can see little improvement for a long time and then get a big update all of a sudden. Feel free to polish the code to meet your needs. For any other doubt, feel free to ask; you can also contact me at normandipalo@me.com.
EDIT: I tried the training program and it starts with a reward of around 2; then, after just a couple of steps, it reaches a total reward of 3.
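The stop/keep-best/restart scheme described above can be sketched roughly as follows. This is an illustrative toy, not the repo's actual training code: `evaluate` stands in for an environment rollout, and the reward function, seeds, and weight shapes are all assumptions for the example.

```python
import random

def evaluate(weights, seed):
    # Toy stand-in for an environment rollout: reward is higher
    # when the weights are close to a hidden optimum (here, 1.0),
    # plus a little noise to mimic a stochastic environment.
    rng = random.Random(seed)
    noise = rng.uniform(-0.1, 0.1)
    return -sum((w - 1.0) ** 2 for w in weights) + noise

def perturb(weights, seed, scale=0.5):
    # One "worker": perturb the current weights and train from there.
    rng = random.Random(seed)
    return [w + rng.gauss(0, scale) for w in weights]

def train_with_restarts(n_rounds=5, n_workers=3):
    best = [0.0, 0.0]  # shared starting weights
    for rnd in range(n_rounds):
        # Run several variations "in parallel" from the current best.
        candidates = [perturb(best, seed=rnd * 10 + i) for i in range(n_workers)]
        # Stop all runs, keep the best-performing weights, restart from them.
        best = max(candidates + [best], key=lambda w: evaluate(w, seed=rnd))
    return best
```

Because the current best is always kept in the candidate pool, the selected reward never decreases across rounds; that is the whole point of the manual restart trick.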
Hey Norman,
Thanks for the awesome blog post! The idea of representing control with Fourier series is neat.
However, when running the code I couldn't get beyond a reward of ~1. I followed your suggestion and ran three models in parallel to avoid local minima, but still got stuck at the same reward.
I am starting to wonder whether there is some problem with the environment itself. Can you share which version of Python and which commit of the environment you were using? Otherwise, maybe some small but important detail is missing from the code.
Thanks!
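For context, the "control as a Fourier series" idea praised above can be sketched as a truncated series whose coefficients are the parameters being optimized. The function name, coefficient layout, and fixed base frequency here are illustrative assumptions, not the blog post's exact parameterization.

```python
import math

def fourier_control(t, a, b, omega=1.0):
    """Control signal u(t) as a truncated Fourier series.

    a, b are lists of coefficients (assumed names); searching over
    them (e.g. with an evolution strategy) explores smooth periodic
    control signals such as walking gaits.
    """
    u = 0.0
    for k, (ak, bk) in enumerate(zip(a, b), start=1):
        u += ak * math.sin(k * omega * t) + bk * math.cos(k * omega * t)
    return u
```

The appeal is that a handful of coefficients yields a smooth, periodic controller, which shrinks the search space compared to learning an action per timestep.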