Closed by justheuristic 8 years ago
Having good, simple baselines is super important: they show which simple algorithms actually work. Feel free to submit your algorithms, and we will reproduce and review them.
It actually turned out that our interfaces are even closer than I thought: it took around 30 lines of code on top of my example to start playing Atari with a simple convolutional Q-network.
Could you please run it on your side to check that the results are indeed reproducible? (e.g. by changing the game environment name)
Gym is indeed a super-unified interface: switching between several Atari games boiled down to simply changing the environment name in the notebook header.
@wojzaremba Now the main question is: if these demos are at least somewhat readable, what would be the best course of action to make them more interpretable?
[updated comment above: the demo became more readable and has better scores]
Currently trying out several random games. So far everything works with zero code changes.
Awesome. Looking forward to your submissions.
This one is clearly early-stopped (the learning curve is still rising almost linearly), but it works ~somehow~ nevertheless. https://gym.openai.com/evaluations/eval_yv5uO0fRiS7eNaZGFKacw#reproducibility -- and yes, it was still acting epsilon-greedily during evaluation, because I forgot to zero out epsilon :) Currently running it until convergence.
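For context on why forgetting to zero out epsilon hurts the score: an epsilon-greedy policy takes a uniformly random action with probability epsilon, so at evaluation time epsilon should be 0 so the agent acts purely greedily. A minimal numpy sketch (the function name is illustrative, not from the actual notebook):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng=np.random):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.rand() < epsilon:
        return int(rng.randint(len(q_values)))
    return int(np.argmax(q_values))

# During training, epsilon > 0 keeps the agent exploring;
# with epsilon = 0 the action is always the argmax of the Q-values.
q = np.array([0.1, 0.5, 0.2])
greedy_action = epsilon_greedy(q, epsilon=0.0)
```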
The question stands: how to make these more understandable?
Thanks. Added a more detailed NN setup.
Again, please suggest any ways this thing can get easier to understand.
(I might recommend asking in chat — I think one helpful approach might be to ask a beginner to work through it.)
Well, I was stupid not to think of that. Thanks!
Update: trained a NN to hurt humans using Kung Fu, GRUs and advantage actor-critic https://github.com/yandexdataschool/AgentNet/blob/master/examples/Deep%20Kung-Fu%20with%20GRUs%20and%20A2c%20algorithm%20%28OpenAI%20Gym%29.ipynb
Submission: https://gym.openai.com/envs/KungFuMaster-v0?filter=all#feed
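For anyone skimming who hasn't met advantage actor-critic before: the "advantage" is the discounted return minus the critic's value estimate, i.e. how much better the outcome was than expected. A rough numpy sketch under the usual definitions (not code from the notebook above):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """R_t = r_t + gamma * R_{t+1}, computed backwards over one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def advantages(rewards, values, gamma=0.99):
    """A_t = R_t - V(s_t): positive when the episode went better than the critic predicted."""
    return discounted_returns(rewards, gamma) - np.asarray(values)

# Toy episode: three steps, critic predicted zero value everywhere.
adv = advantages([1.0, 0.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5)
```

The actor's policy gradient is then weighted by these advantages, so actions that beat the critic's expectation get reinforced.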
(Going to close this; please let me know if there's more to discuss!)
Greetings! We have just open-sourced a Lasagne-based library for reinforcement learning algorithm design.
On the bright side,
On the gloomy side, it was only made public ~4 days ago and doesn't have a community yet. Before that, it was only used by several Yandex researchers for tinkering.
I would very much like to provide a set of baseline training/testing stands for several problems (and I will do so shortly), so that people can experiment with NN architectures, but I'm a bit doubtful about
The most basic reinforcement learning pipeline looks like this:
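Since the snippet itself didn't survive into this thread, here is a hedged reconstruction of what such a pipeline looks like against the Gym-style `reset()`/`step(action)` interface. To keep it self-contained, a toy stand-in environment is used; real code would obtain `env` from `gym.make(env_name)` instead:

```python
import random

class ToyEnv:
    """Stand-in for a Gym environment: exposes reset() and step(action)."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # initial observation

    def step(self, action):
        self.t += 1
        observation = self.t
        reward = 1.0 if action == 1 else 0.0  # arbitrary toy reward
        done = self.t >= self.horizon
        return observation, reward, done, {}  # Gym's (obs, reward, done, info)

def run_episode(env, policy):
    """The basic RL loop: observe, act, collect reward, until the episode ends."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
    return total_reward

score = run_episode(ToyEnv(), policy=lambda obs: random.choice([0, 1]))
```

A learning agent just replaces the lambda with a policy that updates itself from the collected (observation, action, reward) transitions.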
The questions are, again: is anyone interested in having such baselines for Gym problems, and if so, what API improvements would you recommend?