openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

[working deep RL demo][need help]Lasagne+Agentnet baselines #16

Closed · justheuristic closed this issue 8 years ago

justheuristic commented 8 years ago

Greetings! We've just open-sourced a Lasagne-based library for reinforcement learning algorithm design.

On the bright side,

On the gloomy one, it was only made public ~4 days ago and doesn't have a community yet. Before that, it had only been used by a few Yandex researchers for tinkering.

I would very much like to provide a set of baseline training/testing stands for several problems (and I will do so shortly), so that people can experiment with NN architectures, but I'm a bit doubtful about a few things.

The most basic reinforcement learning pipeline looks like this:
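A minimal sketch of that loop, using a toy stand-in environment so the example is self-contained (real code would call `gym.make(...)` instead; the `ToyEnv` class and its reward scheme are hypothetical, chosen only to illustrate the `reset`/`step` interface):

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym environment.

    It mimics the Gym-style interface: reset() returns an observation,
    step(action) returns (observation, reward, done, info).
    """

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # initial observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # arbitrary toy reward
        done = self.t >= self.horizon
        return self.t, reward, done, {}


def run_episode(env, policy):
    """The basic RL loop: observe, act, receive reward, repeat until done."""
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward


# A random policy suffices to exercise the loop.
total = run_episode(ToyEnv(), policy=lambda obs: random.choice([0, 1]))
```

A learning agent slots into this same loop by replacing the random policy and updating its parameters from the `(obs, action, reward)` stream.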

The questions are, again: is anyone interested in having such baselines for Gym problems, and if so, what API improvements would you recommend?

wojzaremba commented 8 years ago

Having good, simple baselines is super important. It tells us which simple algorithms work. Feel free to submit your algorithms, and we will reproduce and review them.

justheuristic commented 8 years ago

It actually turned out that our interfaces are even closer than I thought: it took around 30 lines of code on top of my example to start playing Atari with a simple convolutional Q-network.

Could you please try running it on your side to see whether the results are indeed reproducible? (e.g. with a changed game environment name)

Gym's interface is indeed super unified: switching between several Atari games boiled down to simply changing their names in the notebook header.
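The pattern being described can be sketched as follows. A stub factory stands in for `gym.make` so the snippet runs without gym installed; the stub class and the "first frame" string are illustrative assumptions, but the env id strings are real Gym environment names:

```python
class StubAtariEnv:
    """Hypothetical stand-in for a Gym Atari environment."""

    def __init__(self, env_id):
        self.env_id = env_id

    def reset(self):
        return "first frame of " + self.env_id


def make(env_id):
    # In real code this would be gym.make(env_id).
    return StubAtariEnv(env_id)


# Switching games is just switching the id string in the notebook header;
# the rest of the training code stays identical.
for env_id in ["SpaceInvaders-v0", "Breakout-v0", "KungFuMaster-v0"]:
    env = make(env_id)
    obs = env.reset()
```

The point is that the env id is the only per-game parameter; everything downstream sees the same `reset`/`step` interface.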

@wojzaremba Now the main question is: if these demos are at least somewhat readable, what would be the best course of action to make them more interpretable?

justheuristic commented 8 years ago

[updated comment above: the demo became more readable and has better scores]

Currently trying out several random games. So far it works with zero code changes.

wojzaremba commented 8 years ago

Awesome. Looking forward to your submissions.

justheuristic commented 8 years ago

This one was clearly stopped early (the learning curve was still rising almost linearly), but it works ~somehow~ nevertheless: https://gym.openai.com/evaluations/eval_yv5uO0fRiS7eNaZGFKacw#reproducibility -- and yes, it was still acting epsilon-greedily during evaluation, because I forgot to zero out epsilon :) Currently running it until convergence.
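For context, the slip above is easy to make: epsilon-greedy exploration should be switched off (epsilon set to 0) at evaluation time so the agent acts purely greedily. A minimal sketch of the selection rule (the function name and Q-values here are illustrative, not from the notebook):

```python
import random


def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, else the greedy one.

    During training, epsilon > 0 drives exploration; at evaluation time
    it should be set to 0 so the agent always takes the argmax action.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


q = [0.1, 0.9, 0.3]  # toy Q-value estimates for 3 actions
greedy_action = epsilon_greedy(q, epsilon=0.0)  # epsilon zeroed: always argmax
```

Leaving epsilon nonzero at evaluation, as happened here, injects random actions and drags the reported score below what the learned policy can actually achieve.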

justheuristic commented 8 years ago

The question stands: how to make these more understandable?

justheuristic commented 8 years ago

Thanks. Added a more detailed NN setup.

Again, please suggest any ways to make this easier to understand.

[ A step-by-step demo for Atari SpaceInvaders ]

gdb commented 8 years ago

(I might recommend asking in chat — I think one helpful approach might be to ask a beginner to work through it.)

justheuristic commented 8 years ago

Well, I was stupid not to. Thanks!

Update: trained an NN to hurt humans using Kung-Fu, a GRU, and advantage actor-critic: https://github.com/yandexdataschool/AgentNet/blob/master/examples/Deep%20Kung-Fu%20with%20GRUs%20and%20A2c%20algorithm%20%28OpenAI%20Gym%29.ipynb

submission https://gym.openai.com/envs/KungFuMaster-v0?filter=all#feed

gdb commented 8 years ago

(Going to close this; please let me know if there's more to discuss!)