skmp opened this issue 4 years ago
We can base our work on this project: https://sudonull.com/post/21544-OpenAI-Universe-Open-platform-for-training-strong-AI
Very efficient ML approaches for such projects are NEAT/HyperNEAT ( https://en.wikipedia.org/wiki/HyperNEAT ), which has been used to solve Mario games ( https://github.com/vivek3141/super-mario-neat ), and Go-Explore ( https://eng.uber.com/go-explore/ ), which solved the extremely challenging Montezuma's Revenge.
Also this https://github.com/mwydmuch/ViZDoom
http://vizdoom.cs.put.edu.pl
Doom-based AI Research Platform for Reinforcement Learning from Raw Visual Information
M Wydmuch, M Kempka & W Jaśkowski, ViZDoom Competitions: Playing Doom from Pixels, IEEE Transactions on Games, in print, arXiv:1809.03470
Useful framework: https://sudonull.com/post/21544-OpenAI-Universe-Open-platform-for-training-strong-AI
Montezuma's Revenge solved by Go-Explore:
Montezuma's Revenge Solved by Go-Explore, a New Algorithm for Hard-Exploration Problems (Sets Records on Pitfall, Too) | Uber Engineering Blog https://eng.uber.com/go-explore/
uber-research/go-explore: Code for Go-Explore: a New Approach for Hard-Exploration Problems https://github.com/uber-research/go-explore
[1901.10995] Go-Explore: a New Approach for Hard-Exploration Problems https://arxiv.org/abs/1901.10995
or alternatively:
Deriving Subgoals Autonomously to Accelerate Learning in Sparse Reward Domains | Proceedings of the AAAI Conference on Artificial Intelligence https://www.aaai.org/ojs/index.php/AAAI/article/view/3876
mchldann/aaai2019 — Bitbucket https://bitbucket.org/mchldann/aaai2019/src/master/
So what are our immediate next steps? Python bindings? (poke @gigaherz)
Python bindings are needed so we can interoperate easily with the libraries above; comparable libraries are not available for .NET.
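To make the discussion concrete, here is a rough sketch (all names are hypothetical placeholders, not an existing API of this emulator) of the minimal surface such bindings would have to expose for the RL libraries above to be usable:

```python
# Rough sketch of the low-level surface the Python bindings could expose
# (every name here is a placeholder, not an existing API of this emulator).
# A gym-style wrapper only needs to: advance one frame, read the screen,
# set controller state, peek guest RAM (for rewards), and save/load state
# (for resets and for Go-Explore style exploration from saved cells).
import numpy as np

class EmulatorHandle:
    def run_frame(self) -> None: ...                           # advance emulation by one frame
    def get_framebuffer(self) -> np.ndarray: ...               # H x W x 3 uint8 screen
    def set_input(self, port: int, buttons: int) -> None: ...  # button bitmask for a controller port
    def read_memory(self, addr: int, size: int) -> bytes: ...  # read guest RAM
    def save_state(self) -> bytes: ...                         # serialize full machine state
    def load_state(self, blob: bytes) -> None: ...             # restore a saved state
```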
One more related recent project: Deep Neuroevolution of Self-Interpretable Agents https://attentionagent.github.io/
What would you need in the Python bindings? Can you spec out an API for us?
Yes, I will isolate the related API contracts (method signatures) from: openai/gym: A toolkit for developing and comparing reinforcement learning algorithms. https://github.com/openai/gym
openai/retro: Retro Games in Gym https://github.com/openai/retro
and reply ASAP. I may also consider some other open-source projects: https://github.com/uber-research/go-explore https://github.com/vivek3141/super-mario-neat
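For concreteness, this is roughly the contract gym expects from an environment, written as a thin wrapper over the hypothetical handle sketched earlier (class name, button count and 640x480 resolution are assumptions, not a spec):

```python
# Sketch of a gym-style wrapper over the hypothetical emulator bindings above.
# Method names/signatures follow the classic gym.Env contract; everything on
# the emulator side (EmulatorHandle, 12 buttons, 640x480 screen) is assumed.
import gym
import numpy as np
from gym import spaces

class EmuEnv(gym.Env):
    def __init__(self, emu):
        self.emu = emu
        self.action_space = spaces.MultiBinary(12)                     # one bit per button (assumption)
        self.observation_space = spaces.Box(0, 255, (480, 640, 3), np.uint8)
        self._initial_state = emu.save_state()                         # state to rewind to on reset()

    def reset(self):
        self.emu.load_state(self._initial_state)
        return self.emu.get_framebuffer()

    def step(self, action):
        buttons = sum(int(bit) << i for i, bit in enumerate(action))   # pack bits into a bitmask
        self.emu.set_input(0, buttons)
        self.emu.run_frame()
        obs = self.emu.get_framebuffer()
        reward = 0.0    # game-specific, e.g. derived from read_memory()
        done = False    # game-specific termination condition
        return obs, reward, done, {}

    def render(self, mode="human"):
        return self.emu.get_framebuffer() if mode == "rgb_array" else None
```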
excellent: thu-ml/tianshou: An elegant, flexible, and superfast PyTorch deep Reinforcement Learning platform. https://github.com/thu-ml/tianshou
simple RL game project: uvipen/Tetris-deep-Q-learning-pytorch: Deep Q-learning for playing tetris game https://github.com/uvipen/Tetris-deep-Q-learning-pytorch
Gym Retro ( https://github.com/openai/retro ) is basically built on the libretro API: https://www.libretro.com/index.php/api/
"When you choose to use the libretro API, your program gets turned into a single library file (called a ‘libretro core’). A frontend that supports the libretro API can then load that library file and run the app. The frontend’s responsibility is to provide all the implementation-specific details, such as video/audio/input drivers."
This is how a console is modeled; see, for example: https://github.com/openai/retro/blob/master/cores/genesis.json
This is the Python API we must conform to: https://retro.readthedocs.io/en/latest/python.html
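The shape of that API is essentially the hello-world from the gym-retro docs (Airstriker-Genesis ships with the package, so this runs as-is once retro is installed):

```python
import retro

def main():
    env = retro.make(game='Airstriker-Genesis')
    obs = env.reset()
    done = False
    while not done:
        # Random agent: sample a button combination each frame.
        obs, rew, done, info = env.step(env.action_space.sample())
        env.render()
    env.close()

if __name__ == "__main__":
    main()
```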
Gym Retro also uses LuaJIT+DynASM in its C++ components. An indicative use, from the "Finding Variables" section of the integration docs ( https://retro.readthedocs.io/en/latest/integration.html ): "Score occasionally is stored in individual locations — e.g. if the score displayed is 123400, 1, 2, 3, 4, 0, 0 all will update separately. If the score is broken into multiple variables, make sure you have penalties set for the individual digits (such as BOB-Snes). A number of games will update the score value across multiple frames, in this case you will need a lua script to correct the reward, such as 1942-Nes."
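To illustrate the "score split across several locations" case in plain Python (the addresses and digit count below are invented; in gym-retro this mapping is normally declared in the game's data.json, with a Lua script when the value updates across frames):

```python
# Illustration only: combining a score stored as separate per-digit RAM
# locations into a single reward signal. Addresses are hypothetical.
DIGIT_ADDRS = [0xFF8020, 0xFF8021, 0xFF8022, 0xFF8023, 0xFF8024, 0xFF8025]

def read_score(read_memory) -> int:
    """read_memory(addr, size) -> bytes, e.g. the hypothetical binding above."""
    digits = [read_memory(addr, 1)[0] for addr in DIGIT_ADDRS]      # most significant digit first
    return sum(d * 10 ** (len(digits) - 1 - i) for i, d in enumerate(digits))

def reward_from_score(prev_score: int, new_score: int) -> float:
    # Only reward the increase, so reading an unchanged score yields 0.
    return float(max(new_score - prev_score, 0))
```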
DeepMind's libraries are not as convenient as OpenAI's, so for the time being we will most probably stick to OpenAI. However, some RL algorithms from DeepMind (as well as the other RL/HyperNEAT libraries mentioned above) could be reused in combination with the OpenAI libraries.
DeepMind's AI can now play all 57 Atari games—but it's still not versatile enough - MIT Technology Review https://www.technologyreview.com/f/615429/deepminds-ai-57-atari-games-but-its-still-not-versatile-enough/
deepmind/bsuite: bsuite is a collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning (RL) agent https://github.com/deepmind/bsuite
Agent57: Outperforming the human Atari benchmark | DeepMind https://deepmind.com/blog/article/Agent57-Outperforming-the-human-Atari-benchmark
Comment by Gwern.net: "'Agent57: Outperforming the Atari Human Benchmark', Badia et al 2020 (blog); Agent57 reaches the median human level across ALE—including Pitfall!/Montezuma's Revenge. It is impressive but still sample-inefficient & uncomfortably baroque in combining what seems like every DM model-free DRL technique in one place: DDQN, Impala, R2D2, Memory Networks, Transformers, Neural Episodic Control, RND, NGU, PBT, MABs… Is model-free DRL a dead end if this is what it takes? I would have preferred to see ALE solved by better exploration in the enormously simpler MuZero."
ALE: https://github.com/mgbellemare/Arcade-Learning-Environment
Interesting:
[1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model https://arxiv.org/abs/1911.08265#deepmind
[2002.06038] Never Give Up: Learning Directed Exploration Strategies https://arxiv.org/abs/2002.06038#deepmind
openai/spinningup: An educational resource to help anyone learn deep reinforcement learning. https://github.com/openai/spinningup
In the last meeting, it was proposed to test Namco Museum and Doom.
Links on RL + Doom or Sonic the Hedgehog:
An introduction to Deep Q-Learning: let’s play Doom https://www.freecodecamp.org/news/an-introduction-to-deep-q-learning-lets-play-doom-54d02d8017d8/
Diving deeper into Reinforcement Learning with Q-Learning https://www.freecodecamp.org/news/diving-deeper-into-reinforcement-learning-with-q-learning-c18d0db58efe/
Improvements in Deep Q Learning: Dueling Double DQN, Prioritized Experience Replay, and fixed… https://www.freecodecamp.org/news/improvements-in-deep-q-learning-dueling-double-dqn-prioritized-experience-replay-and-fixed-58b130cc5682/
An introduction to Policy Gradients with Cartpole and Doom https://www.freecodecamp.org/news/an-introduction-to-policy-gradients-with-cartpole-and-doom-495b5ef2207f/
An intro to Advantage Actor Critic methods: let’s play Sonic the Hedgehog! https://www.freecodecamp.org/news/an-intro-to-advantage-actor-critic-methods-lets-play-sonic-the-hedgehog-86d6240171d/
Proximal Policy Optimization (PPO) with Sonic the Hedgehog 2 and 3 https://towardsdatascience.com/proximal-policy-optimization-ppo-with-sonic-the-hedgehog-2-and-3-c9c21dbed5e
Curiosity-Driven Learning made easy Part I - Towards Data Science https://towardsdatascience.com/curiosity-driven-learning-made-easy-part-i-d3e5a2263359
Deep Reinforcement Learning Course https://simoninithomas.github.io/Deep_reinforcement_learning_Course/
Playing DOOM with Deep Reinforcement Learning - James Liang - Medium https://medium.com/@james.liangyy/playing-doom-with-deep-reinforcement-learning-e55ce84e2930
mwydmuch/ViZDoom: Doom-based AI Research Platform for Reinforcement Learning from Raw Visual Information. https://github.com/mwydmuch/ViZDoom
Baekalfen/PyBoy: Game Boy emulator written in Python https://github.com/Baekalfen/PyBoy
Scripts, AI and Bots · Baekalfen/PyBoy Wiki https://github.com/Baekalfen/PyBoy/wiki/Scripts,-AI-and-Bots
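PyBoy exposes the emulator directly as a Python object, which is exactly the kind of interface we need from our bindings; roughly like this (API names are version-dependent, see the PyBoy docs/wiki above for the current interface):

```python
# Rough sketch of driving PyBoy from a script; method names may differ
# between PyBoy versions, so treat this as illustrative only.
from pyboy import PyBoy, WindowEvent

pyboy = PyBoy("roms/some_game.gb", window_type="headless")   # ROM path is an example
for _ in range(60):                                          # run about one second at 60 fps
    pyboy.send_input(WindowEvent.PRESS_ARROW_RIGHT)
    pyboy.tick()                                             # advance one frame
    pyboy.send_input(WindowEvent.RELEASE_ARROW_RIGHT)
pyboy.stop()
```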
please fill this in