skmp / reicast-emulator

Reicast was a multiplatform Sega Dreamcast emulator
https://reicast.emudev.org

Automated testing and AI integration research #1837

Open skmp opened 4 years ago

skmp commented 4 years ago

please fill this in

stjordanis commented 4 years ago

We can base our work on this project: https://sudonull.com/post/21544-OpenAI-Universe-Open-platform-for-training-strong-AI

Two ML approaches that have proved very efficient for projects like this are NEAT ( https://en.wikipedia.org/wiki/HyperNEAT ), which has been used to solve Mario games ( https://github.com/vivek3141/super-mario-neat ), and Go-Explore ( https://eng.uber.com/go-explore/ ), which solved the extremely challenging Montezuma's Revenge.

stjordanis commented 4 years ago

Also this: ViZDoom, a Doom-based AI research platform for reinforcement learning from raw visual information. https://github.com/mwydmuch/ViZDoom http://vizdoom.cs.put.edu.pl
M Wydmuch, M Kempka & W Jaśkowski, ViZDoom Competitions: Playing Doom from Pixels, IEEE Transactions on Games, in print, arXiv:1809.03470

stjordanis commented 4 years ago

Useful framework: https://sudonull.com/post/21544-OpenAI-Universe-Open-platform-for-training-strong-AI

Montezuma's Revenge was solved by Go-Explore:

Montezuma's Revenge Solved by Go-Explore, a New Algorithm for Hard-Exploration Problems (Sets Records on Pitfall, Too) | Uber Engineering Blog https://eng.uber.com/go-explore/

uber-research/go-explore: Code for Go-Explore: a New Approach for Hard-Exploration Problems https://github.com/uber-research/go-explore

[1901.10995] Go-Explore: a New Approach for Hard-Exploration Problems https://arxiv.org/abs/1901.10995

or alternatively:

Deriving Subgoals Autonomously to Accelerate Learning in Sparse Reward Domains | Proceedings of the AAAI Conference on Artificial Intelligence https://www.aaai.org/ojs/index.php/AAAI/article/view/3876

mchldann / aaai2019 (Bitbucket) https://bitbucket.org/mchldann/aaai2019/src/master/

skmp commented 4 years ago

So what are our immediate next steps? python bindings? (poke @gigaherz)

stjordanis commented 4 years ago

Python bindings are needed in order to interop more easily with the libraries above. Similar libs in .NET are not available.

One more related fresh project is: Deep Neuroevolution of Self-Interpretable Agents https://attentionagent.github.io/

skmp commented 4 years ago

What would you need in the python bindings? Can you spec out an API for us?

stjordanis commented 4 years ago

Yes, I will isolate the related API contracts (method signatures) from: openai/gym: A toolkit for developing and comparing reinforcement learning algorithms. https://github.com/openai/gym

openai/retro: Retro Games in Gym https://github.com/openai/retro

and reply ASAP. I may also consider some other open-source projects: https://github.com/uber-research/go-explore https://github.com/vivek3141/super-mario-neat
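
To make the "API contracts" idea concrete, here is a rough sketch of what a Gym-style surface for the bindings could look like. It follows the classic `reset()` / `step(action)` contract that Gym used at the time of this thread, where `step` returns `(observation, reward, done, info)`. Everything named here (`EmulatorEnv`, `FakeDreamcastEnv`, `run_episode`) is hypothetical and there is no emulator behind it; it only shows the shape of the calls, not a proposed implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Tuple

Frame = List[List[int]]  # placeholder for a real frame buffer


class EmulatorEnv(ABC):
    """Hypothetical binding surface modelled on Gym's classic reset/step contract."""

    @abstractmethod
    def reset(self) -> Frame:
        """Restart the game and return the first observation."""

    @abstractmethod
    def step(self, action: int) -> Tuple[Frame, float, bool, Dict[str, Any]]:
        """Apply one action and return (observation, reward, done, info)."""

    def close(self) -> None:
        """Release emulator resources (nothing to do in this sketch)."""


class FakeDreamcastEnv(EmulatorEnv):
    """Stand-in with no emulator behind it, used only to exercise the contract."""

    def __init__(self, max_frames: int = 3) -> None:
        self.max_frames = max_frames
        self.frame = 0

    def reset(self) -> Frame:
        self.frame = 0
        return self._blank_frame()

    def step(self, action: int) -> Tuple[Frame, float, bool, Dict[str, Any]]:
        self.frame += 1
        reward = 1.0 if action == 1 else 0.0  # made-up reward rule
        done = self.frame >= self.max_frames
        return self._blank_frame(), reward, done, {"frame": self.frame}

    @staticmethod
    def _blank_frame() -> Frame:
        return [[0] * 4 for _ in range(3)]


def run_episode(env: EmulatorEnv, policy: Callable[[Frame], int]) -> float:
    """Drive one episode with the given policy and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
    env.close()
    return total
```

If the real bindings expose these same methods, the RL libraries listed above that expect a Gym-like environment should need little or no glue code.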

stjordanis commented 4 years ago

excellent: thu-ml/tianshou: An elegant, flexible, and superfast PyTorch deep Reinforcement Learning platform. https://github.com/thu-ml/tianshou

simple RL game project: uvipen/Tetris-deep-Q-learning-pytorch: Deep Q-learning for playing tetris game https://github.com/uvipen/Tetris-deep-Q-learning-pytorch

stjordanis commented 4 years ago

Gym Retro ( https://github.com/openai/retro ) is basically built on the libretro API ( https://www.libretro.com/index.php/api/ ):
"When you choose to use the libretro API, your program gets turned into a single library file (called a ‘libretro core’). A frontend that supports the libretro API can then load that library file and run the app. The frontend’s responsibility is to provide all the implementation-specific details, such as video/audio/input drivers."

This is how a console is modeled, an example: https://github.com/openai/retro/blob/master/cores/genesis.json
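
The core/frontend split quoted above can be probed from Python with `ctypes`. This is only a minimal sketch: it loads a core as a shared library and asks it for its API version via `retro_api_version`, which is one of the standard libretro entry points. The function name `libretro_api_version` and the core path in the usage note are my own placeholders, and I am assuming a core built as a shared library exists at that path.

```python
import ctypes


def libretro_api_version(core_path: str) -> int:
    """Load a libretro core and return the API version it reports."""
    # ctypes.CDLL raises OSError if the library cannot be loaded.
    core = ctypes.CDLL(core_path)
    # retro_api_version takes no arguments and returns an unsigned int.
    core.retro_api_version.argtypes = []
    core.retro_api_version.restype = ctypes.c_uint
    return core.retro_api_version()
```

Usage would be something like `libretro_api_version("./some_core_libretro.so")` (hypothetical path). A real frontend has to do much more before it can run a game: register its callbacks (`retro_set_environment`, `retro_set_video_refresh` and the audio/input equivalents), then call `retro_init`, `retro_load_game` and `retro_run` once per frame.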

stjordanis commented 4 years ago

This is the Python API we must abide by: https://retro.readthedocs.io/en/latest/python.html

Retro also uses LuaJIT + DynASM in its C++ components. An indicative use, from the "Finding Variables" section of https://retro.readthedocs.io/en/latest/integration.html :

"Score occasionally is stored in individual locations — e.g. if the score displayed is 123400, 1, 2, 3, 4, 0, 0 all will update separately. If the score is broken into multiple variables, make sure you have penalties set for the individual digits (such as BOB-Snes). A number of games will update the score value across multiple frames, in this case you will need a lua script to correct the reward, such as 1942-Nes."
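
The per-digit score problem from that quote boils down to simple arithmetic, sketched here in Python. Retro itself handles this through its integration files and Lua scripts; the helper names below (`combine_digits`, `score_reward`) are hypothetical and only illustrate folding separately stored digits back into one score and using the change between frames as the reward.

```python
from typing import Sequence


def combine_digits(digits: Sequence[int]) -> int:
    """Fold per-digit variables (most significant first) into one integer score."""
    score = 0
    for digit in digits:
        score = score * 10 + digit
    return score


def score_reward(prev_digits: Sequence[int], cur_digits: Sequence[int]) -> int:
    """Reward is the change in the combined score between two frames."""
    return combine_digits(cur_digits) - combine_digits(prev_digits)
```

For the example in the quote, `combine_digits([1, 2, 3, 4, 0, 0])` gives 123400. Note this does not cover games that update the digits across several frames; a reward computed mid-update would be wrong, which is the case the docs say needs a script.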

stjordanis commented 4 years ago

DeepMind's libs are not as convenient as OpenAI's, so for the time being we will most probably stick to OpenAI. However, some RL algorithms from DeepMind (as well as the other RL/HyperNEAT libs mentioned above) could be reused in combination with the OpenAI libs.

DeepMind’s AI can now play all 57 Atari games - but it’s still not versatile enough - MIT Technology Review https://www.technologyreview.com/f/615429/deepminds-ai-57-atari-games-but-its-still-not-versatile-enough/

deepmind/bsuite: bsuite is a collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning (RL) agent https://github.com/deepmind/bsuite

Agent57: Outperforming the human Atari benchmark | DeepMind https://deepmind.com/blog/article/Agent57-Outperforming-the-human-Atari-benchmark

Comment by Gwern.net: “"Agent57: Outperforming the Atari Human Benchmark", Badia et al 2020 (blog; Agent57 reaches the median human level across ALE—including Pitfall!/Montezuma’s Revenge. It is impressive but still sample-inefficient & uncomfortably baroque in combining what seems like every DM model-free DRL technique in one place: DDQN, Impala, R2D2, Memory Networks, Transformers, Neural Episodic Control, RND, NGU, PBT, MABs… Is model-free DRL a dead end if this is what it takes? I would have preferred to see ALE solved by better exploration in the enormously simpler MuZero.)”

ALE: https://github.com/mgbellemare/Arcade-Learning-Environment

Interesting:

[1911.08265] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model https://arxiv.org/abs/1911.08265#deepmind

[2002.06038] Never Give Up: Learning Directed Exploration Strategies https://arxiv.org/abs/2002.06038#deepmind

openai/spinningup: An educational resource to help anyone learn deep reinforcement learning. https://github.com/openai/spinningup

stjordanis commented 4 years ago

In the last meeting, it was proposed to test Namco Museum and Doom.

Links on RL + Doom or Sonic the Hedgehog:

An introduction to Deep Q-Learning: let’s play Doom https://www.freecodecamp.org/news/an-introduction-to-deep-q-learning-lets-play-doom-54d02d8017d8/

Diving deeper into Reinforcement Learning with Q-Learning https://www.freecodecamp.org/news/diving-deeper-into-reinforcement-learning-with-q-learning-c18d0db58efe/

Improvements in Deep Q Learning: Dueling Double DQN, Prioritized Experience Replay, and fixed… https://www.freecodecamp.org/news/improvements-in-deep-q-learning-dueling-double-dqn-prioritized-experience-replay-and-fixed-58b130cc5682/

An introduction to Policy Gradients with Cartpole and Doom https://www.freecodecamp.org/news/an-introduction-to-policy-gradients-with-cartpole-and-doom-495b5ef2207f/

An intro to Advantage Actor Critic methods: let’s play Sonic the Hedgehog! https://www.freecodecamp.org/news/an-intro-to-advantage-actor-critic-methods-lets-play-sonic-the-hedgehog-86d6240171d/

Proximal Policy Optimization (PPO) with Sonic the Hedgehog 2 and 3 https://towardsdatascience.com/proximal-policy-optimization-ppo-with-sonic-the-hedgehog-2-and-3-c9c21dbed5e

Curiosity-Driven Learning made easy Part I - Towards Data Science https://towardsdatascience.com/curiosity-driven-learning-made-easy-part-i-d3e5a2263359

Deep Reinforcement Learning Course https://simoninithomas.github.io/Deep_reinforcement_learning_Course/

Playing DOOM with Deep Reinforcement Learning - James Liang - Medium https://medium.com/@james.liangyy/playing-doom-with-deep-reinforcement-learning-e55ce84e2930

mwydmuch/ViZDoom: Doom-based AI Research Platform for Reinforcement Learning from Raw Visual Information. https://github.com/mwydmuch/ViZDoom

stjordanis commented 4 years ago

Baekalfen/PyBoy: Game Boy emulator written in Python https://github.com/Baekalfen/PyBoy

Scripts, AI and Bots · Baekalfen/PyBoy Wiki https://github.com/Baekalfen/PyBoy/wiki/Scripts,-AI-and-Bots