number9473 / challenge

OpenAI Retro Contest #9

Open joyhuang9473 opened 6 years ago

joyhuang9473 commented 6 years ago

OpenAI Retro Contest

joyhuang9473 commented 6 years ago

[OpenAI Retro Contest] Getting Started https://medium.com/@deankayton/openai-retro-contest-getting-started-62a9e5cc3801

joyhuang9473 commented 6 years ago

Day one of the OpenAI Retro Contest. https://medium.com/@tristansokol/day-one-of-the-openai-retro-contest-1651ddcd6aa5

joyhuang9473 commented 6 years ago

(reward function) More information is in the rewards section of this report:

https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/retro-contest/gotta_learn_fast_report.pdf
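
For reference, the report describes the reward as scaled horizontal progress plus a completion bonus. A minimal sketch of that shaping, assuming per-step access to Sonic's horizontal offset (the `level_length` and `max_steps` values here are illustrative):

```python
# Sketch of the progress-based reward described in the report: per-step
# reward is the change in horizontal offset, scaled so traversing the
# whole level sums to ~9000; a time bonus of up to 1000 is added on
# completion. level_length and max_steps are illustrative assumptions.

def step_reward(prev_x, cur_x, level_length):
    """Reward proportional to horizontal progress made this step."""
    return 9000.0 * (cur_x - prev_x) / level_length

def completion_bonus(steps_taken, max_steps=4500):
    """Up to 1000 extra for finishing quickly, linear in time remaining."""
    return 1000.0 * max(0.0, 1.0 - steps_taken / max_steps)
```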

joyhuang9473 commented 6 years ago

OpenAI Retro Contest – Everything I know about JERK agent http://www.noob-programmer.com/openai-retro-contest/jerk-agent-algorithm/
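
The gist of JERK ("Just Enough Retained Knowledge"), per the write-up: run right and jump, back up when progress stalls, and replay the best trajectory found so far on later episodes. A hedged sketch of that loop (button indices, step counts, and the exploit probability are illustrative assumptions, not the write-up's exact values):

```python
import random
import numpy as np

# Assumed indices into gym-retro's 12-button Genesis action array.
JUMP, LEFT, RIGHT = 0, 6, 7

def move(env, steps, left=False, jump_every=10):
    """Hold a direction for `steps` frames, tapping jump periodically."""
    total, actions, done = 0.0, [], False
    for i in range(steps):
        action = np.zeros(12, dtype=bool)
        action[LEFT if left else RIGHT] = True
        if i % jump_every == 0:
            action[JUMP] = True
        _, rew, done, _ = env.step(action)
        total += rew
        actions.append(action)
        if done:
            break
    return total, actions, done

def jerk_episode(env, best_seq, exploit_prob=0.25):
    """One episode: either replay the best sequence or explore greedily."""
    env.reset()
    if best_seq and random.random() < exploit_prob:
        total = 0.0
        for action in best_seq:  # exploit: replay the best trajectory so far
            _, rew, done, _ = env.step(action)
            total += rew
            if done:
                break
        return best_seq, total
    seq, total, done = [], 0.0, False
    while not done:
        rew, acts, done = move(env, steps=100)
        seq += acts
        total += rew
        if not done and rew <= 0:  # stuck: back up to the left a bit
            rew, acts, done = move(env, steps=70, left=True)
            seq += acts
            total += rew
    return seq, total
```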

joyhuang9473 commented 6 years ago

SonLVL (Palettes, Tiles, Blocks, Chunks, Solidity Maps, Foreground, Background)

joyhuang9473 commented 6 years ago

@PixelyIon

Data for the Blocks, Solidity Maps, and Tiles: https://cdn.discordapp.com/attachments/433354549835595776/444272473702137856/Sanic.zip

joyhuang9473 commented 6 years ago

@Garreth

What I understand is (tell me if any of the following is wrong):
- The reward function cannot be changed.
- The main purpose of the contest is to find an approach that generalizes well and fast (that's why we don't have access to the info dictionary at test time).
- JERK is simple but achieves better results at an early stage (because it is guided).
- PPO and Rainbow are slower, but should eventually get better results than JERK if we train longer.
- The main problem is generalization.

What I think we should do is find more useful features, like @PixelyIon is trying to do.

joyhuang9473 commented 6 years ago

train set: https://contest.openai.com/static/sonic-train.csv
test set: SpringYardZone.Act1
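
A quick sketch for loading the train split and instantiating one of its levels with gym-retro (assumes the CSV has `game` and `state` columns and that the Sonic ROMs are already imported):

```python
import csv
import urllib.request

import retro

# Fetch the official train split; assumed columns: game, state.
URL = "https://contest.openai.com/static/sonic-train.csv"
with urllib.request.urlopen(URL) as f:
    rows = list(csv.DictReader(line.decode() for line in f))

print(len(rows), "train levels; first:", rows[0])

# Make one training environment (requires the ROMs to be installed).
env = retro.make(game=rows[0]["game"], state=rows[0]["state"])
```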

joyhuang9473 commented 6 years ago

@sl

ConvVAE for RL

score: 5302.79

example: https://keon.io/deep-q-learning/
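
A minimal sketch of what a ConvVAE encoder for Sonic frames might look like, in PyTorch (the 84x84 grayscale input, layer sizes, and latent dimension are all illustrative assumptions; a policy like the DQN in the linked example would then act on the latent `z` instead of raw pixels):

```python
import torch
import torch.nn as nn

class ConvVAE(nn.Module):
    """Encoder half of a convolutional VAE for 1x84x84 frames (sketch)."""

    def __init__(self, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2), nn.ReLU(),    # 84 -> 41
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),   # 41 -> 19
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),  # 19 -> 8
            nn.Flatten(),
        )
        self.mu = nn.Linear(128 * 8 * 8, z_dim)
        self.logvar = nn.Linear(128 * 8 * 8, z_dim)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar
```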

joyhuang9473 commented 6 years ago

@unixpickle

@m1234d As it is, Rainbow performs better than PPO. I don't think either algorithm is the best for this task, especially since both are still much worse than humans (so humans represent at least one better algorithm).

joyhuang9473 commented 6 years ago

@lyons Wrote a quick post about using retrowrapper to run multiple environments at once

https://mikelyons.org/2018/05/22/Multiple-Retro-Environments.html
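
The short version from the post: gym-retro allows only one emulator instance per process, and retrowrapper sidesteps this by hosting each emulator in its own process. A hedged usage sketch (assuming RetroWrapper forwards retro.make's arguments):

```python
from retrowrapper import RetroWrapper

# Each RetroWrapper spawns its own process, so several emulators can run
# side by side; arguments are assumed to mirror retro.make's.
env1 = RetroWrapper("SonicTheHedgehog-Genesis", state="GreenHillZone.Act1")
env2 = RetroWrapper("SonicTheHedgehog-Genesis", state="LabyrinthZone.Act2")

obs1 = env1.reset()
obs2 = env2.reset()
```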

joyhuang9473 commented 6 years ago

@lyons running retro in colab https://drive.google.com/file/d/11Mxg30mXEvhk8jB0iJ-cFw1k0wICkf8e/view?usp=sharing
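
The Colab setup essentially reduces to two notebook cells, sketched here (the ROM directory path is illustrative; you have to supply the ROMs yourself):

```python
# Colab cells (notebook shell syntax): install gym-retro, then import ROMs.
!pip install gym-retro
!python -m retro.import /content/roms  # path to your uploaded ROM directory
```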

joyhuang9473 commented 6 years ago

@Rezix @sulo If it helps, I made a big write-up based on the JERK agent: http://www.noob-programmer.com/openai-retro-contest/jerk-agent-algorithm/

joyhuang9473 commented 6 years ago

@tristansokol OpenAI Retro Contest tools! https://gist.github.com/tristansokol/062b1d509e2e8e6e250a30ae09928a58

joyhuang9473 commented 6 years ago

@Sugaku Retro Games in Gym: https://github.com/rfurman/retro

joyhuang9473 commented 6 years ago

@Daniel

[writeups] Using Deep Reinforcement Learning to Play Sonic the Hedgehog: An attempt to replicate the World Models paper to play Sonic for the OpenAI Retro Contest.

https://medium.com/@mrdbourke/the-world-model-of-a-hedgehog-6ff056a6dc7f

joyhuang9473 commented 6 years ago

@flyyufelix

[writeups] Train a Reinforcement Learning agent to play custom levels of Sonic the Hedgehog with Transfer Learning https://flyyufelix.github.io/2018/06/11/sonic-rl.html

joyhuang9473 commented 6 years ago

@Dylan

[writeups] World Models applied to Sonic

joyhuang9473 commented 6 years ago

OpenAI Retro Contest – Compilation of Reinforcement Learning Write-Ups http://www.noob-programmer.com/openai-retro-contest/reinforcement-learning-write-ups/

joyhuang9473 commented 6 years ago

@seungjaeryanlee

https://github.com/seungjaeryanlee/retro-agents

joyhuang9473 commented 6 years ago

Best Write-ups

| Rank | Winner | Write-up |
| ---- | ------ | -------- |
| #1 | Dylan Djian | World Models |
| #2 | Oleg Mürk | Exploration algorithms, policy distillation and fine-tuning |
| #3 | Felix Yu | Fine-tuning on per-zone expert policies |