[Open] vlad17 opened this issue 7 years ago
Since there are only a couple of relevant Pong RAM values (#15), we should see how PPO handles them on a GPU. After samvit merges #11, you can edit that PPO training file to include a --ram flag or something, and use an MLP policy in that case. We'll see how that does.
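For the flag itself, something along these lines could work. This is just a sketch: the --ram flag name, the env IDs, and the policy dispatch are assumptions about how the training script ends up being structured after #11 lands.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--ram', action='store_true',
                    help='train on RAM observations with an MLP policy '
                         'instead of pixels with a CNN policy')
args = parser.parse_args()

# Pong-ram* envs expose the 128-byte Atari RAM as the observation.
env_id = 'Pong-ramNoFrameskip-v4' if args.ram else 'PongNoFrameskip-v4'
policy_kind = 'mlp' if args.ram else 'cnn'  # dispatch to mlp_policy / cnn_policy
```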
@mwhittaker, is this task clear? Basically just train on RAM Pong and save the learning curve. Two things to mention:
(1) I think OpenAI has a built-in mlp_policy (analogous to their cnn_policy). (2) You might want to change gen_pong_ram in our atari_env.py to use a hand-rolled version of FrameStack for RAM input (FrameStack itself assumes images), so that your policy can use the last 4 RAM states (just a thing to try; not saying it'll be better). A sketch of such a wrapper is below.
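A minimal sketch of that hand-rolled wrapper, assuming the older gym API (4-tuple step) and flat 1-D RAM observations. The name RamFrameStack and the default of k=4 are my choices, not anything in our repo; it just concatenates the last k RAM vectors where baselines' FrameStack would stack image planes.

```python
from collections import deque

import gym
import numpy as np


class RamFrameStack(gym.Wrapper):
    """Stack the last k RAM observations (1-D uint8 vectors) into one vector.

    Hypothetical analogue of baselines' FrameStack, which assumes image
    observations; here we simply concatenate flat RAM arrays instead.
    """

    def __init__(self, env, k=4):
        super(RamFrameStack, self).__init__(env)
        self.k = k
        self.frames = deque(maxlen=k)
        ram_dim = env.observation_space.shape[0]  # 128 bytes for Atari RAM
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(ram_dim * k,), dtype=np.uint8)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Fill the buffer with the initial observation so the stacked
        # vector has a well-defined shape from the first step.
        for _ in range(self.k):
            self.frames.append(obs)
        return self._observation()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return self._observation(), reward, done, info

    def _observation(self):
        return np.concatenate(self.frames)
```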
Makes sense! I'll work on it :)
Deliverable: learning curve graph of Pong on 1) video, 2) RAM, 3) cherry-picked RAM
Blocked on #30
Talk to Richard: why didn't he just look up the RAM location for the ball in Atari instead of coding up his own?
Choose the path of least resistance, whatever is easiest for hard-coded Pong (see the RAM-lookup sketch after this list).
Deliverables (choose DQN or something that can be dropped in as a placeholder): nice graph for hard-coded Pong (i.e., hard-coded RAM features).
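For the cherry-picked-RAM route, reading the ball and paddle positions straight out of the RAM observation might look like this. The byte offsets below are the ones commonly cited in community RAM maps for Pong; treat them (and the env ID) as assumptions to verify against the ROM before relying on them.

```python
import gym

# Byte offsets into Atari RAM commonly cited for Pong (assumptions --
# verify against the actual ROM before trusting these):
BALL_X, BALL_Y = 49, 54      # ball position
PLAYER_Y, ENEMY_Y = 51, 50   # paddle positions

env = gym.make('Pong-ramNoFrameskip-v4')
obs = env.reset()  # for -ram envs the observation *is* the 128-byte RAM


def cherry_pick(ram):
    """Extract just the hand-picked Pong features from a RAM observation."""
    return ram[[BALL_X, BALL_Y, PLAYER_Y, ENEMY_Y]]


print('ball (x, y):', obs[BALL_X], obs[BALL_Y])
```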