mwhittaker / deeprl_project

Deep RL Final Project

Hard-coded Pong #3

Open · vlad17 opened 6 years ago

vlad17 commented 6 years ago

Talk to Richard: why didn't he just look up the RAM location for the ball in Atari instead of coding up his own?

Choose the path of least resistance: whatever is easiest for hard-coded Pong.

Deliverables (choose DQN or something that can be dropped in as a placeholder): a nice learning-curve graph for hard-coded Pong.
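
For the hard-coded (cherry-picked RAM) variant, one option is an observation wrapper that keeps only the handful of RAM bytes that matter. Below is a minimal sketch, assuming the commonly cited Pong RAM addresses (ball x = 49, ball y = 54, player paddle y = 51, enemy paddle y = 50); the class name and those indices are assumptions, not taken from this repo:

```python
import gym
import numpy as np


class PongRamFeatures(gym.ObservationWrapper):
    """Reduce the 128-byte Pong RAM to the few bytes that matter.

    RAM indices below are the commonly cited ones for Pong (an
    assumption, not checked against this repo's atari_env.py):
      49 = ball x, 54 = ball y, 51 = player paddle y, 50 = enemy paddle y.
    """

    INDICES = np.array([49, 54, 51, 50])

    def __init__(self, env):
        super().__init__(env)
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(len(self.INDICES),), dtype=np.uint8)

    def observation(self, ob):
        # Select just the cherry-picked bytes from the full RAM vector.
        return ob[self.INDICES]


# Usage: env = PongRamFeatures(gym.make("Pong-ram-v0"))
```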

vlad17 commented 6 years ago

Since there are only a couple of relevant Pong values (#15), we should see how PPO handles them on a GPU. After samvit merges #11, you can edit that PPO training file to include a --ram flag or something, and use an MLP policy in that case. We'll see how that does.
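
A minimal sketch of how that flag could hook in, assuming the training script selects a gym env id; everything beyond the --ram name itself (env ids, the policy switch) is hypothetical:

```python
import argparse

import gym

parser = argparse.ArgumentParser()
# Proposed flag: switch to the 128-byte RAM observation space.
parser.add_argument("--ram", action="store_true",
                    help="train on RAM Pong instead of pixel Pong")
args = parser.parse_args()

# gym ships RAM variants of the Atari envs under *-ram-* ids.
env_id = "Pong-ram-v0" if args.ram else "PongNoFrameskip-v4"
env = gym.make(env_id)

# With RAM observations (a flat uint8 vector of length 128) a small
# MLP policy is the natural choice; with pixels, keep the CNN policy.
policy = "mlp" if args.ram else "cnn"
print(env.observation_space, "->", policy, "policy")
```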

vlad17 commented 6 years ago

@mwhittaker is this task clear? Basically, just train on RAM Pong and save the learning curve. Two things to mention:

1. I think openai has a built-in mlp_policy (analogous to their cnn_policy).
2. You might want to change gen_pong_ram in our atari_env.py to have a hand-rolled version of FrameStack for RAM input (FrameStack itself assumes images) so that your policy can use the last 4 instances of RAM. Just a thing to try; not saying it'll be better. See the sketch below.
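
A hand-rolled RAM frame stack could look like the sketch below; the RamFrameStack name is made up, and it concatenates the last k flat RAM vectors rather than stacking image planes (written against the old gym reset/step API that this era of the repo would use):

```python
import collections

import gym
import numpy as np


class RamFrameStack(gym.Wrapper):
    """Stack the last k RAM observations into one flat vector.

    FrameStack-style wrappers assume image observations (H, W, C);
    this variant concatenates flat uint8 RAM vectors instead.
    """

    def __init__(self, env, k=4):
        super().__init__(env)
        self.k = k
        self.frames = collections.deque(maxlen=k)
        n = env.observation_space.shape[0]  # 128 bytes for Atari RAM
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(n * k,), dtype=np.uint8)

    def reset(self, **kwargs):
        ob = self.env.reset(**kwargs)
        # Fill the history with the initial observation so the stacked
        # shape is valid from the very first step.
        for _ in range(self.k):
            self.frames.append(ob)
        return self._observation()

    def step(self, action):
        ob, reward, done, info = self.env.step(action)
        self.frames.append(ob)
        return self._observation(), reward, done, info

    def _observation(self):
        return np.concatenate(self.frames)


# Usage: env = RamFrameStack(gym.make("Pong-ram-v0"), k=4)
```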

mwhittaker commented 6 years ago

Makes sense! I'll work on it :)

SamvitJ commented 6 years ago

Deliverable: learning curve graph of Pong on 1) video, 2) RAM, 3) cherry-picked RAM

vlad17 commented 6 years ago

Blocked on #30