sentenai / reinforce

Reinforcement learning in Haskell
https://sentenai.github.io/reinforce/
BSD 3-Clause "New" or "Revised" License

Add eligibility trace variants in algorithms #12

Open stites opened 6 years ago

stites commented 6 years ago

If you're unfamiliar with eligibility traces, they essentially unify temporal-difference learning with Monte Carlo methods: you hold a buffer in memory of the agent's experience and perform reward discounting across each step's trace. You might also want to check out n-step returns as the inverse of eligibility traces (i.e. "looking into the future instead of looking into the past"), although n-step is more compute-heavy and, thus, less important. The primary reference for ramping up on this kind of knowledge is chapter 12 of the second edition of the reinforcement learning book (june draft link, perma link, code).
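To make the backward view concrete, here's a minimal tabular sketch of a TD(λ) value update with accumulating traces, plus an n-step return for comparison. This is illustrative Haskell only -- none of these names (`tdLambdaStep`, `nStepReturn`, the `Map`-based value table) come from the reinforce API; they're made up for the example.

```haskell
-- Illustrative sketch, not part of reinforce. Values and traces are
-- kept in Maps keyed by a toy Int state.
import qualified Data.Map.Strict as Map

type State  = Int
type Values = Map.Map State Double  -- state-value estimates V(s)
type Traces = Map.Map State Double  -- eligibility traces e(s)

-- | One backward-view TD(lambda) step: compute the TD error, bump the
-- trace for the visited state, nudge every state's value estimate in
-- proportion to its trace, then decay all traces by gamma * lambda.
tdLambdaStep
  :: Double            -- ^ learning rate alpha
  -> Double            -- ^ discount factor gamma
  -> Double            -- ^ trace-decay parameter lambda
  -> State             -- ^ state s
  -> Double            -- ^ reward r
  -> State             -- ^ next state s'
  -> (Values, Traces)
  -> (Values, Traces)
tdLambdaStep alpha gamma lambda s r s' (vs, es) =
  let v st  = Map.findWithDefault 0 st vs
      delta = r + gamma * v s' - v s            -- TD error
      es1   = Map.insertWith (+) s 1 es         -- accumulate trace for s
      vs1   = Map.foldrWithKey                  -- credit all traced states
                (\st e acc -> Map.insertWith (+) st (alpha * delta * e) acc)
                vs es1
      es2   = Map.map (* (gamma * lambda)) es1  -- decay every trace
  in (vs1, es2)

-- | Forward-view counterpart: the n-step return
-- r1 + gamma*r2 + ... + gamma^(n-1)*rn + gamma^n * bootstrap.
nStepReturn :: Double -> [Double] -> Double -> Double
nStepReturn gamma rewards bootstrap =
  foldr (\r acc -> r + gamma * acc) bootstrap rewards
```

With lambda = 0 this collapses to ordinary one-step TD(0), which is a quick sanity check: only the visited state's value moves, and the decayed trace goes to zero.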

@mathemage, I think this might be a good series of steps for getting started with this implementation. While I think the first item will get you acquainted with the current code, I think it would be best to hardcode as much as possible for your PR and we can start a feature branch.

[screenshot of the proposed starter checklist: 2017-09-30-210809_597x354_scrot]

stites commented 6 years ago

@mathemage, let me know if you want to take this; it would be a good "intro to RL and Haskell" issue.

mathemage commented 6 years ago

Hi @stites !

Sorry for the delay, I was overwhelmed with work. Sure, I'd like to take it if it's still available. Can you describe in more detail what it is about? Perhaps as a JIRA issue, or at least split into subtasks of what needs to be done -- I've got no idea where to begin or what I am supposed to do...

Thanks!

stites commented 6 years ago

Very exciting! Your timing is actually good. There are two sections of algorithms which I think would be nice to include: the foundational RL algorithms, and some more comprehensive deep-learning variations. This issue falls in the first category (and, as I mentioned, is possibly a good way to get started with Haskell) -- perhaps later, if you are interested, you can also help out with the second section.

The current plan is to use GitHub for issue management, so we'll just iterate on this ticket. You can also ping me on gitter via the datahaskell group if you want faster turn-around on any Q&A you might have. I'll fill out more details now and include a checklist for you to work off of -- if any of those items get too big, feel free to create a new issue and link it back to this ticket.

Right now, I think the best thing to do might be just downloading the repo, running `stack build`, building the demo in reinforce-zoo, then running the gym server. Maybe submit a PR if you come across any stale documentation (I just changed the structure of the repo, so there will probably be some bad links). Let me know if you run into trouble!

stites commented 6 years ago

After filling out this ticket, I think the right thing to do is to treat this as your epic. I'm going to copy/paste each item on this ticket as a new issue, which I'll assign to you, and I'll keep this list updated.

Update: looks like I can't assign you a ticket until you submit your first PR, but #10 is actually the first item on this list

mathemage commented 6 years ago

@stites Cool, I'll get to it when I have more spare time (2 crazy weeks upcoming now). There's no deadline for this, right?

stites commented 6 years ago

haha, there are no deadlines in open source. If someone else wants to take this from you, I'll just send you a ping.