sentenai / reinforce

Reinforcement learning in Haskell
https://sentenai.github.io/reinforce/
BSD 3-Clause "New" or "Revised" License

Add eligibility trace variants in algorithms #12

Open stites opened 6 years ago

stites commented 6 years ago

If you're unfamiliar with eligibility traces, they essentially unify temporal-difference learning with Monte Carlo methods: you hold a buffer in memory of the agent's experience and perform reward discounting across each step's trace. You might also want to check out n-step returns as the inverse of eligibility traces (i.e. "looking into the future instead of looking into the past"), although n-step is more compute-heavy and, thus, less important. The primary reference for ramping up on this kind of knowledge is chapter 12 of the second edition of the reinforcement learning book (june draft link, perma link, code).
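To make the backward view concrete, here's a minimal tabular sketch of a TD(λ) value update with accumulating traces, plus an n-step return for comparison. This is illustrative Haskell only -- none of these names (`tdLambdaStep`, `nStepReturn`, the `Map`-based value table) come from the reinforce API; they're made up for the example.

```haskell
-- Illustrative sketch, not part of reinforce. Values and traces are
-- kept in Maps keyed by a toy Int state.
import qualified Data.Map.Strict as Map

type State  = Int
type Values = Map.Map State Double  -- state-value estimates V(s)
type Traces = Map.Map State Double  -- eligibility traces e(s)

-- | One backward-view TD(lambda) step: compute the TD error, bump the
-- trace for the visited state, nudge every state's value estimate in
-- proportion to its trace, then decay all traces by gamma * lambda.
tdLambdaStep
  :: Double            -- ^ learning rate alpha
  -> Double            -- ^ discount factor gamma
  -> Double            -- ^ trace-decay parameter lambda
  -> State             -- ^ state s
  -> Double            -- ^ reward r
  -> State             -- ^ next state s'
  -> (Values, Traces)
  -> (Values, Traces)
tdLambdaStep alpha gamma lambda s r s' (vs, es) =
  let v st  = Map.findWithDefault 0 st vs
      delta = r + gamma * v s' - v s            -- TD error
      es1   = Map.insertWith (+) s 1 es         -- accumulate trace for s
      vs1   = Map.foldrWithKey                  -- credit all traced states
                (\st e acc -> Map.insertWith (+) st (alpha * delta * e) acc)
                vs es1
      es2   = Map.map (* (gamma * lambda)) es1  -- decay every trace
  in (vs1, es2)

-- | Forward-view counterpart: the n-step return
-- r1 + gamma*r2 + ... + gamma^(n-1)*rn + gamma^n * bootstrap.
nStepReturn :: Double -> [Double] -> Double -> Double
nStepReturn gamma rewards bootstrap =
  foldr (\r acc -> r + gamma * acc) bootstrap rewards
```

With lambda = 0 this collapses to ordinary one-step TD(0), which is a quick sanity check: only the visited state's value moves, and the decayed trace goes to zero.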

@mathemage, I think this might be a good series of steps for getting started with this implementation. While I think the first item will get you acquainted with the current code, I think it would be best to hardcode as much as possible for your PR and we can start a feature branch.

[screenshot of the proposed starter checklist: 2017-09-30-210809_597x354_scrot]

stites commented 6 years ago

@mathemage, let me know if you want to take this; it would be a good "intro to RL and Haskell" issue.

mathemage commented 6 years ago

Hi @stites !

Sorry for the delay, I was overwhelmed with work. Sure, I'd like to take it if it's still available. Can you describe in more detail what it is about? Perhaps as a JIRA issue, or at least split into subtasks of what needs to be done -- I've got no idea where to begin or what I am supposed to do...

Thanks!

stites commented 6 years ago

Very exciting! Your timing is actually good. There are two sections of algorithms which I think would be nice to include: the foundational RL algorithms, and some more comprehensive deep-learning variations. This issue falls in the first category (and, as I mentioned, is possibly a good way to get started with Haskell) -- perhaps later, if you are interested, you can also help out with the second section.

The current plan is to use GitHub for issue management, so we'll just iterate on this ticket. You can also ping me on gitter via the datahaskell group if you want faster turn-around on any Q&A you might have. I'll fill out more details now and include a checklist for you to work off of -- if any of those items get too big, feel free to create a new issue and link it back to this ticket.

Right now, I think the best thing to do might be just downloading the repo, running `stack build`, building the demo in reinforce-zoo, then running the gym server. Maybe submit a PR if you come across any stale documentation (I just changed the structure of the repo, so there will probably be some bad links). Let me know if you run into trouble!

stites commented 6 years ago

After filling out this ticket, I think the right thing to do is to treat this as your epic. I'm going to copy/paste each item on this ticket as a new issue, which I'll assign to you, and I'll keep this list updated.

Update: looks like I can't assign you a ticket until you submit your first PR, but #10 is actually the first item on this list

mathemage commented 6 years ago

@stites Cool, I'll get to it when I have more spare time (2 crazy weeks upcoming now). There's no deadline for this, right?

stites commented 6 years ago

haha, there are no deadlines in open source. If someone else wants to take this from you, I'll just send you a ping.