Discussion
I will be moving a lot of the text in the README regarding DataBunches into here to be more constructive/interactive once the basic requirements of the repository are met. Current goals of the data pipeline are:
[ ] Increase num_workers to more than 0. Presently, the dataset class crashes during parallel computing, most likely because a single environment is shared across workers. Will this ever be possible, especially for agents like DQNs?
[ ] With parallel computing in mind, will major changes be required if we try implementing HAC or A3C?
[ ] Is there a way to make this code more pythonic? The current code seems rigid. What would happen if we wanted to add a new Item such as a SemiMDPSlice? What if we added agents that use Options?
[ ] The dataset class forces purely sequential access. Perhaps investigate ways to make this cleaner for different samplers? We need to consider how the DataLoader class treats objects with __getitem__.
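On the sampler question above: a map-style DataLoader only requires `__len__` and `__getitem__`, so one way to relax the purely sequential access would be to cache transitions as the env generates them and backfill lazily when a sampler asks for an index we have not reached yet. A minimal sketch of that idea (all names here, including `CountingEnv` and `MDPDataset`, are hypothetical illustrations, not the current classes):

```python
class CountingEnv:
    """Toy stand-in for a gym-style env: the state is just a step counter."""
    def __init__(self):
        self.t = 0

    def step(self):
        state, action, reward = self.t, 0, 1.0
        self.t += 1
        return state, action, reward


class MDPDataset:
    """Sketch of a map-style dataset: by lazily caching transitions we can
    serve arbitrary indices to a sampler even though the single env itself
    can only be advanced sequentially."""

    def __init__(self, env, n_steps):
        self.env = env
        self.n_steps = n_steps
        self._cache = []  # transitions generated so far

    def __len__(self):
        return self.n_steps

    def __getitem__(self, idx):
        # Advance the single env until transition idx has been generated.
        while len(self._cache) <= idx:
            self._cache.append(self.env.step())
        return self._cache[idx]
```

Out-of-order reads then just replay from the cache, e.g. `MDPDataset(CountingEnv(), 10)[5]` steps the env six times, and a later `[2]` touches only the cache. This does not solve the num_workers > 0 problem (the env still lives in one process), but it decouples sampler order from env order.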
Most Important
[x] Memory management: I did not expect this to be such an immediate issue, but the memory management in MDP datasets is horrific: it grows by 100-200 MB every 20 steps in the DQN notebook for gym_maze. Moving on to options for reducing the size of the datasets: we will "null out" the large fields of unimportant episodes (most likely the state- and image-based fields) to reduce memory, while keeping reward information. We want to be able to keep certain episodes of interest for the interpreter to work with. Maybe in the future we could try a hard-drive caching scheme, though that may be a bad idea.
Proposing:
- keep the k top episodes at high fidelity
- keep the quartile worst/best episodes
- keep the k top worst and best
- keep the k top worst
- None: only load into memory (always keep the first)
- all / small
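The "null out" idea for the k-top policy above can be sketched as follows. This is only an illustration of the intent (the function name, the dict-of-steps episode layout, and ranking by total reward are all assumptions, not the current implementation):

```python
from operator import itemgetter

def null_out_episodes(episodes, k):
    """Hypothetical sketch of the 'keep k top episodes' policy: episodes
    outside the top-k by total reward get their memory-heavy fields
    (state/image) replaced with None, while reward info is kept for all."""
    # Rank episodes by total reward, best first, and keep the top-k indices.
    totals = [(i, sum(step['reward'] for step in ep))
              for i, ep in enumerate(episodes)]
    keep = {i for i, _ in sorted(totals, key=itemgetter(1), reverse=True)[:k]}
    for i, ep in enumerate(episodes):
        if i in keep:
            continue  # episode of interest: keep at high fidelity
        for step in ep:
            step['state'] = None  # drop the heavy fields in place
            step['image'] = None
    return episodes
```

The other policies (quartile worst/best, k top worst, etc.) would only change how the `keep` set is computed; the nulling pass stays the same, which is why reward totals survive for every episode and the interpreter can still rank them afterwards.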
[x] How are we going to delineate between an epoch, a step, and a batch? At present, a single iteration through an episode is an epoch, while a single step and a batch are treated as the same thing: a single frame in the environment. How do we plan to separate these?
[ ] Additional idea related to the single-env training issue: some models might allow multiple envs to run at the same time, which could make more workers worthwhile. The main issue is that these envs would run in different processes.
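One possible answer to the epoch/step/batch question, written out as a toy loop. This is a proposal, not current behaviour: an epoch is one full pass over an episode, a step is a single env frame, and a batch groups batch_size steps for one gradient update (`run_training` and its arguments are invented for illustration):

```python
def run_training(episode_lengths, batch_size):
    """Sketch of one delineation: epoch = one pass over an episode,
    step = one env frame, batch = batch_size steps per update."""
    log = []
    for epoch, n_steps in enumerate(episode_lengths):
        steps = list(range(n_steps))              # one frame per step
        for start in range(0, n_steps, batch_size):
            batch = steps[start:start + batch_size]
            log.append((epoch, batch))            # one update per batch
    return log
```

Under this scheme a 5-frame episode with batch_size=2 yields three updates in one epoch, so step count and update count finally diverge, which is exactly the separation the question asks for.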