[Question] Timeline for generalizing Dopamine and policy for contributions towards this.

ramunter commented 6 years ago

Thanks for this great project!

There's been quite a few issues (f.e. #3 #36) regarding customising Dopamine to work on new environments. It also seems like there are quite a few people who have made or are making forks that allow for this.

So I was wondering:

Is there any current work/timeline for generalising Dopamine to new environments/network structures and so on?
What is the policy on accepting contributions towards achieving the above?

psc-g commented 6 years ago

hi, we are aware this is something lots of people would find useful and it's in our pipeline. unfortunately we don't have exact dates as to when this will be ready (we're a small team!). but stay tuned, we'll announce it in the "What's New" section of the main page when it's ready. thanks!

-psc

mgbellemare commented 6 years ago

Thanks for your interest in Dopamine! Like Pablo said, we are working hard to address your PRs / comments.

Given the interest, we'll try to get out clearer guidelines on generalizing Dopamine. One thing that's at the top of our minds is to keep things compact, which we think is one of Dopamine's main strengths -- we'll still discussing how to best do this given the demand. In the meantime, forks are not necessarily a bad thing: Dopamine was exactly designed to be easily modifiable.

Best, Marc

ramunter commented 6 years ago

Great, thanks for the reply. I'll keep an eye on the "Whats new" section :)

M00NSH0T commented 6 years ago

I too am interested in a more generalized version. I was able to hack something together with some help from @pathway (thanks again!), but ultimately the training teetered out before it was able to get to an acceptable / usable policy for my environment, and I've had better luck with other RL implementations as a result. The problem was the network structure. I had to dumb down my environment to work with a much simpler neural network (with a flat observation space, and no aux output for it to use to augment it's training).

I really think the best way to make this generalizable / usable by the greater community as a whole would be to integrate Keras for the network template piece at least. When dealing with numerical environments instead of game / frame based environments, we often need to create more complex multi-input and sometimes multi-output models (which I've found can be especially helpful) using multiple layer types not available in Slim. I get that we could theoretically do this directly with Tensorflow, but it's so much more complicated to do it that way. Keras makes everything so simple. You can still have the loss functions and training done directly in tensorflow, but if the greater codebase was modified to allow for multi-input and multi-output models, and the network definition piece was laid out using Keras layers instead of Slim (as a template for us to modify), I just think this would be far easier for those of us trying to use Reinforcement Learning for no-atari based problems to use. Just my two cents...

pathway commented 6 years ago

@M00NSH0T its possible to use keras layers within tf code if you prefer.

I agree multiple inputs and outputs are challenging. To that end I started trying to using extra_storage_types in the Replay classes but I gave up fighting it after a bit.

M00NSH0T commented 6 years ago

@pathway Yeah I encountered the same issue with the memory. I added another set of variables to it to handle my additional 'aux' output vector, and I have a 'processor' that can chop my observation space up into the right sub arrays for the model inputs. I actually may go back and just tack my aux output onto my observation space and extend my processor to chop that off after it gets sampled for training. That way, I don't need to actually modify the replay_buffercode itself. It's a bit of a hack, but at least I can plug and play future improvement to the greater dopamine code base this way.

Anyway, I'm trying to get all this integrated into my current system, which is a rabbitmq local network cluster with multiple actors feeding into multiple queues that then get processed by dedicated memory processors that read the experiences from each queue, correctly apply the discounting (since they're not mixing up experiences from multiple actors), then spit out priority samples to another shared queue that the central learner grabs from, then dumps the updated weights to a shared folder that the actors periodically use to make updates from. It's kinda over the top, but I've gotten it running before with another RL implementation and it was actually my most successful attempt. I've found this is the best way for me to use 100% of my GPU's potential, which is the only way I'm able to get my environment to finish training in days instead of weeks or months. Plus, the separate memory_buffers can get sampled at different rates and be different sizes and serve as long term or short term memories in that way. Also, having multiple memory programs lets me extend the size of my memory across the resources of my cluster in general, so that's seemed to help considerably. I almost have it working with the dopamine replay_buffer, which I like a lot more than the one I was using before since it only stores one copy of each observation and can use this OutOfGraphPrioritizedReplayBuffer too.

I've heard before you can use keras layers interchangeably with tf layers, but I always seem to have issues when I try. I'll see if I can get it working here though later this week.

google / dopamine

[Question] Timeline for generalizing Dopamine and policy for contributions towards this. #40