How to design an actor-critic network with two non-shared LSTMs that take separate inputs?

hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

http://stable-baselines.readthedocs.io/

MIT License


How to design an actor-critic network with two non-shared LSTMs that take separate inputs? #1002

Closed: baiydaavi closed this issue 2 years ago

baiydaavi commented 4 years ago

Is it possible to design an actor-critic network with separate LSTMs such that one of the LSTMs outputs value and the other outputs action? The two LSTMs also get different inputs. Thanks for your help in advance.

Miffyli commented 4 years ago

There is no pre-made solution for this; you have to create a custom network. If by "separate inputs" you mean different observations, that is not supported per se, but you can do tricks like this.
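The example linked above is not reproduced in this thread. One common form of such a trick, assumed here purely for illustration, is to have the environment return both observation sets concatenated into a single flat vector and to slice that vector apart again inside the network. The names `ACTOR_OBS_DIM`, `CRITIC_OBS_DIM`, `pack_obs` and `split_obs` are hypothetical and not part of stable-baselines.

```python
import numpy as np

# Hypothetical sizes of the two observation streams (assumptions for illustration).
ACTOR_OBS_DIM = 4
CRITIC_OBS_DIM = 6


def pack_obs(actor_obs, critic_obs):
    """Concatenate both streams into the single flat observation the env returns."""
    return np.concatenate([actor_obs, critic_obs], axis=-1)


def split_obs(flat_obs):
    """Inside the network: slice the flat vector back into the two streams."""
    return flat_obs[..., :ACTOR_OBS_DIM], flat_obs[..., ACTOR_OBS_DIM:]


actor_obs = np.arange(ACTOR_OBS_DIM, dtype=np.float32)
critic_obs = np.ones(CRITIC_OBS_DIM, dtype=np.float32)

flat = pack_obs(actor_obs, critic_obs)      # what the env would emit
actor_part, critic_part = split_obs(flat)   # what each LSTM would receive
```

Because the slicing uses `...`, the same helpers should also work on batched observations of shape `(batch, ACTOR_OBS_DIM + CRITIC_OBS_DIM)`.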

baiydaavi commented 4 years ago

Can I create a custom network without changing what the LSTM cell does? I thought that, in stable baselines, the LSTM cell always outputs both policy and value. However, I want one LSTM to compute only the policy and the other to compute only the value. Also, by separate inputs I mean a different set of observations for the two LSTMs. I am attaching a cartoon depiction of what I want to do. It's Supplementary Figure 9 from the Nature meta-RL paper. [image: 2-lstm]

Miffyli commented 4 years ago

With custom policies you can define a network of your liking, and I think this should be doable, as long as you manage to provide the two streams of input data with trickery through the example I linked above. The custom policy example in the docs should provide a good starting point.
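To make the target architecture concrete, here is a minimal forward-pass sketch in plain NumPy. It is not stable-baselines code and does not use its custom policy API; it only illustrates two LSTMs with separate parameters and separate recurrent states, one feeding a policy head and the other a value head. It assumes the two observation sets arrive concatenated in one flat vector, and all class names, sizes and the weight initialisation are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Minimal LSTM cell holding its own (non-shared) parameters."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One matrix for the four gates: input, forget, candidate, output.
        self.W = rng.normal(0.0, 0.1, (input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def initial_state(self):
        return np.zeros(self.hidden_dim), np.zeros(self.hidden_dim)

    def step(self, x, state):
        h, c = state
        gates = np.concatenate([x, h]) @ self.W + self.b
        i, f, g, o = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, (h, c)


class TwoLSTMActorCritic:
    """Actor LSTM -> action probabilities; critic LSTM -> scalar value."""

    def __init__(self, actor_obs_dim, critic_obs_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.actor_obs_dim = actor_obs_dim
        self.actor_lstm = LSTMCell(actor_obs_dim, hidden_dim, rng)
        self.critic_lstm = LSTMCell(critic_obs_dim, hidden_dim, rng)
        self.W_pi = rng.normal(0.0, 0.1, (hidden_dim, n_actions))
        self.W_v = rng.normal(0.0, 0.1, (hidden_dim, 1))

    def initial_state(self):
        return self.actor_lstm.initial_state(), self.critic_lstm.initial_state()

    def step(self, flat_obs, state):
        actor_state, critic_state = state
        # Assumed layout: actor observations first, critic observations after.
        actor_obs = flat_obs[:self.actor_obs_dim]
        critic_obs = flat_obs[self.actor_obs_dim:]
        h_pi, actor_state = self.actor_lstm.step(actor_obs, actor_state)
        h_v, critic_state = self.critic_lstm.step(critic_obs, critic_state)
        logits = h_pi @ self.W_pi
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        value = (h_v @ self.W_v).item()
        return probs, value, (actor_state, critic_state)


net = TwoLSTMActorCritic(actor_obs_dim=4, critic_obs_dim=6, hidden_dim=8, n_actions=3)
state = net.initial_state()
obs_rng = np.random.default_rng(1)
for _ in range(5):
    flat_obs = obs_rng.normal(size=10)
    probs, value, state = net.step(flat_obs, state)
```

Since the two cells share no weights and no state, changing only the critic's slice of the observation leaves the action probabilities untouched. In an actual stable-baselines custom policy the same wiring would have to be expressed with the library's own TensorFlow graph and state handling, which this sketch does not attempt.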