Closed: baiydaavi closed this issue 2 years ago.
There is no ready-made solution for this; you would need to create a custom network. If by "separate inputs" you mean different observations, that is not supported per se, but you can use tricks like this.
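The linked trick is not reproduced in this thread. One common workaround, assumed here, is to concatenate both observation sets into a single flat observation that the environment returns, then slice it back apart inside the network. The helper names `pack_observations` and `split_observation` are hypothetical, for illustration only:

```python
from typing import List, Sequence, Tuple


def pack_observations(policy_obs: Sequence[float], value_obs: Sequence[float]) -> List[float]:
    """Hypothetical helper: concatenate the two observation streams into one flat vector."""
    return list(policy_obs) + list(value_obs)


def split_observation(flat_obs: Sequence[float], policy_dim: int) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: undo pack_observations.

    The first `policy_dim` entries go to the actor, the rest to the critic.
    """
    if not 0 <= policy_dim <= len(flat_obs):
        raise ValueError("policy_dim must lie within the observation length")
    return list(flat_obs[:policy_dim]), list(flat_obs[policy_dim:])


# The environment would return the packed vector as its single observation.
obs = pack_observations([0.1, 0.2, 0.3], [1.0, -1.0])
actor_in, critic_in = split_observation(obs, policy_dim=3)
print(actor_in)   # [0.1, 0.2, 0.3]
print(critic_in)  # [1.0, -1.0]
```

The split point (`policy_dim`) has to be fixed and known to the network, which is why this only works when both observation sets have a constant size.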
Can I create a custom network without changing what the LSTM cell does? I thought that in Stable Baselines the LSTM cell always outputs both the policy and the value. However, I want one LSTM to compute only the policy and the other to compute only the value. Also, by separate inputs I mean a different set of observations for each of the two LSTMs. I am attaching a cartoon depiction of what I want to do; it's Supplementary Figure 9 from the Nature meta-RL paper.
With custom policies you can define a network of your liking, and I think this should be doable, as long as you manage to provide the two streams of input data with some trickery, as in the example I linked above. The custom policy example in the docs should provide a good starting point.
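As a framework-agnostic sketch of the data flow (plain Python, not the Stable Baselines API), the requested architecture amounts to two independent LSTM cells: the actor LSTM sees only the policy observations and feeds an action-logit head, while the critic LSTM sees only the value observations and feeds a scalar value head. The class names, random weight initialization, and sizes are all assumptions; a real implementation would live inside a custom policy class and would also need to reset both hidden states at episode boundaries, which `reset()` only stands in for here.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear(x: Sequence[float], weights: List[Vector], bias: Vector) -> Vector:
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class LSTMCell:
    """Minimal single-layer LSTM cell with input, forget and output gates."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.input_dim, self.hidden_dim = input_dim, hidden_dim
        cols = input_dim + hidden_dim + 1  # input, previous hidden state, bias
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                        for _ in range(4 * hidden_dim)]

    def initial_state(self) -> Tuple[Vector, Vector]:
        return [0.0] * self.hidden_dim, [0.0] * self.hidden_dim

    def step(self, x: Sequence[float], state: Tuple[Vector, Vector]) -> Tuple[Vector, Vector]:
        if len(x) != self.input_dim:
            raise ValueError("unexpected input size")
        h, c = state
        z = list(x) + h + [1.0]
        pre = [sum(w * v for w, v in zip(row, z)) for row in self.weights]
        n = self.hidden_dim
        i = [sigmoid(p) for p in pre[0:n]]          # input gate
        f = [sigmoid(p) for p in pre[n:2 * n]]      # forget gate
        o = [sigmoid(p) for p in pre[2 * n:3 * n]]  # output gate
        g = [math.tanh(p) for p in pre[3 * n:]]     # candidate cell state
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


class TwoLSTMActorCritic:
    """Hypothetical actor-critic: actor and critic each own an LSTM and see different inputs."""

    def __init__(self, policy_obs_dim: int, value_obs_dim: int, hidden_dim: int, n_actions: int) -> None:
        self.actor_lstm = LSTMCell(policy_obs_dim, hidden_dim, seed=0)
        self.critic_lstm = LSTMCell(value_obs_dim, hidden_dim, seed=1)
        rng = random.Random(2)
        self.pi_w = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)] for _ in range(n_actions)]
        self.pi_b = [0.0] * n_actions
        self.v_w = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]]
        self.v_b = [0.0]
        self.reset()

    def reset(self) -> None:
        """Clear both recurrent states, e.g. at the start of an episode."""
        self.actor_state = self.actor_lstm.initial_state()
        self.critic_state = self.critic_lstm.initial_state()

    def step(self, policy_obs: Sequence[float], value_obs: Sequence[float]) -> Tuple[Vector, float]:
        # Each LSTM advances its own hidden state from its own observation stream.
        self.actor_state = self.actor_lstm.step(policy_obs, self.actor_state)
        self.critic_state = self.critic_lstm.step(value_obs, self.critic_state)
        logits = linear(self.actor_state[0], self.pi_w, self.pi_b)       # policy head
        value = linear(self.critic_state[0], self.v_w, self.v_b)[0]      # value head
        return logits, value


model = TwoLSTMActorCritic(policy_obs_dim=3, value_obs_dim=2, hidden_dim=4, n_actions=2)
logits, value = model.step([0.1, 0.2, 0.3], [1.0, -1.0])
print(len(logits), isinstance(value, float))  # 2 True
```

Because the two streams never share parameters or hidden state, the value estimate depends only on the critic's observations, which matches the separation described in the question.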
Is it possible to design an actor-critic network with separate LSTMs such that one LSTM outputs the value and the other outputs the action? The two LSTMs would also receive different inputs. Thanks in advance for your help.