Closed hlsafin closed 2 years ago
I could be wrong about this, but looking at the implementation, it doesn't seem to feed the previous reward into the LSTM alongside the state and previous action. Was this a design decision?
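For concreteness, here is a minimal sketch of the kind of per-step input the question refers to, assuming the previous action is one-hot encoded and the previous reward is appended as a single scalar. The function `build_lstm_input` and its parameters are hypothetical names for illustration only and are not taken from this repository.

```python
from typing import List


def build_lstm_input(
    state: List[float],
    prev_action: int,
    prev_reward: float,
    num_actions: int,
) -> List[float]:
    """Concatenate state, one-hot previous action, and previous reward.

    Hypothetical helper: it only illustrates the input layout being asked
    about, not how this repository actually builds its LSTM input.
    """
    # One-hot encode the previous discrete action.
    one_hot = [0.0] * num_actions
    one_hot[prev_action] = 1.0
    # Append the previous reward as one extra scalar feature.
    return list(state) + one_hot + [float(prev_reward)]


x = build_lstm_input([0.5, -1.0], prev_action=1, prev_reward=0.25, num_actions=3)
print(x)  # [0.5, -1.0, 0.0, 1.0, 0.0, 0.25]
```

With this layout the LSTM input size would be `len(state) + num_actions + 1`; leaving the reward out corresponds to dropping the final element.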