This shows how to implement a Deep Recurrent Q-Network (DRQN) for the CartPole task. I had to expand the state input to include a few past states, which creates a meaningful sequential input stream for the Long Short-Term Memory (LSTM) model; with only the current state, it did not work. This may look like it violates the Markov property assumption, but it does the job.
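
To make the idea of expanding the state with past states concrete, here is a minimal sketch of a sliding window over recent observations. It is not the actual code of this project: the class name `StateHistory`, the `seq_len` parameter, and the choice to pad the window by repeating the first observation are all assumptions made for illustration. The output is a list of shape `(seq_len, state_dim)`, oldest step first, which would still need to be converted to an array and given a batch dimension before being fed to an LSTM.

```python
from collections import deque
from typing import Deque, List, Sequence


class StateHistory:
    """Hypothetical helper: keeps the last `seq_len` observations as a sequence."""

    def __init__(self, seq_len: int = 4) -> None:
        if seq_len < 1:
            raise ValueError("seq_len must be at least 1")
        self.seq_len = seq_len
        self._buffer: Deque[List[float]] = deque(maxlen=seq_len)

    def reset(self, first_obs: Sequence[float]) -> List[List[float]]:
        # At episode start there is no history yet, so the first observation
        # is repeated to fill the window (an assumption; zero padding is
        # another option).
        self._buffer.clear()
        for _ in range(self.seq_len):
            self._buffer.append(list(first_obs))
        return self.sequence()

    def push(self, obs: Sequence[float]) -> List[List[float]]:
        # deque(maxlen=...) drops the oldest entry automatically.
        self._buffer.append(list(obs))
        return self.sequence()

    def sequence(self) -> List[List[float]]:
        # Shape (seq_len, state_dim), oldest observation first.
        return [list(step) for step in self._buffer]


# Made-up 4-dimensional observations, matching the size of a CartPole state
# (cart position, cart velocity, pole angle, pole angular velocity).
history = StateHistory(seq_len=3)
seq0 = history.reset([0.0, 0.1, 0.2, 0.3])
seq1 = history.push([1.0, 1.1, 1.2, 1.3])
seq2 = history.push([2.0, 2.1, 2.2, 2.3])
seq3 = history.push([3.0, 3.1, 3.2, 3.3])
```

In a training loop, `reset` would be called with the first observation of each episode and `push` after every environment step, so the network always sees a fixed-length sequence instead of a single state.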