Hi,
I am using the starter-agent to train on a Gym environment (one other than Pong or Dusk Drive). It was able to learn to a certain degree, but then it got stuck around a certain sub-optimal score. I get the feeling that it stopped exploring too early. Is there a way to adjust the agent so that the explore-exploit trade-off is different? Also, is there any documentation or paper on the model or the LSTM architecture used in this implementation (e.g. how many layers are used, and the type of layers)?
Thanks