Hi, I really appreciate your work; however, I have noticed a few issues.
Q-learning relies on a trade-off between exploration and exploitation. You are only using a greedy action policy, which is not optimal: the agent never explores, so it can get stuck exploiting whatever actions happened to look good early on.
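A common fix is an epsilon-greedy policy: with probability epsilon take a random action, otherwise take the greedy one. A minimal sketch (the function name and signature here are just illustrative, not from your code):

```python
import numpy as np

def epsilon_greedy_action(q_values, epsilon, rng=None):
    """Pick a random action with probability epsilon (explore),
    otherwise pick the action with the highest Q-value (exploit)."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit
```

In practice epsilon is usually annealed from ~1.0 down to a small floor (e.g. 0.05) over training, so the agent explores heavily at first and exploits later.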
The reason deep Q-learning works is largely the replay memory and the separate target and estimator networks, but in your code you have merged the two networks into one, and there is no replay memory at all. Without the replay buffer, consecutive samples are strongly correlated; without a separate target network, the bootstrap targets shift on every update, which destabilizes training.
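Roughly what I mean, as a framework-agnostic sketch (class and function names are mine, not from your code; weights are stand-in dicts rather than real network parameters):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # old transitions evicted automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def sync_target(estimator_weights, target_weights):
    """Hard update: copy estimator weights into the target network.
    Called every N training steps, so the bootstrap targets stay fixed in between."""
    target_weights.update(estimator_weights)
```

The training loop would then push every transition into the buffer, train only on sampled minibatches once the buffer is warm, compute targets from the target network, and call `sync_target` every few thousand steps.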