Hsgngr closed this issue 3 years ago.
Sorry, from this alone it's impossible to know whether the problem is in the environment or whether the network simply cannot learn a good policy.
One tip for debugging it: first only collect data (no training), then train on that fixed data and make sure the training loss can go down.
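A minimal sketch of that procedure, following the standard TF-Agents DQN tutorial setup (CartPole is used here only so the snippet is self-contained; the trading environment and the hyperparameters from the notebook are assumed to slot in the same way):

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.policies import random_tf_policy
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.utils import common

# Stand-in environment so the snippet runs on its own; swap in the
# trading environment here.
train_env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

q_net = q_network.QNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=(64,))

agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=tf.Variable(0))
agent.initialize()

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=train_env.batch_size,
    max_length=10000)

# Step 1: collect experience with a random policy only -- no training yet.
random_policy = random_tf_policy.RandomTFPolicy(
    train_env.time_step_spec(), train_env.action_spec())
dynamic_step_driver.DynamicStepDriver(
    train_env, random_policy,
    observers=[replay_buffer.add_batch],
    num_steps=2000).run()

# Step 2: train only on that pre-collected data and check the loss trends down.
dataset = replay_buffer.as_dataset(
    sample_batch_size=64, num_steps=2, num_parallel_calls=3).prefetch(3)
iterator = iter(dataset)
for step in range(500):
    experience, _ = next(iterator)
    loss = agent.train(experience).loss
    if step % 100 == 0:
        print(f'step={step}  loss={loss.numpy():.4f}')
```

If the loss does not decrease even on a fixed batch of random-policy data, the problem is likely in the network/loss setup rather than in the collection loop.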
I have a TF environment for trading with 3 actions: Skip | Buy | Sell. For the training part I'm following the DQN agent. When I calculate the average_return with random_policy I get results like this, which make sense:
However, when I use agent.policy it only takes action 0, which is the Skip action, so after one episode it prints this:
I started using TF-Agents recently. I found the code pretty easy to understand; however, there is a lack of documentation and tutorials.
Since I know that random_policy works and produces meaningful results, there must be something off with the agent.policy part. I'm adding the notebook here: DQN Training Notebook
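One way to narrow this down (a sketch only; `eval_env`, `agent`, `q_net`, and `random_policy` are assumed to be the objects already built in the notebook) is to print the Q-values together with the action each policy picks for the same time steps. If the Q-value for action 0 dominates on every observation, the greedy agent.policy will always choose Skip even though random_policy looks fine:

```python
# Hypothetical sanity check -- `eval_env`, `agent`, `q_net`, and `random_policy`
# are assumed to come from the notebook, not new names.
time_step = eval_env.reset()
for _ in range(10):
    # Q-values the greedy policy chooses from at this observation.
    q_values, _ = q_net(time_step.observation, time_step.step_type)
    greedy_action = agent.policy.action(time_step).action
    random_action = random_policy.action(time_step).action
    print('Q-values:', q_values.numpy(),
          'greedy:', greedy_action.numpy(),
          'random:', random_action.numpy())
    time_step = eval_env.step(greedy_action)
```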
As you can see, the loss is huge, so after some time it throws an error for an inf or NaN loss value.
You can also see the code blocks below (the same as in the notebook, in case you cannot open it). Here is the code block that I'm using for the training part: