I found a big bug in our learning task implementation. Right now we decay epsilon after each agent step, which makes it fall to 0 really fast. We should fix this, and what is even more painful is that we will have to repeat our experiments with the DQN agent.
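For clarity, here is a minimal sketch of the fix, assuming a standard episode/step training loop: move the epsilon decay from the inner step loop to the outer episode loop. The `DummyEnv`/`DummyAgent` classes and the decay constants are illustrative stand-ins, not our actual code.

```python
import random

# Trivial stand-ins so the loop runs; our real env/agent classes differ.
class DummyEnv:
    def reset(self):
        return 0
    def step(self, action):
        return 0, 0.0, random.random() < 0.1, {}  # state, reward, done, info

class DummyAgent:
    def act(self, state, epsilon):
        # epsilon-greedy: random action with probability epsilon
        return random.randint(0, 1) if random.random() < epsilon else 0

EPS_MIN, EPS_DECAY = 0.01, 0.995  # illustrative values, not our actual config
env, agent = DummyEnv(), DummyAgent()
epsilon = 1.0

for episode in range(500):
    state, done = env.reset(), False
    while not done:
        action = agent.act(state, epsilon)
        state, reward, done, _ = env.step(action)
        # BUG was here: decaying epsilon once per step drives it to ~0
        # within a handful of episodes, killing exploration early.
    # FIX: decay once per episode instead, so exploration decreases
    # gradually over the whole training run.
    epsilon = max(EPS_MIN, epsilon * EPS_DECAY)
```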