The DQN experiment runs extremely slowly on the crime task.
I noticed that the two tasks use exactly the same convergence condition: the standard deviation of the quality measure must be less than 3. However, the quality metric for crime is in the range of 300-1400, while the quality metric for house price is in the range of 10-30. An absolute threshold of 3 is large relative to the house price scale, so that task quickly satisfies the condition and stops. The same threshold is tiny relative to the crime scale, so the crime task could run forever. I'll try changing the condition so that the std must be less than 1.5% of the average quality measure. Hopefully that will fix the issue.
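A minimal sketch of the proposed relative condition, to make the change concrete. The function name `has_converged`, the parameter `rel_tol`, and the idea of passing in a list of recent quality values are all hypothetical and not taken from the experiment code. It also assumes the population standard deviation is the one intended.

```python
import statistics


def has_converged(recent_quality, rel_tol=0.015):
    """Scale-free convergence check (hypothetical helper).

    Returns True when the standard deviation of the recent quality
    values is below rel_tol (1.5% by default) of their mean magnitude.
    """
    # A single value has no spread to measure, so never stop on it.
    if len(recent_quality) < 2:
        return False
    mean = statistics.fmean(recent_quality)
    # Assumption: population std; swap in statistics.stdev for the sample std.
    std = statistics.pstdev(recent_quality)
    # abs() keeps the threshold positive even if the metric is negative.
    return std < rel_tol * abs(mean)


# Illustrative values only, chosen to sit inside the ranges quoted above.
house_price_window = [20.0, 20.1, 19.9, 20.2]
crime_window = [1000.0, 1005.0, 995.0, 1010.0]
print(has_converged(house_price_window), has_converged(crime_window))
```

With a relative threshold, the crime window above counts as converged even though its std is above the old absolute limit of 3, while a window that still swings across most of the 300-1400 range does not.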