Open stianteien opened 1 year ago
Higher duration gives more reward. Is this a problem?
Higher duration gives more reward. Is this a problem?
Should not be a problem for the learning because it learns for every step. It is a problem for the overall score.
Alerting a lot when reaching the end.
Alerting a lot when reaching the end.
Ok fixed
Find a way to fast-stop the train reached the end
Now it is done when the first train is hitting the end. Then the second train (with the only brain) will never reach the end..