weidler / RLaSpa

Reinforcement Learning in Latent Space
MIT License

Develop a better metric for the progress report #37

Closed: weidler closed this issue 5 years ago

weidler commented 5 years ago

Especially in the racing tasks, where a single mistake means losing the episode, we need a better way to indicate learning progress than episode length, because exploration is too destructive.

HansBambel commented 5 years ago

I disagree. The agent has to learn to evade obstacles. If it does that, it gets a reward. This should be sufficient to learn.

weidler commented 5 years ago

Sure, but I am not talking about learning. I am talking about the progress reports we do during learning:

|-- 91% (Avg. Rew. of 2265.0)
|-- 92% (Avg. Rew. of 3503.5)
|-- 93% (Avg. Rew. of 3541.6666666666665)
|-- 94% (Avg. Rew. of 2651.0)
|-- 95% (Avg. Rew. of 4307.666666666667)
|-- 96% (Avg. Rew. of 3405.0)
|-- 97% (Avg. Rew. of 3523.3333333333335)
|-- 98% (Avg. Rew. of 3034.8333333333335)
|-- 99% (Avg. Rew. of 4230.666666666667)

These numbers look fairly random, or at least not very good, even though the policy is already quite good at this point. That is why I believe the average reward over the last training episodes, with exploration still active, is not very indicative here. The DQN loss may be more helpful, or some intermediate test-episodes.
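
For illustration, a minimal sketch of the "intermediate test-episodes" idea, assuming a Gym-style environment and an epsilon-greedy action selector (both names are placeholders, not the RLaSpa interfaces): every so often, run a few episodes with exploration switched off and report the average return from those instead of the noisy training reward.

```python
# Sketch only: greedy evaluation episodes for progress reporting.
# `env` is assumed to follow the Gym API (reset/step), and `select_action`
# is assumed to accept an epsilon parameter; both are hypothetical names.
def evaluate(env, select_action, n_episodes=5, max_steps=10_000):
    """Average undiscounted return over exploration-free episodes."""
    returns = []
    for _ in range(n_episodes):
        state = env.reset()
        total, done, steps = 0.0, False, 0
        while not done and steps < max_steps:
            action = select_action(state, epsilon=0.0)  # greedy, no exploration
            state, reward, done, _ = env.step(action)
            total += float(reward)
            steps += 1
        returns.append(total)
    return sum(returns) / len(returns)
```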

HansBambel commented 5 years ago

Oh, misunderstanding then. I agree. More info is better.

HansBambel commented 5 years ago

The progress prints now show the average representation loss, the policy loss, the latest rewards, and the elapsed time. What else do you think could be useful?
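
For reference, a minimal sketch of such a progress line (the function name, parameters, and formatting are assumptions for illustration, not the repo's actual code):

```python
import time

def progress_report(step, total_steps, repr_loss, policy_loss, recent_rewards, start_time):
    """Print one progress line with losses, recent average reward, and elapsed time."""
    avg_reward = sum(recent_rewards) / max(len(recent_rewards), 1)
    elapsed = time.time() - start_time
    print(f"|-- {100 * step // total_steps}% "
          f"(repr. loss {repr_loss:.4f} | policy loss {policy_loss:.4f} | "
          f"avg. rew. {avg_reward:.1f} | {elapsed:.0f}s elapsed)")
```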

weidler commented 5 years ago

I would say that's fine now :+1: