Open 1826949 opened 3 years ago
It's been a while since I've looked at it, but I don't think the videos show the expert; they only show the policy trained through DAgger. I think one is a training rollout and the other is an evaluation rollout. You can see what they log in the code here: one is labeled `train_rollouts` and the other is labeled `eval_rollouts`. Both are worth logging because an imitation learning agent that reaches a state not covered by the expert training data is much more likely to fail.
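To make the train/eval distinction concrete, here is a minimal sketch of a DAgger-style loop on a toy 1D line world. It is not the course code: the environment, `expert_action`, `fit`, and the tabular "policy" are all hypothetical stand-ins, and the usual expert/learner mixing schedule is left out. The point is only that training rollouts are driven by the learner and relabeled by the expert, while evaluation rollouts just measure the current policy.

```python
import random

GOAL, SIZE, HORIZON = 5, 11, 20  # toy line world with states 0..10 (assumed)

def expert_action(state):
    """Hypothetical expert: always step toward GOAL (-1, 0, or +1)."""
    return (state < GOAL) - (state > GOAL)

def step(state, action):
    """Move along the line, clipped to [0, SIZE - 1]."""
    return max(0, min(SIZE - 1, state + action))

def rollout(policy, start):
    """Run `policy` for HORIZON steps; return visited states and a success flag."""
    state, visited = start, []
    for _ in range(HORIZON):
        visited.append(state)
        state = step(state, policy(state))
    return visited, state == GOAL

def fit(dataset):
    """'Train' a tabular policy by memorizing the expert label of each seen state."""
    table = dict(dataset)
    rng = random.Random(0)
    # Unseen states get a random action, mimicking poor generalization
    # outside the expert's state distribution.
    return lambda s: table[s] if s in table else rng.choice([-1, 0, 1])

def dagger(n_iters=5, seed=0):
    rng = random.Random(seed)
    # Iteration 0: behavior cloning on a single expert demonstration.
    states, _ = rollout(expert_action, start=0)
    dataset = [(s, expert_action(s)) for s in states]
    policy = fit(dataset)
    history = []
    for _ in range(n_iters):
        # Train rollout: the learner acts, the expert only relabels the states.
        states, _ = rollout(policy, start=rng.randrange(SIZE))
        dataset += [(s, expert_action(s)) for s in states]
        policy = fit(dataset)
        # Eval rollouts: measure the refit policy; nothing is added to the dataset.
        successes = sum(rollout(policy, s)[1] for s in range(SIZE))
        history.append(successes / SIZE)
    return policy, history

if __name__ == "__main__":
    _, history = dagger()
    print("eval success rate per iteration:", history)
```

The single expert demonstration only covers states 0 through 5, so the learner initially acts randomly anywhere to the right of the goal. The training rollouts are what bring those states into the dataset.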
It's a good idea to read through the rest of the code and understand it. If you ever apply what you learn here in another area, you will likely need infrastructure similar to what they have provided (e.g. replay buffers, logging), in addition to implementing the algorithms themselves. I usually find the RL algorithms themselves to be quite short in terms of lines of code; the infrastructure and data pipelines end up needing many more. The provided code also sometimes contains decent starter hyperparameters that you can use in your own projects.
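As a rough illustration of that kind of infrastructure, here is a minimal replay buffer sketch. The class name, method names, and tuple layout are assumptions made for this example and are not taken from the course code.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of (obs, action, reward, next_obs, done) tuples."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size, rng=random):
        """Sample without replacement and return the batch as five columns."""
        k = min(batch_size, len(self.storage))
        batch = rng.sample(list(self.storage), k)
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.storage)
```

A typical use is to call `add` once per environment step and `sample` once per gradient update.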
Hope this helps.
Vincent
I've taken a look at TensorBoard and I see what you are talking about now. Unfortunately, I'm not sure of the answer, but I'll look into it and let you know.
Hi Vincent,
In TensorBoard, under the "Images" tab, we can visualize our results on the Ant or the Humanoid. We see a split screen (e.g. 2 ants), one on the left and one on the right.
How do we know which is the expert and which is ours (DAgger)?
(PS: Where were you able to find this info? I tried googling for hours and could not find it.)
Thanks!