Question on HW1

Hi, I quite agree with you.

Thanks to the author for the great code for the CS285 Fall 2020 homework!

There is a small problem in hw1.

In hw1's cs285/policies/MLP_policy.py, the author used a deterministic policy (the output of self.mean_net is used directly as the action).

I think this is incorrect: the original code in cs285/policies/MLP_policy.py defines self.logstd, which is only needed for a stochastic policy.
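To make the distinction concrete, here is a minimal sketch in plain Python (not the homework's PyTorch code). The function names and the numbers are hypothetical; the lists simply stand in for whatever the mean network and the logstd parameter would produce. It assumes a diagonal Gaussian policy, where the action is sampled around the mean with standard deviation exp(logstd).

```python
import math
import random


def deterministic_action(mean):
    """Return the mean itself as the action, ignoring logstd."""
    return list(mean)


def stochastic_action(mean, logstd, rng):
    """Sample each action dimension from N(mean_i, exp(logstd_i)^2)."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, logstd)]


# Hypothetical values standing in for the mean network output and logstd.
mean = [0.5, -1.0]
logstd = [-2.0, -2.0]
rng = random.Random(0)  # seeded only so the sketch is reproducible

det = deterministic_action(mean)
sto = stochastic_action(mean, logstd, rng)
print("deterministic:", det)
print("stochastic:   ", sto)
```

In this sketch the deterministic version never reads logstd, so that parameter would have no effect on the actions. The stochastic version adds noise scaled by exp(logstd), so its actions differ from the mean on each call.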

In addition, I found that after changing the author's code from a deterministic policy to a stochastic policy, the performance of BC on Ant-v2 drops from 4k to 1.4k.

I think 1.4k is what BC should achieve.

vincentkslim / cs285_homework_fall2020

Question on HW1 #4