jhwangbo / ME491_2022_project

MIT License

runner.py --mode retrain policy lost? #9

Open Ha-JH opened 1 year ago

Ha-JH commented 1 year ago

Whenever I run runner.py in retrain mode (even with the same exact reward and environment), the policy seems to have been reset (maybe because of the large learning rate at the beginning of the adaptive scheduler?). How can I continue training from a checkpoint? Ideally I would also like to resume from the saved iteration (currently, on retrain, the iteration count starts back at 0).

SeungHunJeon commented 1 year ago

The learning rate doesn't depend on the retrain mode.

As you can see in ppo.py line 120, it is determined from the actor's mean and distribution values.

If your policy picks up the previous actor's distribution properly, retrain mode doesn't matter.
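For context, KL-adaptive schedulers of this kind shrink the step size when the updated policy's action distribution drifts too far from the previous one, and grow it when the update barely moves. A minimal sketch of the idea (the target, thresholds, and factors here are illustrative, not taken from ppo.py):

```python
def adapt_learning_rate(lr, kl, kl_target=0.01, lr_min=1e-5, lr_max=1e-2):
    """KL-adaptive learning-rate schedule (illustrative values).

    lr        -- current learning rate
    kl        -- KL divergence between the old and updated policy
    kl_target -- desired per-update KL; outside a band around it, adjust lr
    """
    if kl > 2.0 * kl_target:
        # Policy moved too far: take smaller steps.
        lr = max(lr_min, lr / 1.5)
    elif kl < 0.5 * kl_target:
        # Policy barely moved: take larger steps.
        lr = min(lr_max, lr * 1.5)
    return lr
```

If the restored actor reproduces the old distribution, the measured KL stays near the target and the learning rate stays moderate; a mismatched distribution inflates the KL and drives the rate down.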

Please make sure that the policy and value networks are imported correctly.

jhwangbo commented 1 year ago

Check this line of code

https://github.com/jhwangbo/ME491_2022_project/blob/b3081db693d24d2bb8510fa54d37599e8c0539b0/raisimGymME491/helper/raisim_gym_helper.py#L53

The mean and std of the observations are computed in this line. You have to pass in the total number of samples the policy experienced to get the same behavior. The default value is 1e5; if you change it to a much bigger number, you will see that it performs better.
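The effect of that sample count can be illustrated with a standard running-statistics class (a sketch of the common Welford-style parallel update, not the exact code in raisim_gym_helper.py): when the restored count is large, a new batch barely shifts the stored mean and variance, so observations keep being normalized the same way the trained policy expects.

```python
import numpy as np


class RunningMeanStd:
    """Running observation statistics; `count` is the number of
    samples already seen. Restoring a large count keeps fresh batches
    from overwriting the learned normalization."""

    def __init__(self, shape, count=1e-4):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = count

    def update(self, batch):
        # Standard parallel mean/variance combination of the stored
        # statistics with one new batch of samples.
        b_mean = batch.mean(axis=0)
        b_var = batch.var(axis=0)
        b_count = batch.shape[0]
        delta = b_mean - self.mean
        tot = self.count + b_count
        self.mean = self.mean + delta * b_count / tot
        m_a = self.var * self.count
        m_b = b_var * b_count
        self.var = (m_a + m_b + delta**2 * self.count * b_count / tot) / tot
        self.count = tot
```

With `count=1e-4` (a fresh start), one batch of observations completely redefines the mean; with `count=1e7`, the same batch moves it only negligibly, which matches the observation that passing a much bigger number on retrain performs better.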