xbpeng / awr

Implementation of advantage-weighted regression.
MIT License
176 stars 37 forks

Offline version of AWR #5

Open FineArtz opened 3 years ago

FineArtz commented 3 years ago

Hi, I am trying to modify AWR into an offline (fully off-policy) version. The paper states that one can simply treat the dataset as the replay buffer without any further modifications. However, I notice that if I remove the sampling in rl_agent.train (line 105 in rl_agent.py: `train_return, train_path_count, new_sample_count = self._rollout_train(self._samples_per_iter)`), then new_sample_count remains 0, so the number of update steps is also 0.

Could you point out the proper way to modify the code to obtain offline AWR?
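For reference, this is roughly the offline setup I have in mind (a sketch only; `replay_buffer.add_paths` and `agent.update` are placeholder names, not this repo's exact API):

```python
# Rough sketch of the offline training loop I am aiming for.
# add_paths and update are illustrative placeholders, not awr's actual methods.
def train_offline(agent, replay_buffer, dataset_paths, num_iters):
    # Load the fixed dataset into the replay buffer once; no further rollouts.
    replay_buffer.add_paths(dataset_paths)

    for it in range(num_iters):
        # With _rollout_train removed there is no environment interaction,
        # so new_sample_count stays 0 and the update-step count derived
        # from it collapses to 0 as well.
        agent.update(it, new_sample_count=0)
```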

xbpeng commented 3 years ago

You can just change the code so that the number of update steps does not depend on new_sample_count, e.g. by setting it to a constant: https://github.com/xbpeng/awr/blob/831442fb8d4c24bd200667cbc5e458c7657effc2/learning/awr_agent.py#L225
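A minimal sketch of that change (the names `_compute_update_steps` and `OFFLINE_UPDATE_STEPS` below are illustrative assumptions, not the exact identifiers in awr_agent.py):

```python
# Sketch only: replace the sample-count-dependent update schedule with a constant.
OFFLINE_UPDATE_STEPS = 500  # fixed number of gradient steps per training iteration (tune for your dataset)

def _compute_update_steps(new_sample_count):
    # Original behavior (roughly): the number of updates scales with newly
    # collected samples, so it becomes 0 when no rollouts are performed.
    # Offline variant: ignore new_sample_count and always return a constant.
    return OFFLINE_UPDATE_STEPS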