thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License
7.79k stars, 1.12k forks

Question about policy learning in offline RL algorithms #877

Open GongYanfu opened 1 year ago

GongYanfu commented 1 year ago

If I want the agent to learn a suboptimal policy rather than the optimal one, how should I modify the loss in the learn function? For example, the learn function of discrete_bcq computes q_loss, i_loss, and reg_loss. What I actually want is to modify the actor loss, as in BCQPolicy, but I can't find it there.

Please give me some tips. Thanks a lot.

MischaPanch commented 1 year ago

You could inherit from the policy class and override its learn method.
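
A minimal sketch of that subclass-and-override pattern, in plain Python so it runs without any dependencies. Everything here is an assumption made for illustration: ToyBCQPolicy is a hypothetical stand-in and not tianshou's actual class, compute_losses is a made-up helper, and down-weighting q_loss through a q_weight parameter is only one possible way to steer training away from the optimal policy. In the real library you would subclass the discrete BCQ policy class from your installed version and adapt the body of its learn method, whose signature and return type differ between releases.

```python
class ToyBCQPolicy:
    """Hypothetical stand-in for a policy whose learn() sums three loss terms."""

    def __init__(self, weight_reg: float = 0.01) -> None:
        self.weight_reg = weight_reg

    def compute_losses(self, batch: dict) -> dict:
        # Placeholder: a real policy would compute these from its networks.
        # Here the batch simply carries precomputed scalar values.
        return {
            "q_loss": batch["q_loss"],
            "i_loss": batch["i_loss"],
            "reg_loss": batch["reg_loss"],
        }

    def learn(self, batch: dict) -> dict:
        losses = self.compute_losses(batch)
        total = (
            losses["q_loss"]
            + losses["i_loss"]
            + self.weight_reg * losses["reg_loss"]
        )
        return {"loss": total, **losses}


class SuboptimalBCQPolicy(ToyBCQPolicy):
    """Subclass that overrides learn() to reweight the Q-learning term."""

    def __init__(self, weight_reg: float = 0.01, q_weight: float = 0.5) -> None:
        super().__init__(weight_reg)
        # Assumption: q_weight < 1 weakens the pull toward the greedy policy.
        self.q_weight = q_weight

    def learn(self, batch: dict) -> dict:
        losses = self.compute_losses(batch)
        total = (
            self.q_weight * losses["q_loss"]
            + losses["i_loss"]
            + self.weight_reg * losses["reg_loss"]
        )
        return {"loss": total, **losses}


batch = {"q_loss": 2.0, "i_loss": 1.0, "reg_loss": 10.0}
base_stats = ToyBCQPolicy().learn(batch)
custom_stats = SuboptimalBCQPolicy(q_weight=0.5).learn(batch)
```

Only the combination of the loss terms changes in the subclass; the individual terms and the rest of the training loop stay as they are in the parent class.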