Question about policy learning in offline RL algorithms

[ ] I have marked all applicable categories:
- [ ] exception-raising bug
- [ ] RL algorithm bug
- [ ] documentation request (i.e. "X is missing from the documentation.")
- [ ] new feature request
[x] I have visited the source website
[x] I have searched through the issue tracker for duplicates

[x] I have mentioned version numbers, operating system and environment, where applicable:

import tianshou, gymnasium as gym, torch, numpy, sys
print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

If I want the agent to learn a suboptimal policy rather than the optimal one, how should I modify the loss in the `learn` function? For example, the `learn` function of `discrete_bcq` computes `q_loss`, `i_loss`, and `reg_loss`. What I actually want to modify is the actor loss, as in `BCQPolicy`, but I can't find an equivalent term in the discrete version.
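For context, this is how I understand the three terms, written as a standalone PyTorch sketch. It is not copied from tianshou: the function name `discrete_bcq_losses` and the `imitation_weight` knob are hypothetical, and the exact forms (Huber loss for `q_loss`, negative log-likelihood for `i_loss`, squared logits for `reg_loss`) are my assumption about how discrete BCQ is usually implemented.

```python
import torch
import torch.nn.functional as F


def discrete_bcq_losses(q_values, imitation_logits, actions, target_q,
                        reg_weight=1e-2, imitation_weight=1.0):
    """Hypothetical helper: compute the three discrete-BCQ loss terms.

    q_values:         (batch, n_actions) Q-network output
    imitation_logits: (batch, n_actions) imitation-head output
    actions:          (batch,) long tensor of dataset actions
    target_q:         (batch,) bootstrapped Q targets
    """
    # Q-value of the action that was actually taken in the dataset.
    current_q = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    # TD term: Huber loss between current and target Q-values.
    q_loss = F.smooth_l1_loss(current_q, target_q)
    # Imitation term: behavior cloning on the dataset actions.
    i_loss = F.nll_loss(F.log_softmax(imitation_logits, dim=-1), actions)
    # Regularizer: penalize large imitation logits.
    reg_loss = imitation_logits.pow(2).mean()
    # imitation_weight is an assumed knob, shown only to mark where a
    # change to the objective could be made.
    total = q_loss + imitation_weight * i_loss + reg_weight * reg_loss
    return {"loss": total, "q_loss": q_loss, "i_loss": i_loss, "reg_loss": reg_loss}


# Usage with random data (batch of 8, 4 discrete actions).
batch, n_actions = 8, 4
losses = discrete_bcq_losses(
    q_values=torch.randn(batch, n_actions),
    imitation_logits=torch.randn(batch, n_actions),
    actions=torch.randint(0, n_actions, (batch,)),
    target_q=torch.randn(batch),
)
print({name: float(value) for name, value in losses.items()})
```

In this sketch there is no separate actor loss; the closest counterpart seems to be `i_loss`, which is why I am unsure which term to change.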

Any tips would be appreciated. Thanks a lot.

thu-ml / tianshou

puzzle about policy learning of offline RL algorithms #877