If I want the agent to learn a suboptimal policy rather than the optimal one, how should I modify the loss in the learn function?
For example, the learn function of discrete_bcq computes q_loss, i_loss, and reg_loss.
Actually, I want to modify an actor loss like the one in BCQPolicy, but I can't find one in discrete_bcq.
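For context, here is a rough NumPy re-implementation of how I understand the three terms to combine in discrete BCQ (Fujimoto et al.'s discrete variant). It is a sketch under assumptions, not Tianshou's actual code: the function names and the `weight_reg` and `threshold` values are hypothetical. As far as I can tell, discrete BCQ has no separate actor/perturbation network like continuous BCQ. The imitation head (trained by i_loss) only filters out unlikely actions, and the policy is the argmax of Q over the remaining ones.

```python
import numpy as np

def smooth_l1(x, y):
    # Huber / smooth L1 loss with beta = 1, averaged over the batch
    d = np.abs(x - y)
    return np.mean(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5))

def log_softmax(logits):
    # numerically stable log-softmax over the action dimension
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def discrete_bcq_loss(q_values, target_q, imitation_logits, actions, weight_reg=0.01):
    # weight_reg is a hypothetical coefficient for the logit penalty
    idx = np.arange(len(actions))
    current_q = q_values[idx, actions]
    q_loss = smooth_l1(current_q, target_q)                               # TD loss on taken actions
    i_loss = -np.mean(log_softmax(imitation_logits)[idx, actions])       # NLL of behavior actions
    reg_loss = np.mean(imitation_logits ** 2)                            # L2 penalty on logits
    return q_loss + i_loss + weight_reg * reg_loss

def select_action(q_values, imitation_logits, threshold=0.3):
    # keep actions whose imitation probability is > threshold * max probability,
    # then take the argmax of Q among the kept actions (threshold is hypothetical)
    probs = np.exp(log_softmax(imitation_logits))
    mask = probs / probs.max(axis=-1, keepdims=True) > threshold
    return np.where(mask, q_values, -np.inf).argmax(axis=-1)
```

In this formulation, how close the learned policy stays to the behavior data is governed mainly by `threshold` in action selection rather than by an actor loss.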
Please give me some tips. Thanks a lot.