Missing of Regularization of GCL

[ ] I have marked all applicable categories:
- [ ] exception-raising bug
- [ ] RL algorithm bug
- [ ] system worker bug
- [ ] system utils bug
- [ ] code design/refactor
- [ ] documentation request
- [+] new feature request
[+] I have visited the readme and doc
[+] I have searched through the issue tracker and pr tracker

[+] I have mentioned version numbers, operating system and environment, where applicable:

import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)

>>> print(ding.__version__, torch.__version__, sys.version, sys.platform)
v0.4.6 1.10.0 3.7.11 (default, Jul 27 2021, 14:32:16) 
[GCC 7.5.0] linux

Dear Developers,

I am looking into the implementation of guided cost reward model. In the training process there is only the loss of IOC but not the regularization term g_lcr and g_mono. Do I miss that or is it just not implemented in the code?

In addition, in the paper of guided cost learning, their Loss_Ioc consider different trajectories, each of which should be a complete episode. However, in DI-ENGINE, the training data consists of time-steps sampled from tracjectories, which means the time-steps in the training data are not from a complete episode and might also have repeated time-steps duiring sample. Is this designed on purpose or any misunderstanding of the paper?

Best regards Zhixiong

opendilab / DI-engine

Missing of Regularization of GCL #627