CodingNovice7 closed this issue 2 years ago
max_ep_len is the maximum episode length in the environment; the value is aligned with other work on gym environments. env_targets are the target returns the model is evaluated on. scale is a normalization hyperparameter, coarsely chosen so that the rewards fall somewhere in the range 0-10.
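A minimal sketch of how the three values might fit together. The environment name `toy_env`, the numbers, and the helpers `EnvConfig`, `scaled_targets`, and `clamp_timestep` are all made up for illustration and are not taken from the repository:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EnvConfig:
    max_ep_len: int            # hard cap on the number of steps in one episode
    env_targets: List[float]   # target returns to condition on at evaluation time
    scale: float               # divisor that brings returns into roughly 0-10


# Hypothetical per-environment settings (illustrative values only).
CONFIGS = {
    "toy_env": EnvConfig(max_ep_len=1000, env_targets=[3600.0, 1800.0], scale=1000.0),
}


def scaled_targets(cfg: EnvConfig) -> List[float]:
    """Divide each target return by scale so the model sees small numbers."""
    return [target / cfg.scale for target in cfg.env_targets]


def clamp_timestep(t: int, cfg: EnvConfig) -> int:
    """Keep a timestep index inside the valid range [0, max_ep_len - 1]."""
    return max(0, min(t, cfg.max_ep_len - 1))


cfg = CONFIGS["toy_env"]
print(scaled_targets(cfg))        # both values land inside the 0-10 range
print(clamp_timestep(5000, cfg))  # an out-of-range step is capped
```

With these assumed numbers, a raw target return of 3600 becomes 3.6 after scaling, which is the kind of magnitude the 0-10 range refers to.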
Thank you for sharing. This is a great model, but I don't quite understand some of the parameters in the code. For example, after the check on env_name, the code sets max_ep_len, env_targets, and scale. What are these parameters, and what do they do?