> I pass the masking wrapper, along with the `env_id` and `n_envs`, to the `make_vec_env()` function.
Looks right, but you could even implement the action-masking method directly in your env (see the documentation or the code for the expected name of the method).
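For reference, the method MaskablePPO looks for on the environment is `action_masks()`. Below is a minimal sketch of that approach, using the Gymnasium API and a made-up `AnimalEnv` whose mask depends on the previously taken action; the spaces, episode length, and reward are placeholders, not a real task:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class AnimalEnv(gym.Env):
    """Hypothetical custom env exposing its own action mask.

    MaskablePPO looks for a method named `action_masks` on the environment,
    so with this in place no extra masking wrapper is required.
    """

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(4)
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(6,), dtype=np.float32)
        self.last_action = None
        self._t = 0

    def action_masks(self) -> np.ndarray:
        # One boolean per discrete action (True = valid). Here the mask depends
        # on the previously taken action, as in the question.
        mask = np.ones(self.action_space.n, dtype=bool)
        if self.last_action is not None:
            mask[self.last_action] = False  # e.g. forbid repeating the last action
        return mask

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.last_action = None
        self._t = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self.last_action = int(action)
        self._t += 1
        obs = self.observation_space.sample()
        truncated = self._t >= 50  # dummy episode length
        return obs, 0.0, False, truncated, {}
```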
> Could I ask what the purpose of `env = VecNormalize(env, norm_reward=False)` is? If I have manually normalised my inputs, is it still suggested to use this wrapper?
We have several issues about that in SB3 (and I talk a bit about it in the RL Tips and Tricks). If your observation is already normalized, it is not needed.
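For context, here is a rough sketch of where `VecNormalize` would sit if you did want it; the env id and `n_envs` are placeholders. With `norm_reward=False` only the observations get running mean/std normalisation, and if your observations are already scaled to [-1, 1] the wrapper can simply be dropped:

```python
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

# "CartPole-v1" is only a stand-in env id for illustration.
env = make_vec_env("CartPole-v1", n_envs=4)

# Running mean/std normalisation of observations only (rewards left untouched).
# Skip this wrapper if your observations are already scaled to [-1, 1].
env = VecNormalize(env, norm_obs=True, norm_reward=False)

# If you do use it, remember to save the normalisation statistics alongside the
# model, e.g. env.save("vec_normalize.pkl") at the end of training.
```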
> Does the function make calls for the action mask to be returned at each timestep? My environment has different invalid actions at different timesteps, depending on the previous valid action taken.
yes
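To illustrate, the usual sb3-contrib pattern re-queries the mask from the environment at every step, both inside `learn()` rollouts and when you call `predict()` yourself with `get_action_masks()`, so state-dependent masks are supported. A sketch reusing the hypothetical `AnimalEnv` from the earlier snippet (timesteps and loop length are arbitrary):

```python
from sb3_contrib import MaskablePPO
from sb3_contrib.common.maskable.utils import get_action_masks
from stable_baselines3.common.env_util import make_vec_env

# AnimalEnv is the hypothetical masked env sketched earlier.
env = make_vec_env(AnimalEnv, n_envs=4)

model = MaskablePPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)  # masks are fetched from every sub-env at each rollout step

# Same thing at inference time: query the current mask before every predict().
obs = env.reset()
for _ in range(100):
    action_masks = get_action_masks(env)  # re-queried every timestep, so it can change with state
    action, _states = model.predict(obs, action_masks=action_masks)
    obs, rewards, dones, infos = env.step(action)
```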
Hi repository owners @araffin @vwxyzjn!
I would like to ask you a few questions about the environment setup flow in your train_ppo.py script.
Let me paste it here for ease of reference:
Firstly, I would like to thank you for providing the only example I have found online where both vectorisation and masking are handled at the environment setup stage. SB3 is lacking some documentation on the sb3-contrib side.
Question(s)
1. I pass the masking wrapper, along with the `env_id` and `n_envs`, to the `make_vec_env()` function. Does this look right?
2. Could I ask what the purpose of `env = VecNormalize(env, norm_reward=False)` is? If I have manually normalised my inputs to between [-1, 1] (because the observation is a concatenation of different parts, e.g. animal weights, animal heights, etc.), is it still suggested to use this wrapper? Can the environment still work without this wrapper, or is it advisable to use it?
3. For the `model.learn( ... , use_mask = mask)` step, does the function make calls for the action mask to be returned at each timestep? My environment has different invalid actions at different timesteps, depending on the previous valid action taken.

Thank you for your time in addressing my questions!
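For reference, here is one way the setup described in question 1 could be wired together with SB3 and sb3-contrib. This is a sketch, not the actual train_ppo.py script; the env id, `mask_fn`, and hyperparameters below are placeholders:

```python
import numpy as np
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize


def mask_fn(env) -> np.ndarray:
    # Hypothetical mask function: one boolean per discrete action (True = valid),
    # derived from whatever state the unwrapped env exposes.
    return np.ones(env.action_space.n, dtype=bool)


# "MyMaskedEnv-v0" is a placeholder for an env id registered elsewhere.
env = make_vec_env(
    "MyMaskedEnv-v0",
    n_envs=8,
    wrapper_class=ActionMasker,
    wrapper_kwargs={"action_mask_fn": mask_fn},
)

# Optional: normalises observations only; skip it if they are already in [-1, 1].
env = VecNormalize(env, norm_reward=False)

model = MaskablePPO("MlpPolicy", env, verbose=1)
# MaskablePPO queries the mask from each sub-env at every rollout step,
# so state-dependent invalid actions are handled automatically.
model.learn(total_timesteps=100_000)
```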