Motivation
Unlike gSDE structured exploration, layers with parametric noise (NoisyLayer) have no way to resample their noise when the environment is reset. Resampling can only happen after each optimization step. This departs from what the article describes and introduces bias into the training batches.
Solution
For gSDE, there is a dedicated transform for this purpose, gSDENoise. It works because gSDEModel writes its parameters directly into the rollout TensorDict, which NoisyLayer does not do. Keeping the noise inside the module instead causes problems when multiple workers run in parallel, since they all rely on the same policy. So the only generic way to handle this situation seems to be to store the action noise in the rollout TensorDict, as is done for gSDE.
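A minimal pure-Python sketch of the proposed idea, under loudly stated assumptions: plain dicts stand in for TensorDict, and `NoisyPolicy`, `reset` and `rollout` are hypothetical names, not TorchRL APIs. The learnable parameters (`mu`, `sigma`) are shared by every worker, but the noise sample is written into each worker's rollout data at reset and read back by the policy at every step. The noise therefore stays fixed for a whole episode and is independent across workers, even though they share one policy object.

```python
import random
from typing import Dict, List

# Hypothetical sketch: dicts stand in for TensorDict; none of these names are TorchRL APIs.


class NoisyPolicy:
    """Toy linear policy: action = (mu + sigma * noise) * observation."""

    def __init__(self, mu: float, sigma: float) -> None:
        self.mu = mu        # shared learnable mean weight
        self.sigma = sigma  # shared learnable noise scale

    def __call__(self, data: Dict[str, float]) -> Dict[str, float]:
        # The noise is read from the rollout data, not from module state,
        # so a shared policy never mixes up noise between workers.
        weight = self.mu + self.sigma * data["noise"]
        data["action"] = weight * data["observation"]
        return data


def reset(worker_rng: random.Random) -> Dict[str, float]:
    """Hypothetical reset: sample fresh noise and store it in the rollout data."""
    return {"observation": 1.0, "noise": worker_rng.gauss(0.0, 1.0)}


def rollout(policy: NoisyPolicy, worker_rng: random.Random, steps: int) -> List[float]:
    """Run one toy episode and return the noise value seen at each step."""
    data = reset(worker_rng)
    noises = []
    for _ in range(steps):
        data = policy(data)
        noises.append(data["noise"])  # carried forward unchanged between steps
        data["observation"] += 1.0    # toy transition
    return noises


if __name__ == "__main__":
    shared_policy = NoisyPolicy(mu=0.5, sigma=0.1)
    print(rollout(shared_policy, random.Random(0), 3))  # one noise value per episode
    print(rollout(shared_policy, random.Random(1), 3))  # a different value for another worker
```

Because the noise travels with the data, it also ends up in the collected batch, so a loss could in principle reuse the exact noise that produced each action.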
Alternative
Add a hook mechanism to run generic callbacks whenever a BaseEnv is reset.
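A minimal sketch of what such a reset hook could look like, assuming a hypothetical `register_reset_hook` method; `ToyEnv`, `ToyNoisyLayer` and `register_reset_hook` are illustrative stand-ins, not existing TorchRL classes or methods. The environment calls every registered callback inside `reset()`, so a noisy layer's `reset_noise` runs once per episode instead of once per optimization step.

```python
import random
from typing import Callable, List

# Hypothetical sketch: ToyEnv, ToyNoisyLayer and register_reset_hook are not TorchRL APIs.


class ToyNoisyLayer:
    """Stand-in for a layer with parametric noise that can resample it on demand."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.eps = 0.0
        self.reset_noise()

    def reset_noise(self) -> None:
        self.eps = self._rng.gauss(0.0, 1.0)


class ToyEnv:
    """Stand-in environment that runs generic callbacks on every reset."""

    def __init__(self) -> None:
        self._reset_hooks: List[Callable[[], None]] = []
        self.t = 0

    def register_reset_hook(self, hook: Callable[[], None]) -> None:
        self._reset_hooks.append(hook)

    def reset(self) -> int:
        self.t = 0
        for hook in self._reset_hooks:  # e.g. resample exploration noise
            hook()
        return self.t

    def step(self) -> int:
        self.t += 1
        return self.t


if __name__ == "__main__":
    layer = ToyNoisyLayer(seed=0)
    env = ToyEnv()
    env.register_reset_hook(layer.reset_noise)
    env.reset()
    print(layer.eps)  # noise for episode 1, unchanged by env.step()
    env.reset()
    print(layer.eps)  # freshly resampled for episode 2
```

This alternative is simpler, but it keeps the noise in module state: if several parallel workers share the same policy, a reset in one worker would resample the noise used by all of them, which is the issue raised in the Solution above.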
Checklist
[x] I have checked that there is no similar issue in the repo (required)