pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License
2.07k stars, 275 forks

[Feature Request] Reset parameter noise at environment reset. #1210

Open duburcqa opened 1 year ago

duburcqa commented 1 year ago

Motivation

Unlike gSDE structured exploration, layers with parametric noise (NoisyLayer) have no way to resample their noise at environment reset. The noise can only be resampled after each optimization step, which does not follow what the article describes and introduces bias in training batches.

Solution

For gSDE, there is a transform dedicated to this need, gSDENoise. It works because gSDEModule adds its parameters to the rollout TensorDict directly, which is not the case for NoisyLayer. This introduces issues when dealing with multiple workers in parallel, since they all rely on the same policy. So I guess the only generic way to handle this situation is to add the noise as part of the rollout TensorDict, much like what is done for gSDE.
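
To make the idea concrete, here is a minimal sketch in plain Python, with a dict standing in for the rollout TensorDict and lists standing in for tensors. `ToyNoisyLinear`, `policy_step`, and the `"noise"` and `"is_init"` keys are hypothetical names chosen for illustration, not TorchRL API. The point is only that the layer stays stateless with respect to noise, so workers sharing one policy can each carry their own sample.

```python
import random


class ToyNoisyLinear:
    """Hypothetical noisy layer: y = sum((w_i + sigma_i * eps_i) * x_i).

    The noise `eps` is supplied by the caller instead of being stored on
    the layer, so a single shared layer can serve several workers.
    """

    def __init__(self, weights, sigmas):
        self.weights = list(weights)
        self.sigmas = list(sigmas)

    def sample_noise(self, rng):
        # One standard-normal sample per weight.
        return [rng.gauss(0.0, 1.0) for _ in self.weights]

    def forward(self, x, eps):
        return sum(
            (w + s * e) * xi
            for w, s, e, xi in zip(self.weights, self.sigmas, eps, x)
        )


def policy_step(layer, td, rng):
    """Read the noise from the rollout dict `td`, resampling it at reset."""
    # Assumption: the environment flags the first step of an episode
    # with an "is_init" entry in the rollout dict.
    if td.get("is_init", True) or "noise" not in td:
        td["noise"] = layer.sample_noise(rng)
    td["action"] = layer.forward(td["observation"], td["noise"])
    td["is_init"] = False
    return td


layer = ToyNoisyLinear(weights=[0.5, -0.2], sigmas=[0.1, 0.1])
rng = random.Random(0)

# Two workers share `layer`, but each carries its own noise in its own dict.
td_a = {"observation": [1.0, 2.0], "is_init": True}
td_b = {"observation": [1.0, 2.0], "is_init": True}
policy_step(layer, td_a, rng)
policy_step(layer, td_b, rng)
noise_a_first = list(td_a["noise"])

# Within an episode the stored noise is reused.
policy_step(layer, td_a, rng)
same_within_episode = td_a["noise"] == noise_a_first

# An environment reset (flagged here by "is_init") triggers a resample.
td_a["is_init"] = True
policy_step(layer, td_a, rng)
resampled_on_reset = td_a["noise"] != noise_a_first
```

Because the noise lives in each worker's rollout data rather than in the shared module, resetting one environment never changes the noise another worker is using.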

Alternative

Add a hook mechanism to run generic callbacks when a BaseEnv is reset.
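
A rough sketch of what such a hook mechanism could look like, again in plain Python. `ToyEnv`, `register_reset_hook`, and `NoiseHolder` are hypothetical names invented for this example and do not exist in TorchRL.

```python
import random


class ToyEnv:
    """Hypothetical minimal environment with reset hooks (not a TorchRL class)."""

    def __init__(self):
        self._reset_hooks = []
        self.t = 0

    def register_reset_hook(self, fn):
        """Register `fn(env)` to be called at the end of every reset."""
        self._reset_hooks.append(fn)
        return fn

    def reset(self):
        self.t = 0
        for hook in self._reset_hooks:
            hook(self)
        return {"observation": 0.0}

    def step(self, action):
        self.t += 1
        return {"observation": float(self.t)}


class NoiseHolder:
    """Stand-in for a noisy layer exposing a noise-resampling method."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.eps = None
        self.n_resets = 0

    def reset_noise(self, env=None):
        # Draw a fresh noise sample; `env` is unused but passed by the hook.
        self.eps = self._rng.gauss(0.0, 1.0)
        self.n_resets += 1


env = ToyEnv()
holder = NoiseHolder()
env.register_reset_hook(holder.reset_noise)

env.reset()
first_eps = holder.eps

env.step(0.0)
unchanged_during_episode = holder.eps == first_eps

env.reset()
changed_after_reset = holder.eps != first_eps
```

Note that a hook like this mutates the policy itself, so with several workers sharing one policy, a reset in any worker would resample the noise for all of them. That is the limitation the Solution section works around by storing the noise in the rollout data.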


vmoens commented 1 year ago

Thanks for flagging this. You're right, gSDE should get a thorough review. I'll take care of this ASAP.