Hi! First, congrats on this project, I find the implementation of DQNs very clean.
Looking around, I see something that could be a bug: the environment is reset at the beginning of training, and I don't see it being reset again after that. For gymnax environments like Cartpole, the 'done' part of the state indicates that the environment needs to be reset. For brax environments this is taken care of by the AutoResetWrapper, but for environments without such a wrapper I would expect an explicit reset in the training loop. Am I missing something?
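
To make the question concrete, here is a minimal sketch of the reset-on-done pattern I expected to find somewhere in the training loop. `ToyEnv`, `ToyState`, and `step_with_reset` are hypothetical names made up for illustration; they are not the gymnax or brax API, and the real environments may already handle this internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyState:
    """Hypothetical environment state: a step counter plus a done flag."""
    t: int
    done: bool


class ToyEnv:
    """Hypothetical stand-in for an environment that does not auto-reset."""

    def __init__(self, episode_length: int = 3):
        self.episode_length = episode_length

    def reset(self) -> ToyState:
        # Start a fresh episode.
        return ToyState(t=0, done=False)

    def step(self, state: ToyState, action: int) -> ToyState:
        # The action is ignored here; the episode ends after a fixed number of steps.
        t = state.t + 1
        return ToyState(t=t, done=t >= self.episode_length)


def step_with_reset(env: ToyEnv, state: ToyState, action: int) -> ToyState:
    """Reset first if the previous transition ended the episode, then step."""
    if state.done:
        state = env.reset()
    return env.step(state, action)


env = ToyEnv(episode_length=3)
state = env.reset()  # the single reset at the start of training
trajectory = []
for _ in range(7):
    state = step_with_reset(env, state, action=0)
    trajectory.append((state.t, state.done))
```

Without the `if state.done` check, the step counter in this toy example would keep growing past the end of the episode, which is the behaviour I was worried about.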