siferati closed this issue 5 years ago
Hello, for the first two questions, what you are looking for is called a callback, and it is in the documentation ;) (maybe we should add more examples later, see #297)
I also recommend taking a look at the RL zoo (dev version for now) for a proper evaluation: https://github.com/araffin/rl-baselines-zoo/blob/261ffcdb7890d343cfc752f15981be16cb31a114/utils/hyperparams_opt.py#L51
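For reference, a rough sketch of what such an evaluation can look like on a raw (non-vectorized) gym env; the function name and episode count are arbitrary placeholders, and the zoo linked above does this more rigorously (e.g. inside a callback during hyperparameter optimization):

```python
import numpy as np


def evaluate(model, env, n_episodes=10):
    """Run a trained agent for a few episodes on a separate eval env
    and report the mean and std of the episodic return."""
    returns = []
    for _ in range(n_episodes):
        obs = env.reset()
        done, ep_return = False, 0.0
        while not done:
            # deterministic=True removes exploration noise during evaluation
            action, _states = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            ep_return += reward
        returns.append(ep_return)
    return np.mean(returns), np.std(returns)
```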
This is more of a general gym question, but when does the environment reset?
If you are using a custom gym env, that is entirely up to you. Usually, you reset the env after failure/success or after n steps (even though the latter breaks the Markov assumption).
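For illustration, a minimal sketch of a custom gym env along those lines, ending the episode either on failure or after a fixed number of steps; the dynamics, the failure threshold, and the 200-step limit below are made up for the example:

```python
import numpy as np
import gym
from gym import spaces


class ToyEnv(gym.Env):
    """Toy 1D env: the episode ends on 'failure' (state leaves a safe range)
    or after a fixed number of steps (time limit)."""

    def __init__(self, max_steps=200):
        super(ToyEnv, self).__init__()
        self.max_steps = max_steps
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-10.0, high=10.0, shape=(1,), dtype=np.float32)
        self.state = np.zeros(1, dtype=np.float32)
        self.steps = 0

    def reset(self):
        self.state = np.zeros(1, dtype=np.float32)
        self.steps = 0
        return self.state

    def step(self, action):
        self.steps += 1
        self.state = self.state + np.asarray(action, dtype=np.float32)
        reward = -float(np.abs(self.state[0]))       # stay close to 0
        failed = bool(np.abs(self.state[0]) > 5.0)   # "failure" condition
        timeout = self.steps >= self.max_steps       # time limit (breaks the Markov assumption)
        done = failed or timeout
        return self.state, reward, done, {}
```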
Hey, thanks for the reply!
I understand how I could use the callback to save the model (don't know how I missed it in the documentation) but I still don't get how to train until convergence.
Because the learn method requires a number of timesteps N to be provided, even if I always return True in the callback, the learning will stop once those N timesteps are reached.
Right now my approach would be to train for N timesteps > evaluate > manually check for convergence > repeat until convergence. Is there no better way to do this?
My idea was to give an almost infinite budget (number of timesteps), so learning exits only when the convergence check in the callback passes (and you can also save the model from time to time in that callback).
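As a rough sketch of that approach, using the class-based callback API from Stable-Baselines3 (mentioned later in this thread); the env, the convergence threshold, the checkpoint interval, and the save path are placeholders, not from this thread:

```python
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import BaseCallback


class StopOnConvergenceCallback(BaseCallback):
    """Periodically save a checkpoint and stop training once the recent
    mean episode reward crosses a (placeholder) convergence threshold."""

    def __init__(self, check_freq=10_000, reward_threshold=475.0, save_path="ppo_checkpoint"):
        super().__init__()
        self.check_freq = check_freq              # placeholder checkpoint/check interval
        self.reward_threshold = reward_threshold  # placeholder "converged" criterion
        self.save_path = save_path

    def _on_step(self) -> bool:
        if self.n_calls % self.check_freq == 0:
            # Checkpoint so training can be paused and resumed later with PPO.load(...)
            self.model.save(self.save_path)

            # ep_info_buffer holds the last episodes' returns under the key 'r'
            rewards = [ep_info["r"] for ep_info in self.model.ep_info_buffer]
            if len(rewards) > 0 and np.mean(rewards) >= self.reward_threshold:
                return False  # returning False stops learn()
        return True


model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
# Give an (almost) infinite budget; the callback ends training once "converged".
model.learn(total_timesteps=int(1e9), callback=StopOnConvergenceCallback())
```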
Hi @araffin, how do I access the policy gradient loss to save a model mid-training? I only see how to access what I find in locals, but I don't see how to access all of the scalars reported by SB3 to tensorboard.
Hey! So I just found this project today and I'm really excited to try stuff out!
After reading the docs I have 3 questions:
1. Is it possible to train the model until it converges to some solution (hopefully the optimal solution) instead of training it for a fixed N amount of timesteps?
2. Is it possible to save the model mid-learning? The examples in the docs only showed saving after the learning phase was already complete. I'd like to be able to pause the learning process and resume it at a later time.
3. This is more of a general gym question, but when does the environment reset? I believe it resets only when the observed state is out of bounds of the env.observation_space - is this correct? So if I wanted to, let's say, reset the environment every 30s, I'd have to observe the elapsed time since the beginning of the current episode and define the observation_space bounds for it as [0, 30]?

Note: If the answers to these questions are algorithm dependent, I'm mostly interested in PPO.
Thanks in advance ^-^