Closed anqixu closed 4 years ago
Yes, this seems to be a big problem for the usability of these methods. Here's an example of hyperparameter configurations found by tune on the pybullet inverse pendulum env:
And here is the actual performance of these hyperparameters when training with ten different seeds:
The agent is an n-step Actor-Critic, and I used HyperOptSearch and ASHAScheduler for early stopping, with 512 trials.
Sorry for the late response; this got deprioritized over the last two weeks, but I will aim to merge by end of week!
Just wanted to say thanks for this. I am currently experimenting with it.
I also worked on manually orchestrating the launch of N separate seeds for K steps from each tune trial, calling get() on each and taking the mean. This has the advantage of also working with early stopping or other schedulers.
But for some reason ray stops allocating new processes after the first "batch" of max_concurrent workers times the number of seeds. There are no errors, but it soon grinds to a halt, running on only one or two cores and never launching the rest of the trials. Hoping to fix this and do a comparison.
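For readers following along, the manual per-seed approach described above can be sketched roughly as follows. This is a standard-library illustration only, not the code from this comment: train_once is a hypothetical stand-in for a real training run, and a thread pool takes the place of Ray remote tasks. In an actual Ray setup, one would presumably launch the per-seed runs as remote tasks and collect them with ray.get().

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def train_once(config, seed, steps):
    """Hypothetical stand-in for one RL training run; returns a final score."""
    rng = random.Random(seed)
    # Toy objective: best at lr = 0.01, plus seed-dependent noise.
    score = -abs(config["lr"] - 0.01) * 100
    for _ in range(steps):
        score += rng.gauss(0.0, 0.01)
    return score


def evaluate_config(config, seeds, steps):
    """Run the same config once per seed and return (mean score, all scores)."""
    seeds = list(seeds)
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        futures = [pool.submit(train_once, config, seed, steps) for seed in seeds]
        # Blocking on every future mirrors calling get() on each seed's run.
        scores = [f.result() for f in futures]
    return statistics.mean(scores), scores


if __name__ == "__main__":
    mean_score, per_seed = evaluate_config({"lr": 0.01}, seeds=range(5), steps=10)
    print(mean_score, per_seed)
```

The mean returned by evaluate_config is what the trial would report back to the search algorithm, so the searcher sees one seed-averaged number per hyperparameter setting.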
@floringogianu that seems like a bug. Can you post a small script for reproducing this?
Feel free to open a new issue.
Not sure if this is already a feature or not, please forgive and provide insight :)
While I haven't tried yet, I understand that tune has support for search algorithms (like BO, spearmint, etc.), which decide on new hparam settings to try based on the performance of previous hparam trials.
It is known that the results of modern RL agents tend to be very sensitive to their initial random number generator (RNG) seed.
I would like to spawn multiple Tune trials of the same hparam setting for an RL run, but with different rng seeds (either explicit or null-implicit). I can do this explicitly already. However, the feature that I don't know if it's implemented / possible is:
P.S.: I heard that maybe num_samples can be used that way, but I'm not sure that's valid, since when I used num_samples the hparams for each trial were sampled independently rather than simply repeated.
Many thanks!
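To make the request concrete, here is a minimal sketch of the behaviour being asked for: a wrapper that repeats each suggested hparam setting with several seeds and reports only the seed-averaged score back to the underlying search algorithm. All class and method names here (RandomSearcher, RepeatedSearcher, suggest, on_complete) are hypothetical and do not correspond to Tune's actual API.

```python
import random
import statistics


class RandomSearcher:
    """Hypothetical minimal searcher: proposes configs and records scores."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.results = []  # (config, mean score) pairs seen so far

    def suggest(self):
        return {"lr": 10 ** self._rng.uniform(-4, -1)}

    def on_complete(self, config, score):
        self.results.append((config, score))


class RepeatedSearcher:
    """Runs each suggestion `repeat` times, once per seed, then reports the mean."""

    def __init__(self, searcher, repeat):
        self.searcher = searcher
        self.repeat = repeat
        self._pending = []    # (group id, config) trials not yet handed out
        self._configs = {}    # group id -> config as proposed by the searcher
        self._scores = {}     # group id -> scores collected so far
        self._next_group = 0

    def suggest(self):
        if not self._pending:
            # Ask the wrapped searcher only once per group of repeats.
            config = self.searcher.suggest()
            gid = self._next_group
            self._next_group += 1
            self._configs[gid] = config
            self._scores[gid] = []
            self._pending = [(gid, dict(config, seed=s)) for s in range(self.repeat)]
        return self._pending.pop(0)

    def on_complete(self, group_id, score):
        scores = self._scores[group_id]
        scores.append(score)
        if len(scores) == self.repeat:
            # The wrapped searcher only ever sees the seed-averaged score.
            mean_score = statistics.mean(scores)
            self.searcher.on_complete(self._configs[group_id], mean_score)
```

With repeat=3, six consecutive calls to suggest() yield two distinct hparam settings, each paired with seeds 0, 1 and 2. That is the "repeated rather than independently sampled" behaviour that num_samples alone does not seem to provide.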