Open gehring opened 5 years ago
The second case seems to be a little odd, and I think perhaps the best thing to do here is just to tone down the documentation.
You can achieve what you intend to do without nesting the grid search right?
import ray
from ray import tune
def dummy_fn(config, reporter):
b2 = config["b"] ** config["a"]
print(config)
exp_config = {
"dummy_exp": {
"run": dummy_fn,
"config": {"a": tune.grid_search([1, 2]),
"b": tune.grid_search([2,3,4])},
},
}
ray.init()
tune.run_experiments(exp_config)
I meant those examples to be simple, minimal uses of nested grid search. The actual use case that lead me to this issue involves the desire to search different parts of the space based on other parameters. This can occur when some variable a
alters the scale or meaning of other variables. Specifically, I was trying to use it to pick different values for the learning rate to search over based on which optimizer is currently being tried, e.g.:
import numpy as np
import ray
from ray import tune
# Mapping from optimizer and batch size to learning rate range to explore.
LR_RANGES = {
("sgd", 1): (0.0001, 0.01),
("sgd", 100): (0.001, 0.1),
("adam", 1): (0.00001, 0.01),
("adam", 100): (0.0001, 0.1),
}
def dummy_fn(config, reporter):
print(config)
# Generate a grid search for the learning rate for a given method and
# batch size.
def resolve_learning_rate(spec):
method = spec.config.method
batch_size = spec.config.batch_size
lr_range = LR_RANGES((method, batch_size))
values = np.geomspace(*lr_range)
return tune.grid_search(values)
exp_config = {
"dummy_exp": {
"run": dummy_fn,
"config": {
"method": tune.grid_search(["sgd", "adam"]),
"batch_size": tune.grid_search([1, 100]),
"learning_rate": resolve_learning_rate,
},
},
}
ray.init()
tune.run_experiments(exp_config)
Granted, since, in this example, the ranges themselves don't depend on the parameters, we could work around it by splitting the experiment into smaller ones, and then doing our own grid search. However, it is possible for a situation to arise where we would want the range for some hyper-parameter, e.g., step size, trusted region, to depend on the scale of some randomly sampled hyper-parameter, e.g., RBF width, rewards clipped at some value. I'm not sure this has a simple workaround beyond writing a custom SearchAlgorithm
class.
Additionally, Introducing a new parameters who's semantics are dictated by the need for a workaround is somewhat undesirable. Using your proposed workaround to illustrate why this might be the case, while this solution works fine, we've now introduced a new parameter whose meaning is dictated by implementation restrictions rather than from more stable/abstract concepts. This makes understanding what is happening less clear and more prone to bugs as the experiment is forked/modified. Additionally, workaround like these are likely to be omitted from any technical write up).
That being said, I don't mean to overplay the importances of this feature. I think either removing problematic documentation or fixing the nested grid_search functionality can be easily argued for, and are both acceptable solutions. On one hand, simply removing support/mention of the feature is easy and would help keep tune lean, while, on the other, fixing the feature would allow tune users to keep some types of grid search config clean and easy to read.
However, I am still concerned with how the issue manifests itself. Given how inaccurate the error message is, it would probably be wise to make sure there isn't some more fundamental issue at play as it doesn't appear like this is where the code assumes such an error would occur. I feel like fixing this would is likely to lead to an easy fix allowing nested grid search to work properly. Maybe someone familiar with this code could look into it and identify where the broken assumption lies.
IMO, whether removing or fixing the nesting feature is best depends on how difficult it would be to fix the feature. I do think there is value is supporting it but, given it only affect some rare cases, many of which have acceptable workaround, it is not worth bloating tune and, therefore, increasing tune's maintenance cost. Though, even if the feature is removed, the way the error is generated hints that there are some (potentially benign) issues with the assumed behavior of some of the private _resolve*
functions that should be looked into.
Edits: typo fixes
I've been confused by the same error myself when I tried running nested grid search. I would also benefit from nested grid_search
support, however, it's not a very high priority for me since so far I've always managed to work around it.
I've recently had several cases where I would've benefitted from being able to nest the lambdas in variant spec. I can still work around this by creating python conditions outside of the variant spec definition, but it's slowly starting to make my variant definitions more complicated.
Here's a bit simplified example from the latest one for reference:
from pprint import pprint
from ray import tune
from ray.tune.suggest.variant_generator import generate_variants
def metric_learner_kwargs(spec):
spec = spec.get('config', spec)
shared_kwargs = {
'learning_rate': 3e-4,
'hidden_layer_sizes': (256, 256),
}
if spec['metric_learner_params']['type'] == 'OnPolicyMetricLearner':
kwargs = {
**shared_kwargs,
**{
'train_every_n_steps': 128,
}
}
elif spec['metric_learner_params']['type'] == 'HingeMetricLearner':
kwargs = {
**shared_kwargs,
**{
'train_every_n_steps': 1,
'lambda_estimator_kwargs': {},
'max_distance': tune.sample_from(lambda spec: (
10 + spec.get('config')['max_path_length']
))
}
}
return kwargs
spec = {
'max_length': 100,
'metric_learner_params': {
'type': tune.grid_search([
'OnPolicyMetricLearner',
'HingeMetricLearner'
]),
'kwargs': tune.sample_from(metric_learner_kwargs)
}
}
pprint(list(generate_variants(spec)))
Edit: Here's even simpler example covering the same case:
spec = {
'test1': tune.grid_search(['a', 'b']),
'test2': tune.grid_search(['c', 'd']),
'test3': tune.sample_from(lambda spec: (
{
'a': tune.sample_from(lambda spec: spec['test2'] * 2),
'b': tune.sample_from(lambda spec: spec['test2'] * 3),
}[spec['test1']]
))
}
The following code works in the latest master:
tune.run(
training_function,
config={
'test1': tune.grid_search(['a', 'b']),
'test2': tune.grid_search(['c', 'd']),
'test3': tune.sample_from(lambda spec: {
'a': tune.sample_from(lambda spec: spec.config['test2'] * 2),
'b': tune.sample_from(lambda spec: spec.config['test2'] * 3),
}[spec.config['test1']])
})
If there's still cases we don't consider, please open a new issue. Thanks!
Nice, love it, thanks @krfricke!
@krfricke This issue was for nesting grid_search
inside a lambda function (see above for some use cases). Is this now supported in master?
Ah, this looks a bit more complicated. I'll look into this again later. let me re-open this for now until I checked it.
The main problem why we currently don't support this is that we currently require to know the number of samples in advance. However, with dependent grid search spaces, the number of grid search elements can vary. Even if we introduce a method to specify the number of elements and fix it, resolving the grid search variables would required some significant overhaul of our current search space resolution algorithm.
We can keep this issue open to track this request, but it's unlikely we'll get to this soon.
Also interested in nesting grid_search
inside a lambda function:
config = {
"optimizer" : tune.grid_search(["AdamW", "SGDW"]),
"scheduler" : tune.grid_search(["none", "cosine", "warm_restarts", "plateau"]),
"lr" : tune.uniform(1e-7, 1e-3),
"weight_decay" : tune.uniform(0, 1e-1),
"nesterov" : lambda spec: tune.grid_search([True, False]) if spec.config.optimizer == "SGDW" else None,
"momentum" : lambda spec: tune.uniform(0, 1) if spec.config.optimizer == "SGDW" else None,
"dampening" : lambda spec: None if spec.config.nesterov is None else spec.config.nesterov*np.random.uniform(),
"T_0" : lambda spec: tune.grid_search([5, 10]) if spec.config.scheduler == "warm_restarts" else None,
"T_mult" : lambda spec: tune.grid_search([1, 2]) if spec.config.scheduler == "warm_restarts" else None,
"factor" : tune.sample_from(lambda spec: (spec.config.scheduler == "plateau") * np.random.uniform())
}
Alternatively, this sort of nesting:
config = {
"optimizer" : tune.grid_search([
{"SGDW": {
"lr" : tune.uniform(1e-7, 1e-3),
"weight_decay" : tune.uniform(0, 1e-1),
"nesterov" : tune.grid_search([True, False]),
"momentum" : tune.uniform(0, 1),
"dampening" : tune.sample_from(lambda spec: spec.config.optimizer.SGDW.nesterov * np.random.uniform())}},
{"AdamW" : {
"lr" : tune.uniform(1e-7, 1e-3),
"weight_decay" : tune.uniform(0, 1e-1)}}]),
"scheduler" : tune.grid_search([
{"none" : None},
{"cosine" : {
"T_max" : set_mod.NUM_EPOCHS}},
{"warm_restarts" : {
"T_0" : tune.grid_search([5, 10]),
"T_mult" : tune.grid_search([1, 2])}},
{"plateau" : {
"factor" : tune.uniform(0, 1),
"patience" : 10}}])
}
Gave up on conditioning, specified space according to HyperOpt protocol (and now it works):
lr = [5e-7, 1e-3]
weight_decay = [0, 1e-3]
T_0 = [5, 10]
T_mult = [1, 2]
space = {
"optimization" : hp.choice("optimization",
[{"name" : "AdamW_none",
"lr" : hp.loguniform('lr_1', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_1', weight_decay[0], weight_decay[1])},
{"name" : "AdamW_cosine",
"lr" : hp.loguniform('lr_2', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_2', weight_decay[0], weight_decay[1])},
{"name" : "AdamW_restarts",
"lr" : hp.loguniform('lr_3', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_3', weight_decay[0], weight_decay[1]),
"T_0" : hp.choice("T_0_1", T_0),
"T_mult" : hp.choice("T_mult_1", T_mult)},
{"name" : "AdamW_plateau",
"lr" : hp.loguniform('lr_4', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_4', weight_decay[0], weight_decay[1]),
"factor" : hp.uniform("factor_1", 0, 1)},
{"name" : "SGDW_none",
"lr" : hp.loguniform('lr_5', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_5', weight_decay[0], weight_decay[1]),
"momentum" : hp.uniform("momentum_1", 0, 1),
"dampening" : hp.uniform("dampening_1", 0, 1)},
{"name" : "SGDW_cosine",
"lr" : hp.loguniform('lr_6', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_6', weight_decay[0], weight_decay[1]),
"momentum" : hp.uniform("momentum_2", 0, 1),
"dampening" : hp.uniform("dampening_2", 0, 1)},
{"name" : "SGDW_restarts",
"lr" : hp.loguniform('lr_7', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_7', weight_decay[0], weight_decay[1]),
"momentum" : hp.uniform("momentum_3", 0, 1),
"dampening" : hp.uniform("dampening_3", 0, 1),
"T_0" : hp.choice("T_0_2", T_0),
"T_mult" : hp.choice("T_mult_2", T_mult)},
{"name" : "SGDW_plateau",
"lr" : hp.loguniform('lr_8', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_8', weight_decay[0], weight_decay[1]),
"momentum" : hp.uniform("momentum_4", 0, 1),
"dampening" : hp.uniform("dampening_4", 0, 1),
"factor" : hp.uniform("factor_2", 0, 1)},
{"name" : "SGDW_none_nesterov",
"lr" : hp.loguniform('lr_9', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_9', weight_decay[0], weight_decay[1]),
"momentum" : hp.uniform("momentum_5", 0, 1)},
{"name" : "SGDW_cosine_nesterov",
"lr" : hp.loguniform('lr_10', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_10', weight_decay[0], weight_decay[1]),
"momentum" : hp.uniform("momentum_6", 0, 1)},
{"name" : "SGDW_restarts_nesterov",
"lr" : hp.loguniform('lr_11', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_11', weight_decay[0], weight_decay[1]),
"momentum" : hp.uniform("momentum_7", 0, 1),
"T_0" : hp.choice("T_0_3", T_0),
"T_mult" : hp.choice("T_mult_3", T_mult)},
{"name" : "SGDW_plateau_nesterov",
"lr" : hp.loguniform('lr_12', np.log(lr[0]), np.log(lr[1])),
"weight_decay" : hp.uniform('weight_decay_12', weight_decay[0], weight_decay[1]),
"momentum" : hp.uniform("momentum_8", 0, 1),
"factor" : hp.uniform("factor_3", 0, 1)}
])
}
@krfricke how do you describe the nested parameter name in parameter_columns
in reporter?
config = mycustomconfig() # nested DictConfig for the PL model
config['model']['aggregation']['out_channels'] = tune.choice([48, 96]) config['data']['loader']['train_len'] = tune.choice([5000, 50000, 2500])
...
reporter = CLIReporter( parameter_columns=["out_channels", "train_len"], # ???? what to write will this suffice? metric_columns=["val_loss", "epoch"])
Can you try using /
as separators, e.g. model/aggregation/out_channels
?
System information
Describe the problem
The documentation for
generate_variants(...)
states that unresolved parameterslambda
andgrid_search
can be nested. From ray/tune/suggest/variant_generator.py, line 33:"""It is also possible to nest the two, e.g. have a lambda function return a grid search or vice versa, as long as there are no cyclic dependencies between unresolved values. """
However, it seems like even the following simple case fails:
with error message (raised from ray/tune/suggest/variant_generator.py, line 129):
Additionally, the error handling seems off as the error message appears incorrect in blaming dependencies. This example fails for me in the same way as the other case despite having no dependencies: