EgorKraevTransferwise opened this issue 1 year ago.
The search space dict format returned by each estimator's `search_space` function is different from the one used by `flaml.tune` (https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#search-space). In short, each hyperparameter's value is itself a dict, which must contain the key "domain" and may optionally contain "init_value" and "low_cost_init_value". The value for "domain" is the same as the value you would put in the space dict passed to `flaml.tune`.
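To illustrate the difference, here is a minimal sketch in plain Python, so it runs without FLAML installed. `RandInt` and `Choice` are hypothetical stand-ins for the domain objects returned by `flaml.tune.randint` and `flaml.tune.choice`, and `split_estimator_space` is a hypothetical helper that mimics, in simplified form, the top-level conversion: it pulls "domain" out of each hyperparameter spec and collects the optional initial values separately. It is not FLAML's actual code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class RandInt:  # hypothetical stand-in for the object tune.randint(lower, upper) returns
    lower: int
    upper: int


@dataclass
class Choice:  # hypothetical stand-in for the object tune.choice(categories) returns
    categories: List[Any]


def split_estimator_space(
    est_space: Dict[str, Dict[str, Any]]
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Split an estimator-style spec into a flaml.tune-style space plus
    init and low-cost partial configs. Top level only (no recursion)."""
    tune_space, init_config, low_cost = {}, {}, {}
    for name, spec in est_space.items():
        # Every hyperparameter must be wrapped as {"domain": ..., ...}.
        if not isinstance(spec, dict) or "domain" not in spec:
            raise ValueError(f"{name!r}: value must be a dict with a 'domain' key")
        tune_space[name] = spec["domain"]
        if "init_value" in spec:
            init_config[name] = spec["init_value"]
        if "low_cost_init_value" in spec:
            low_cost[name] = spec["low_cost_init_value"]
    return tune_space, init_config, low_cost


# Estimator-style spec, as a search_space() function would return it.
est_space = {
    "n_estimators": {"domain": RandInt(4, 1000), "init_value": 4, "low_cost_init_value": 4},
    "criterion": {"domain": Choice(["gini", "entropy"])},
}
tune_space, init_config, low_cost = split_estimator_space(est_space)
```

In this sketch, passing a `flaml.tune`-style value directly (for example `{"x": RandInt(0, 1)}`) raises `ValueError`, which corresponds to a spec that is missing the "domain" wrapper.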
Thanks! After changing the last line of the for-loop above to `out[mdl] = {"domain": tune.choice(est_cfgs)}`, I get another error (traceback below).
Can the syntax you use for estimators not handle nested/hierarchical search spaces the same way `flaml.tune` does?
Also, below is a screenshot from the debugger showing what the complete search space looks like.
```
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\..\flaml\automl\automl.py:1814: in fit
    self._search()
..\..\flaml\automl\automl.py:2378: in _search
    self._search_sequential()
..\..\flaml\automl\automl.py:2201: in _search_sequential
    use_ray=False,
..\..\flaml\tune\tune.py:502: in run
    trial_to_run = _runner.step()
..\..\flaml\tune\trial_runner.py:125: in step
    config = self._search_alg.suggest(trial_id)
..\..\flaml\searcher\suggestion.py:213: in suggest
    suggestion = self.searcher.suggest(trial_id)
..\..\flaml\searcher\blendsearch.py:1057: in suggest
    return super().suggest(trial_id)
..\..\flaml\searcher\blendsearch.py:747: in suggest
    init_config, self._ls_bound_min, self._ls_bound_max
..\..\flaml\searcher\flow2.py:242: in complete_config
    partial_config, self.space, self, disturb, lower, upper
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

partial_config = {'monthly_fourier_degree': 2}
space = {'model_hi': <flaml.tune.sample.Categorical object at 0x00000187CF3970C8>, 'model_lo': <flaml.tune.sample.Categorical object at 0x00000187CF391808>, 'monthly_fourier_degree': <flaml.tune.sample.Integer object at 0x00000187CF397148>}
flow2 = <flaml.searcher.flow2.FLOW2 object at 0x00000187CF391FC8>, disturb = 0
lower = {'monthly_fourier_degree': 0.14285714285714285}
upper = {'monthly_fourier_degree': 0.14285714285714285}

    def complete_config(
        partial_config: Dict,
        space: Dict,
        flow2,
        disturb: bool = False,
        lower: Optional[Dict] = None,
        upper: Optional[Dict] = None,
    ) -> Tuple[Dict, Dict]:
        """Complete partial config in space.
        Returns:
            config, space.
        """
        config = partial_config.copy()
        normalized = normalize(config, space, partial_config, {})
        # print("normalized", normalized)
        if disturb:
            for key, value in normalized.items():
                domain = space.get(key)
                if getattr(domain, "ordered", True) is False:
                    # don't change unordered cat choice
                    continue
                if not callable(getattr(domain, "get_sampler", None)):
                    continue
                if upper and lower:
                    up, low = upper[key], lower[key]
                    if isinstance(up, list):
                        gauss_std = (up[-1] - low[-1]) or flow2.STEPSIZE
                        up[-1] += flow2.STEPSIZE
                        low[-1] -= flow2.STEPSIZE
                    else:
                        gauss_std = (up - low) or flow2.STEPSIZE
                        # allowed bound
                        up += flow2.STEPSIZE
                        low -= flow2.STEPSIZE
                elif domain.bounded:
                    up, low, gauss_std = 1, 0, 1.0
                else:
                    up, low, gauss_std = np.Inf, -np.Inf, 1.0
                if domain.bounded:
                    if isinstance(up, list):
                        up[-1] = min(up[-1], 1)
                        low[-1] = max(low[-1], 0)
                    else:
                        up = min(up, 1)
                        low = max(low, 0)
                delta = flow2.rand_vector_gaussian(1, gauss_std)[0]
                if isinstance(value, list):
                    # points + normalized index
                    value[-1] = max(low[-1], min(up[-1], value[-1] + delta))
                else:
                    normalized[key] = max(low, min(up, value + delta))
        config = denormalize(normalized, space, config, normalized, flow2._random)
        # print("denormalized", config)
        for key, value in space.items():
            if key not in config:
                config[key] = value
        for _, generated in generate_variants_compatible(
            {"config": config}, random_state=flow2.rs_random
        ):
            config = generated["config"]
            break
        subspace = {}
        for key, domain in space.items():
            value = config[key]
            if isinstance(value, dict):
                if isinstance(domain, sample.Categorical):
                    # nested space
                    index = indexof(domain, value)
                    # point = partial_config.get(key)
                    # if isinstance(point, list):  # low cost point list
                    #     point = point[index]
                    # else:
                    #     point = {}
                    config[key], subspace[key] = complete_config(
                        value,
                        domain.categories[index],
                        flow2,
                        disturb,
>                       lower and lower[key][index],
                        upper and upper[key][index],
                    )
E                   KeyError: 'model_lo'

..\..\flaml\tune\space.py:543: KeyError
```
Right. Here we convert the dict returned by the `search_space` function into the search space dict format required by `flaml.tune`:
https://github.com/microsoft/FLAML/blob/87d9b35d634f8085ea87b588a85a07bf7a3b7197/flaml/automl.py#L174
but we only did this at the top level and didn't traverse the hierarchy, so hierarchical search spaces are not handled.
The solution would be to make the conversion recursive: if the domain is a choice among multiple child search spaces, convert each of them recursively.
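The recursive conversion described above could look roughly like the following sketch. It reuses hypothetical `RandInt`/`Choice` stand-ins for the `flaml.tune` domain objects (redefined so the block runs on its own); `to_tune_space` and `convert_domain` are hypothetical names, not FLAML functions, and treating non-dict values inside a child space as constants is an assumption. Nested "init_value"/"low_cost_init_value" entries are simply dropped here; a complete fix would probably also need to carry them (and the corresponding low-cost bounds) through the hierarchy.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class RandInt:  # hypothetical stand-in for the object tune.randint(lower, upper) returns
    lower: int
    upper: int


@dataclass
class Choice:  # hypothetical stand-in for the object tune.choice(categories) returns
    categories: List[Any]


def convert_domain(domain: Any) -> Any:
    """If the domain is a choice among child search spaces, convert each child."""
    if isinstance(domain, Choice):
        return Choice(
            [to_tune_space(c) if isinstance(c, dict) else c for c in domain.categories]
        )
    return domain


def to_tune_space(est_space: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively convert an estimator-style spec into a flaml.tune-style space."""
    tune_space = {}
    for name, spec in est_space.items():
        if isinstance(spec, dict):
            if "domain" not in spec:
                raise ValueError(f"{name!r} is a dict without a 'domain' key")
            tune_space[name] = convert_domain(spec["domain"])
        else:
            # Assumption: non-dict values inside a child space are constants.
            tune_space[name] = spec
    return tune_space


# Two hypothetical component-model specs, offered as a nested choice.
arima = {"model": "arima", "p": {"domain": RandInt(1, 10), "init_value": 2}}
prophet = {"model": "prophet", "seasonality": {"domain": Choice(["additive", "multiplicative"])}}
est_space = {
    "model_lo": {"domain": Choice([arima, prophet])},
    "monthly_fourier_degree": {"domain": RandInt(0, 7), "init_value": 2},
}
tune_space = to_tune_space(est_space)
```

Each category of the resulting choice is a plain dict of domains and constants, i.e. the nested form that `flaml.tune` accepts for hierarchical spaces.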
I am trying to add a new time series model to FLAML's list of built-in models, but I'm having trouble specifying the search space. This model contains two component models taken from FLAML's built-ins, and I want to search over the available component models and their respective search spaces, so the code is:
However, when I try to use it, I get the following error:
In auto-causality, the above way of defining the nested search space works fine. Am I doing something wrong, or is the search space definition spec different for FLAML's built-in models? If so, why?