microsoft / FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
https://microsoft.github.io/FLAML/

tune: support for incremental searches #817

Open bbudescu opened 1 year ago

bbudescu commented 1 year ago

So, let's assume you get a new dataset every month and need to train a new model on that data as soon as you receive it. You don't care if the model doesn't perform very well on old data; you care about the newest data only. There's enough concept drift in the data that the optimal model hyperparameter configuration differs a bit from month to month, so it helps to do a little tuning every time you get data. Accumulated over several months, however, the pairwise performance differences among a set of configurations might end up being significant. Of course, there are certain stable regions in the design space that have always yielded terrible results, for example, and should mostly be avoided in the future too.

Now, how might one implement such an iterative scenario using FLAML tune?

Option 1: searching the best config within a constrained sub-space of the original design space

I'm thinking that the dataset can be represented by just another integer parameter in the design space, i.e., the index of the current month in [0, 1, 2, ..., 12]. However, this parameter shouldn't be sampled by the algorithm; its value should be specified manually each month. Also, unlike regular parameters, we're not looking for a global optimum any more, but for a local optimum specific to that particular month.

As such, we'd need a mechanism to have the optimizer sample a fixed value during a single optimization session. I think Optuna has something for this called PartialFixedSampler (though I'm not exactly sure that's the right one).
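
For illustration, here is a minimal sketch of that pinning mechanism in Optuna (train_and_score is a hypothetical user-defined training function, and the other hyperparameters are placeholders):

```python
import optuna

# Hypothetical objective: train on the selected month's data and return a loss to minimize.
def objective(trial):
    month = trial.suggest_int("month", 0, 12)              # normally a searchable dimension
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)   # placeholder hyperparameter
    return train_and_score(month, lr)                      # user-defined, assumed to exist

# Pin "month" to the current month; the base sampler still searches the other parameters.
sampler = optuna.samplers.PartialFixedSampler({"month": 11}, optuna.samplers.TPESampler())
study = optuna.create_study(direction="minimize", sampler=sampler)
study.optimize(objective, n_trials=50)
```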

Alternatively, when launching the search for the current month's optimum, we could change the bounds of the month index variable in the design space (e.g., qrandint(11, 11, 1) if we want to select month 11). However, results from previous months, e.g., from month 10, would then basically be considered invalid. Will it still be possible to warm start this month's optimization using previous months' trials through tune.run's points_to_evaluate and evaluated_rewards arguments, or will the optimizer complain that it received an invalid configuration because 10 is not in [11, 11]? Also, will this incremental approach change the surface learned by the BO, i.e., will it have to unlearn/relearn some changing regions of the design space, and thus lose performance?
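
A minimal sketch of what this would look like with flaml.tune, assuming the optimizer accepts out-of-range warm-start points (which is exactly the open question above); train_and_score and the extra hyperparameters are again hypothetical:

```python
from flaml import tune

def evaluate_config(config):
    # "month" is pinned to the current month by the search space below.
    loss = train_and_score(config["month"], config["learning_rate"], config["n_estimators"])
    return {"loss": loss}

# Trials evaluated in a previous session (month 10), reused as a warm start.
previous_points = [
    {"month": 10, "learning_rate": 0.05, "n_estimators": 200},
    {"month": 10, "learning_rate": 0.01, "n_estimators": 400},
]
previous_losses = [0.31, 0.27]  # metric values observed for the points above

analysis = tune.run(
    evaluate_config,
    config={
        "month": tune.qrandint(11, 11, 1),            # collapsed to the current month
        "learning_rate": tune.loguniform(1e-4, 1e-1),
        "n_estimators": tune.qrandint(50, 1000, 50),
    },
    metric="loss",
    mode="min",
    points_to_evaluate=previous_points,
    evaluated_rewards=previous_losses,
    num_samples=64,
)
print(analysis.best_config)
```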

Option 2: instances / transfer learning

I can see other optimizer packages address this issue of multiple datasets with dedicated mechanisms. E.g., SMAC3 has instances, while OpenBox calls this "transfer learning". SMAC3 can also assign values to so-called "instance features", and I believe these can better inject the local smoothness prior along the temporal dimension, which might have an advantage over just treating all months as independent datasets. Maybe you guys also have something similar, but I missed it or it wasn't documented.

sonichi commented 1 year ago

@qingyun-wu @skzhang1 @jtongxin

sonichi commented 1 year ago

Currently, the most closely related feature FLAML offers is meta-learning: https://microsoft.github.io/FLAML/docs/Use-Cases/Zero-Shot-AutoML#how-to-prepare-offline. It can learn robust configurations from existing results for combinations of configurations x tasks. You can then use them as starting points for new tasks.
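
For context, the online side of that feature looks roughly like this (a sketch using a "flamlized" estimator as shown in the docs; X_train, y_train and X_test stand for the new month's data and are assumed to be defined):

```python
from flaml.default import LGBMClassifier  # "flamlized" drop-in for lightgbm.LGBMClassifier

# Hyperparameters are picked zero-shot from the meta-learned portfolio at fit time,
# based on the training data's meta-features, without running a tuning session.
clf = LGBMClassifier()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
```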

bbudescu commented 1 year ago

Ok, is there a way to do this without going through the AutoML API? How about just changing the search space every time a new optimization session is run, like I suggested in the last paragraph under Option 1 above?

I.e.

> Alternatively, when launching the search for the current month's optimum, we could change the bounds of the month index variable in the design space (e.g., qrandint(11, 11, 1) if we want to select month 11). However, results from previous months, e.g., from month 10, would then basically be considered invalid. Will it still be possible to warm start this month's optimization using previous months' trials through tune.run's points_to_evaluate and evaluated_rewards arguments, or will the optimizer complain that it received an invalid configuration because 10 is not in [11, 11]? Also, will this incremental approach change the surface learned by the BO, i.e., will it have to unlearn/relearn some changing regions of the design space, and thus lose performance?

sonichi commented 1 year ago

> Ok, is there a way to do this without going through the AutoML API? How about just changing the search space every time a new optimization session is run, like I suggested in the last paragraph under Option 1 above?
>
> I.e.
>
> Alternatively, when launching the search for the current month's optimum, we could change the bounds of the month index variable in the design space (e.g., qrandint(11, 11, 1) if we want to select month 11). However, results from previous months, e.g., from month 10, would then basically be considered invalid. Will it still be possible to warm start this month's optimization using previous months' trials through tune.run's points_to_evaluate and evaluated_rewards arguments, or will the optimizer complain that it received an invalid configuration because 10 is not in [11, 11]? Also, will this incremental approach change the surface learned by the BO, i.e., will it have to unlearn/relearn some changing regions of the design space, and thus lose performance?

Use suggest_config to get k configs and pass them via points_to_evaluate. In your suggested Option 1, the passed evaluated_rewards could mislead the optimizer because they will be treated as known rewards for the corresponding points_to_evaluate. But if your search space is continuous, this could be OK, as a nearby point still has a chance to be sampled.
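
A rough sketch of that flow, assuming flaml.default.suggest_config returns portfolio entries that carry a "hyperparameters" dict, and reusing evaluate_config and a search_space dict in the spirit of the earlier sketch:

```python
from flaml.default import suggest_config
from flaml import tune

# Ask the meta-learned portfolio for k configs matched to the new data by its
# meta-features (signature and return format assumed; see the zero-shot docs).
portfolio = suggest_config("classification", X_new_month, y_new_month, "lgbm", k=3)
start_points = [entry["hyperparameters"] for entry in portfolio]

analysis = tune.run(
    evaluate_config,                  # user-defined objective, as in the earlier sketch
    config=search_space,              # the tuning search space, defined elsewhere
    metric="loss",
    mode="min",
    points_to_evaluate=start_points,  # warm start only; without evaluated_rewards these
                                      # points are actually re-evaluated, so they cannot mislead
    num_samples=64,
)
```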

bbudescu commented 1 year ago

I kinda need to tune a user defined function. I'm not sure how to fit it into this paradigm of standard ML tasks.

sonichi commented 1 year ago

> I kinda need to tune a user defined function. I'm not sure how to fit it into this paradigm of standard ML tasks.

Do you have meta features for your tasks? Meaning, for each tuning task, are there meta features to characterize the similarity between tasks?

bbudescu commented 1 year ago

I was thinking of using such a meta feature to store the index of the month in which the data was collected, i.e., the month index in [0...12] in the example above, as an alternative to treating the month index as a regular parameter and constraining its search space to the current month index every time we run a tuning session. Of course, some stats could be computed on each dataset and appended as further meta features, and that might help make the optimization more efficient, but that's not the case currently.

sonichi commented 1 year ago

> I was thinking of using such a meta feature to store the index of the month in which the data was collected, i.e., the month index in [0...12] in the example above, as an alternative to treating the month index as a regular parameter and constraining its search space to the current month index every time we run a tuning session. Of course, some stats could be computed on each dataset and appended as further meta features, and that might help make the optimization more efficient, but that's not the case currently.

We can extend the meta-learning from AutoML to tune in a future version. For now, you can experiment with Option 1 in your post, as long as the search space is continuous. @qingyun-wu any other suggestions?

bbudescu commented 1 year ago

Well, my datasets are month-wise, so the most natural representation is an int that indexes the consecutive months (actually, a qrandint, so that the start of the per-session search range can equal its stop point and the subspace is effectively collapsed into a single value on that dimension).

Now, I could just manually round a uniform to the nearest int, but that would make the optimizer suggest the same month without knowing that it's the same month, and it will do that until it learns from examples that this is a stepwise function, which might be a pain to model with smooth functions like Gaussian processes. That, I assume, might lead to some drop in performance, which might be alleviated by some form of caching.

Maybe I could use a quniform instead? What are its performance implications? Does it do exactly what I wrote above, or does it handle things more gracefully?

Anyway, it's not very clear to me why it's OK to change the search space only if the parameter is continuous. Why not use a qrandint?
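
To make the variants discussed above concrete, here is a small sketch of how each would be declared with flaml.tune (the names are illustrative only):

```python
from flaml import tune

space = {
    # Integer-valued and quantized: samples are ints in {0, 1, ..., 12}.
    "month_qrandint": tune.qrandint(0, 12, 1),
    # Float-valued but rounded to multiples of q=1: samples like 7.0, 8.0, ...
    "month_quniform": tune.quniform(0, 12, 1),
    # Per-session collapse of the dimension to the current month only.
    "month_fixed": tune.qrandint(11, 11, 1),
    # Manually rounding a continuous parameter would instead use tune.uniform(0, 12)
    # and int(round(config["month_raw"])) inside the objective.
    "month_raw": tune.uniform(0, 12),
}
```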

sonichi commented 1 year ago

> Well, my datasets are month-wise, so the most natural representation is an int that indexes the consecutive months (actually, a qrandint, so that the start of the per-session search range can equal its stop point and the subspace is effectively collapsed into a single value on that dimension).
>
> Now, I could just manually round a uniform to the nearest int, but that would make the optimizer suggest the same month without knowing that it's the same month, and it will do that until it learns from examples that this is a stepwise function, which might be a pain to model with smooth functions like Gaussian processes. That, I assume, might lead to some drop in performance, which might be alleviated by some form of caching.
>
> Maybe I could use a quniform instead? What are its performance implications? Does it do exactly what I wrote above, or does it handle things more gracefully?
>
> Anyway, it's not very clear to me why it's OK to change the search space only if the parameter is continuous. Why not use a qrandint?

Sorry, I meant whether you have other continuous hyperparameters in the search space.

qingyun-wu commented 1 year ago

> So, let's assume you get a new dataset every month and need to train a new model on that data as soon as you receive it. You don't care if the model doesn't perform very well on old data; you care about the newest data only. There's enough concept drift in the data that the optimal model hyperparameter configuration differs a bit from month to month, so it helps to do a little tuning every time you get data. Accumulated over several months, however, the pairwise performance differences among a set of configurations might end up being significant. Of course, there are certain stable regions in the design space that have always yielded terrible results, for example, and should mostly be avoided in the future too.
>
> Now, how might one implement such an iterative scenario using FLAML tune?
>
> Option 1: searching the best config within a constrained sub-space of the original design space
>
> I'm thinking that the dataset can be represented by just another integer parameter in the design space, i.e., the index of the current month in [0, 1, 2, ..., 12]. However, this parameter shouldn't be sampled by the algorithm; its value should be specified manually each month. Also, unlike regular parameters, we're not looking for a global optimum any more, but for a local optimum specific to that particular month.
>
> As such, we'd need a mechanism to have the optimizer sample a fixed value during a single optimization session. I think Optuna has something for this called PartialFixedSampler (though I'm not exactly sure that's the right one).
>
> Alternatively, when launching the search for the current month's optimum, we could change the bounds of the month index variable in the design space (e.g., qrandint(11, 11, 1) if we want to select month 11). However, results from previous months, e.g., from month 10, would then basically be considered invalid. Will it still be possible to warm start this month's optimization using previous months' trials through tune.run's points_to_evaluate and evaluated_rewards arguments, or will the optimizer complain that it received an invalid configuration because 10 is not in [11, 11]? Also, will this incremental approach change the surface learned by the BO, i.e., will it have to unlearn/relearn some changing regions of the design space, and thus lose performance?
>
> Option 2: instances / transfer learning
>
> I can see other optimizer packages address this issue of multiple datasets with dedicated mechanisms. E.g., SMAC3 has instances, while OpenBox calls this "transfer learning". SMAC3 can also assign values to so-called "instance features", and I believe these can better inject the local smoothness prior along the temporal dimension, which might have an advantage over just treating all months as independent datasets. Maybe you guys also have something similar, but I missed it or it wasn't documented.

Hi @bbudescu, in addition to the existing zero-shot feature, we have some in-progress ideas related to Option 1 and would like to learn more about the needs in your case. Could we chat on the community Discord (https://discord.gg/Cppx2vSPVP)?