microsoft / FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
https://microsoft.github.io/FLAML/

Expose BlendSearch `cost_attr` parameter through `flaml.tune.run` API #1166

Open bbudescu opened 1 year ago

bbudescu commented 1 year ago

The initializer of the `BlendSearch` class takes a `cost_attr` parameter that lets the user specify which of the reported metrics should be treated as the cost during optimization.

When calling `tune.run`, the `cost_attr` parameter is always left at its default value of `"auto"`, which falls back to the reported `time_total_s`.

I know, one obvious workaround would be to instantiate the `BlendSearch` class with the desired `cost_attr` value before calling `tune.run`, and pass the instance as the `search_alg` parameter. However, that's not as convenient, especially since `tune.run` does a bit of preprocessing on the parameters before passing them to `BlendSearch.__init__` (assigning some default values, for example). This preprocessing would need to be duplicated in the caller's scope, which might drift out of sync with the upstream code.
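For reference, a minimal sketch of that workaround, assuming the current `BlendSearch` constructor; the metric name `eval_time_s` and the toy search space are only illustrative, and the exact import path of `BlendSearch` may vary by FLAML version:

```python
import time
from flaml import tune
from flaml import BlendSearch  # import path may differ across FLAML versions

def evaluate(config):
    start = time.time()
    loss = (config["x"] - 0.5) ** 2  # stand-in for the real evaluation
    tune.report(loss=loss, eval_time_s=time.time() - start)

# Configure the searcher directly so cost_attr can point at a custom metric;
# note this bypasses the parameter preprocessing that tune.run normally does.
search_alg = BlendSearch(
    metric="loss",
    mode="min",
    space={"x": tune.uniform(0, 1)},
    cost_attr="eval_time_s",  # cost = self-measured evaluation time
)

analysis = tune.run(
    evaluate,
    search_alg=search_alg,
    num_samples=100,
)
```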

As a sidenote, here's the reason I want to use a custom `cost_attr`. My cost is the time required to evaluate the configuration, so `time_total_s` should have been precisely what I needed. However, it turns out `time_total_s` also includes the time required to call `tune.report`. Interestingly, after running a lot of trials (like 20k-30k or so), this duration gradually increases to the point where it overwhelmingly dominates the occupancy of the ray processes running the evaluations. The ray worker process ends up spending something like 10 times more time waiting for results to get reported than actually running the evaluation. I haven't measured and pinpointed this exactly, but I suspect it happens because, as the number of trial results grows, it takes more and more time to fit the TPE model on the ever larger amount of data, and `tune.report` blocks until fitting the previous results is finished.
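To make the observation concrete, a bit of hypothetical instrumentation (not something FLAML provides) that times the evaluation and the report call separately; `run_actual_evaluation` stands in for the real work being tuned:

```python
import time
from flaml import tune

def evaluate(config):
    t0 = time.time()
    loss = run_actual_evaluation(config)  # hypothetical: the real work being tuned
    t1 = time.time()
    tune.report(loss=loss, eval_time_s=t1 - t0)
    t2 = time.time()
    # After many completed trials, (t2 - t1) ends up much larger than (t1 - t0),
    # so time_total_s stops reflecting the true evaluation cost.
    print(f"eval: {t1 - t0:.3f}s, report: {t2 - t1:.3f}s")
```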

One way to address this would be to update the model less frequently than after every single trial result, e.g., only once for all the results accumulated in a queue since the last model update. Even better, the frequency of TPE model fitting could be decreased further until the worker waiting time drops to something reasonable like, say, under 10% of the total time.
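A rough sketch of that throttling idea (purely hypothetical, not FLAML internals or API; the method names just follow the usual `suggest` / `on_trial_complete` searcher interface, and the model fitting/sampling steps are left as placeholders):

```python
import time

class ThrottledModelSearcher:
    """Keep suggesting from the current surrogate model and refit it only when
    enough new results have accumulated (or enough time has passed), instead of
    refitting after every reported result."""

    def __init__(self, refit_every=50, min_refit_interval_s=5.0):
        self._results = []            # all completed trial results so far
        self._new_since_fit = 0
        self._last_fit_time = 0.0
        self._refit_every = refit_every
        self._min_refit_interval_s = min_refit_interval_s
        self._model = None

    def on_trial_complete(self, trial_id, result=None, error=False):
        # Cheap bookkeeping only; no model fit here.
        if result is not None:
            self._results.append(result)
            self._new_since_fit += 1

    def suggest(self, trial_id):
        enough_new = self._new_since_fit >= self._refit_every
        enough_time = time.time() - self._last_fit_time >= self._min_refit_interval_s
        if self._model is None or (enough_new and enough_time):
            self._model = self._fit_model(self._results)  # the expensive step
            self._new_since_fit = 0
            self._last_fit_time = time.time()
        return self._sample_from(self._model)

    def _fit_model(self, results):
        ...  # placeholder: fit the TPE / surrogate model on all results so far

    def _sample_from(self, model):
        ...  # placeholder: draw the next config from the current model
```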

Perhaps there are other ways to address this, or perhaps this is inherent to the TPE algorithm itself. A good first step would probably be to profile the guiding process running on the head node, because I'm not even sure that's where the bottleneck occurs; it might be somewhere else altogether, e.g., within the local search code (CFO). If it is indeed the TPE fitting that's slowing things down, another option would be to make it use multiple threads (e.g., the ones waiting for a new suggestion to be made), or maybe a GPU or something.
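As a starting point for that profiling, one could run a standard profiler in the driver process, assuming the search loop (suggestions and result handling) runs there; `evaluate` and `search_alg` below are the names from the earlier sketch:

```python
import cProfile
import pstats

from flaml import tune

profiler = cProfile.Profile()
profiler.enable()
tune.run(evaluate, search_alg=search_alg, num_samples=1000)
profiler.disable()

# Show where the driver spends its time: global model fitting, CFO's local
# search, result handling, or simply waiting on the workers.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(30)
```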

bbudescu commented 1 year ago

The problem here is that I imagine the optimizer might get confused by the reporting time being included in the cost of evaluation and start making bad decisions: it tries to favor less costly evaluations, which aren't actually going to be less costly anyway (e.g., the same trial will show different costs depending on whether it runs as the first or, say, the 20,000th trial).

sonichi commented 1 year ago

Exposing `cost_attr` should be OK. Please feel free to make a PR.