Open exalate-issue-sync[bot] opened 1 year ago
Sebastien Poirier commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c]
{quote}Set an environment variable. This [reportedly|https://stackoverflow.com/questions/65897690/how-to-disable-gpus-in-h2o-automl] works in Python for XGBoost, but is ignored by AutoML.{quote}
{noformat}
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
{noformat}
I believe it works for native XGBoost (whose process inherits from current Py process), not for H2O XGBoost. This is not a reasonable solution for us given that the backend is on a completely separate process and can even be on a remote machine.
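The distinction can be demonstrated without any H2O installation. The sketch below is illustrative only: it shows why a variable set in the Python client reaches child processes (the native XGBoost case) but not an independently started backend, which here is simulated by a subprocess launched with a scrubbed environment.

```python
import os
import subprocess
import sys

# Set the variable in the current (client) process.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# A child spawned from this process inherits the variable -- this is how
# native XGBoost, running in-process or in a child, ends up seeing it:
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES', '<unset>'))"],
    capture_output=True, text=True)

# A backend launched independently (e.g. a remote H2O JVM) has its own
# environment -- simulated here by removing the variable from the env:
clean_env = {k: v for k, v in os.environ.items() if k != "CUDA_VISIBLE_DEVICES"}
backend = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES', '<unset>'))"],
    capture_output=True, text=True, env=clean_env)

print(repr(child.stdout.strip()))    # the child sees '' (GPUs hidden)
print(repr(backend.stdout.strip()))  # the "backend" sees '<unset>'
```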
Sebastien Poirier commented: I’m not a fan of the first approach, passing those params directly to the {{H2OAutoML}} function/constructor: it means the user would have to pass them for every run when, in the great majority of cases, this should be set once and for all. Besides, they seem too specific to be exposed there.
As for {{h2o.init}} params, this is not ideal either, as it means the user would have to restart the backend to compare the CPU version with the GPU one.
My favourite approach would be one of those:
AutoML then doesn’t need to forward those params to XGB. The user can set the params using, for example, {{h2o.rapids('(setproperty "sys.ai.h2o.algos.default_backend_mode" "gpu")')}}.
For example: {{H2OAutoML.set_env("backend_mode", "gpu")}}. This has the advantage (or the drawback) of limiting the changes to AutoML. This could also be used for similar parameters that are not expected to change between AutoML runs.
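The class-level-settings idea above can be sketched in plain Python. This is a hypothetical design sketch, not an existing H2O API: {{set_env}}, {{backend_mode}}, and {{_xgboost_params}} are illustrative names, and the real implementation would forward the values to the backend rather than merge them client-side.

```python
# Hypothetical sketch of the proposed class-level setting; the names
# `set_env`, `backend_mode`, and `_xgboost_params` are illustrative only.
class H2OAutoML:
    _env = {}  # class-level defaults, shared by all subsequent runs

    @classmethod
    def set_env(cls, key, value):
        cls._env[key] = value

    def _xgboost_params(self):
        # Merge the class-level defaults into per-model parameters.
        params = {"ntrees": 50}
        if self._env.get("backend_mode") == "gpu":
            params.update({"backend": "gpu",
                           "gpu_id": self._env.get("gpu_id", 0)})
        return params

# Set once, applies to every run created afterwards:
H2OAutoML.set_env("backend_mode", "gpu")
aml = H2OAutoML()
print(aml._xgboost_params())  # -> {'ntrees': 50, 'backend': 'gpu', 'gpu_id': 0}
```

The key design point is that the setting lives on the class, not the instance, so it is configured once rather than passed to every {{H2OAutoML}} constructor call.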
[~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] , [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] ?
Michal Kurka commented: I think this problem is broader than just what is discussed in this Jira. Similarly, you might want to change how many cores XGBoost can use in AutoML, which GPU to use in case there is more than one, and so on.
Sebastien Poirier commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] I agree with you that the problem of hardware allocation is broader, and that’s one major reason why I’m not convinced it should be implemented as new H2OAutoML instance params.
{quote}you might want to change how many cores XGBoost can use in AutoML{quote}
there’s no way to specify this on XGB currently, correct?
{quote}what GPU to use in case there is more than one{quote}
this one is what we’re discussing here: finding a way to expose {{gpu_id}}
What I’d like to discuss here is at which level we want to expose this hardware configuration logic (the approaches I listed are in my order of preference).
I exclude the possibility of applying different hardware logic to different XGBs in the same AutoML run.
Michal Kurka commented: {quote}there’s no way to specify this on XGB currently, correct?{quote}
you can change that on the algo level by setting nthreads
Michal Kurka commented: [~accountid:5b153fb1b0d76456f36daced] I think I have a possible solution:
Add a static function at the estimator level, e.g. {{set_defaults}}, accepting a dictionary of default parameters. This would be sent to the backend and used to overwrite the blueprint instances of the parameters defined on the ModelBuilders (which are then cloned for each new model).
This would be a very flexible solution - it would let you override any default in H2O.
What do you think?
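The blueprint-overwrite idea can be sketched client-side in a few lines. This is a hedged illustration of the proposal only: {{set_defaults}} is not an existing estimator method, {{EstimatorBase}} is a stand-in for the real estimator hierarchy, and the actual mechanism would overwrite parameter blueprints on the backend nodes rather than a Python dict.

```python
# Sketch of the proposed per-estimator defaults; `set_defaults` and
# `EstimatorBase` are hypothetical names, not actual H2O API.
class EstimatorBase:
    _user_defaults = {}  # one "blueprint" dict per estimator subclass

    @classmethod
    def set_defaults(cls, **defaults):
        # In the real proposal this would be sent to the backend to
        # overwrite the blueprint ModelBuilder parameter instances.
        cls._user_defaults = {**cls._user_defaults, **defaults}

    def __init__(self, **params):
        # Clone of the blueprint: user defaults first, call-site wins.
        self.params = {**type(self)._user_defaults, **params}

class H2OXGBoostEstimator(EstimatorBase):
    _user_defaults = {}

# Override a default once; every later instance picks it up, while
# explicitly passed parameters still take precedence:
H2OXGBoostEstimator.set_defaults(backend="gpu", gpu_id=0)
model = H2OXGBoostEstimator(ntrees=100)
print(model.params)  # -> {'backend': 'gpu', 'gpu_id': 0, 'ntrees': 100}
```

This mirrors the flexibility Michal describes: any default can be overridden, and the scope is naturally per-algorithm rather than per-AutoML-run.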
Sebastien Poirier commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] I like your suggestion! It creates a more natural mapping between algos and those user-set defaults. However, I still haven’t found a way to apply those defaults consistently. Everything would be fine if the codebase always used the blueprints to build a new builder, or even just a new parameters instance, but this is not the case currently. And I’m not only talking about AutoML; there it would be easy to ensure that the blueprints are always used.
But the schemas themselves use Parameters constructors, as do grids and probably many other use cases, not to mention tests. Also, of course, those blueprints need to be updated on all nodes.
Or… maybe we could also introduce a hook in {{Model.Parameters}}, in addition to the blueprint updates, e.g.:
{code:java}
public Parameters() {
    ...
    applyUserDefaults();
}

protected void applyUserDefaults() {}

// e.g. in XGBoostParameters:
protected void applyUserDefaults() {
    _backend = Backend.valueOf(getUserDefault(algoName(), "backend"));
}
{code}
Michal Kurka commented: [~accountid:5b153fb1b0d76456f36daced] you are right, it wouldn’t work in code that just does new XYZParameters…
it would have to be a new feature built into the model builder infrastructure instead, which I think is reasonably doable
(I would have a lot of use for a feature like that, especially for parameters that are not exposed in the API and exist only in Java)
JIRA Issue Details
Jira Issue: PUBDEV-7985
Assignee: Sebastien Poirier
Reporter: Erin LeDell
State: Open
Fix Version: Backlog
Attachments: N/A
Development PRs: N/A
Let's figure out the best way to allow the user access to the {{backend}} and {{gpu_id}} parameters for XGBoost models in AutoML. Some users want to turn on/off the GPU or select which GPU to use.
Options:
Add the {{backend}} and {{gpu_id}} parameters directly to the AutoML function.
Add params to {{h2o.init()}}, or some global {{h2o.backend_defaults()}} utility function, to set those kinds of properties used by algos. However, we would then need to ensure that XGB always uses those properties unless they’re overridden by the algo params.
-Set an environment variable. This- [-reportedly-|https://stackoverflow.com/questions/65897690/how-to-disable-gpus-in-h2o-automl] -works in Python for XGBoost, but is ignored by AutoML.-
{noformat}
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
{noformat}