Open exalate-issue-sync[bot] opened 1 year ago
Erin LeDell commented: [~accountid:557058:59501c65-23f2-4a22-af15-313526e2c87e] Sorry, somehow I missed this ticket when it was created! There’s a {{project_name}} parameter for AutoML, which will allow you to execute two different AutoML runs on the same dataset on the same H2O cluster. If both runs are on the same machine, however, they will still compete for resources.
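A minimal sketch of the {{project_name}} approach, assuming each concurrent run is given its own name. The helpers {{unique_project_name}} and {{run_automl}} are hypothetical and not part of H2O; only {{H2OAutoML(project_name=...)}} and {{train(...)}} come from the library. String formatting uses {{str.format}} so it also runs on the Python 2.7 client shown in the ticket.

```python
import uuid


def unique_project_name(prefix="automl"):
    """Return a project name that is unlikely to collide with another run."""
    # Hypothetical helper: a short random suffix keeps concurrent runs apart.
    return "{}_{}".format(prefix, uuid.uuid4().hex[:8])


def run_automl(train, x, y, project_name, max_runtime_secs=60):
    """Train one AutoML run under its own project_name (hypothetical helper)."""
    # Imported lazily so the naming helper above works without h2o installed.
    from h2o.automl import H2OAutoML

    aml = H2OAutoML(project_name=project_name, seed=1,
                    max_runtime_secs=max_runtime_secs)
    aml.train(x=x, y=y, training_frame=train)
    return aml
```

Each caller would then pass something like {{project_name=unique_project_name("jobs")}}, so two runs on the same cluster do not share a project.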
You can also run two H2O instances on different ports on the same machine, e.g. {{h2o.init(port = 54321, nthreads = 32)}} and {{h2o.init(port = 55555, nthreads = 32)}}. I think this will use different sets of cores for each H2O instance, but I don’t think it’s guaranteed (the OS will try to balance this). The drawback here is that the two instances can’t share data, so the training set will be duplicated. If you have enough RAM, though, this is probably the better option.
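The two-instance setup could be sketched as follows, assuming the cores are split evenly between instances. {{instance_settings}} and {{start_instance}} are hypothetical helpers; {{h2o.init}} with {{port}} and {{nthreads}} is the library call mentioned above. As far as I know, a Python session holds one active H2O connection at a time, so each {{start_instance}} call is assumed to live in its own script or process.

```python
def instance_settings(ports, total_cores):
    """Split total_cores evenly across one H2O instance per port."""
    # Hypothetical helper: an even split is an assumption, not an H2O rule.
    per_instance = max(1, total_cores // len(ports))
    return [{"port": port, "nthreads": per_instance} for port in ports]


def start_instance(settings):
    """Start (or connect to) a local H2O instance with the given settings."""
    # Imported lazily so the planning helper above works without h2o installed.
    import h2o

    h2o.init(port=settings["port"], nthreads=settings["nthreads"])
    return h2o
```

With the 64 cores reported in the ticket, {{instance_settings([54321, 55555], 64)}} gives 32 threads per instance, matching the values suggested above. Each instance still needs its own copy of the training data.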
JIRA Issue Migration Info
Jira Issue: PUBDEV-7257
Assignee: UNASSIGNED
Reporter: Michael Jules
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
As a machine learning engineer, I need to be able to run multiple H2O AutoML processes simultaneously. I get the error below when attempting to run multiple instances on the same H2O server, but this should be possible. I'm wondering if this is a namespace issue where the AutoML dataframe is being interrupted and I need to uniquely name one of the {{auto_ml}} variables below. Thank you. Here is my code and the result:
1) CODE: {code:python}
# From utils.py file
# H2O Functions
from h2o.automl import H2OAutoML


def get_best_h2o_automl_model(train, test, valid, feature_col, y, excluded_algs):
    auto_ml = H2OAutoML(exclude_algos=['XGBoost'] + excluded_algs, seed=1, max_runtime_secs=0)
    auto_ml.train(x=feature_col, y=y, training_frame=train,
                  leaderboard_frame=test, validation_frame=valid)

# Run H2O automl to get the best model
{code}
2) RESULT: {code:java}
Starting at 2020-01-30T13:45:04.326775-08:00
Initializing H2O...
Warning: if you don't want to start local H2O server, then use of h2o.connect() is preferred.
Checking whether there is an H2O instance running at http://pg-pt-wn01-010.gld.XX.net:54321 . connected.
H2O cluster uptime: 13 days 21 hours 46 mins
H2O cluster timezone: America/Los_Angeles
H2O data parsing timezone: UTC
H2O cluster version: 3.26.0.10
H2O cluster version age: 2 months and 23 days
H2O cluster name: root
H2O cluster total nodes: 2
H2O cluster free memory: 155.4 Gb
H2O cluster total cores: 64
H2O cluster allowed cores: 64
H2O cluster status: locked, healthy
H2O connection url: http://pg-pt-wn01-010.gld.XX.net:54321
H2O connection proxy:
H2O internal security: False
H2O API Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python version: 2.7.5 final
Parse progress: [#########################################################] 100%
Using working directory: /tmp/tmpBwkLsi
AutoML progress: [#######################################
ERROR - Process failed due to: Unexpected HTTP error: HTTPConnectionPool(host='pg-pt-wn01-010.gld.XX.net', port=54321): Max retries exceeded with url: /3/Jobs/$0301646e18b032d4ffffffff$_9f388873e9f331e6cdc1d77bf9544a48 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f92f57d18d0>: Failed to establish a new connection: [Errno -2] Name or service not known',))
Traceback (most recent call last):
  File "/usr/pic1/repos/ml-models-all/bdaMlScripts/h2o_job_pred_models.py", line 189, in <module>
    main()
  File "/usr/pic1/repos/ml-models-all/bdaMlScripts/h2o_job_pred_models.py", line 174, in main
    h2o.remove(auto_ml)
UnboundLocalError: local variable 'auto_ml' referenced before assignment
{code}
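The {{UnboundLocalError}} at the end of the traceback looks like a secondary failure: {{auto_ml}} was apparently never assigned because the AutoML call failed first, so the cleanup had nothing to remove. Below is a minimal sketch of guarding that cleanup, assuming {{main()}} follows a train-then-remove structure (the real {{main()}} is not shown in this ticket). {{run_with_cleanup}}, {{train}} and {{cleanup}} are hypothetical names.

```python
def run_with_cleanup(train, cleanup):
    """Run train(); call cleanup(result) only if train() produced a result."""
    result = None  # bind the name first so the finally block can test it
    try:
        result = train()
        return "ok"
    except Exception as exc:
        # The original error stays visible instead of being masked
        # by an UnboundLocalError raised during cleanup.
        return "failed: {}".format(exc)
    finally:
        if result is not None:
            cleanup(result)
```

In the ticket's script, {{cleanup}} would correspond to the {{h2o.remove(auto_ml)}} call. With this guard the HTTP connection error would remain the only reported failure.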