h2oai / sparkling-water

Sparkling Water provides H2O functionality inside Spark cluster
https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html
Apache License 2.0

Sparkling Water not properly configuring RAM on Databricks #5692

Closed: bondgy closed this issue 10 months ago

bondgy commented 10 months ago

All of the outputs below refer to a just-started Databricks cluster with no other notebooks or jobs running. When calling

H2OContext.getOrCreate()

the total cluster memory reported in the cell output is ~60 GB, for both PySparkling and RSparkling. The full memory is accessible when using a plain H2O cluster: when running

h2o.init(max_mem_size="200g")

a cluster with ~200 GB of memory is created. Since Databricks doesn't seem to set the spark.driver.memory and spark.executor.memory properties explicitly, I tried setting them to various values, but that did not change the cluster size. Because the cluster is too small for my modeling, more complex tasks (e.g., grid search or XGBoost with a large number of trees or a high depth) make the cluster itself unreachable. Is there a hidden property that must be set explicitly to fully utilize Databricks' RAM, or is this a bug?
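
For reference, a minimal sketch of what was attempted, assuming the internal Sparkling Water backend; the 200g values are just the hypothetical targets from above and are, in practice, capped by the node type's physical RAM:

from pyspark.sql import SparkSession
from pysparkling import H2OContext
import h2o

# Memory targets are hypothetical; on Databricks, spark.driver.memory and
# spark.executor.memory typically have to be set in the cluster's Spark config
# before the cluster starts, since the driver and executor JVMs are already
# running by the time a notebook cell executes.
spark = (
    SparkSession.builder
    .config("spark.driver.memory", "200g")
    .config("spark.executor.memory", "200g")
    .getOrCreate()
)

# With the internal backend, H2O nodes run inside the Spark executors, so the
# reported H2O cluster memory is bounded by the executor heap sizes.
hc = H2OContext.getOrCreate()
h2o.cluster().show_status()  # prints the total cluster memory (~60 GB here)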

bondgy commented 10 months ago

I see now that this is actually a limitation of Databricks. The AWS node type caps the amount of RAM the cluster's executors can use, and I don't see a way around this other than purchasing node types with more memory, which isn't ideal.
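
For anyone hitting the same limit, one way to confirm it (a sketch, using the spark session that Databricks provides in notebooks) is to read back the memory Spark actually allocated:

# Read back what Spark actually allocated; on Databricks these values follow
# the node type rather than whatever was requested afterwards.
print(spark.sparkContext.getConf().get("spark.executor.memory", "not set"))
print(spark.sparkContext.getConf().get("spark.driver.memory", "not set"))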