Closed: bondgy closed this issue 11 months ago
I see this is actually a limitation of Databricks. The AWS node type caps the amount of RAM a cluster can give its executors, and I don't see a way around this except by purchasing node types with more RAM, which isn't ideal.
Providing us with the observed and expected behavior definitely helps. Giving us the following information definitely helps:

- Execution mode (YARN-client, YARN-cluster, standalone, local): Databricks starting internal Sparkling Water clusters
- Logs from yarn logs -applicationId <application ID>, where the application ID is displayed when Sparkling Water is started: no explicit warnings/errors

Please also provide us with the full and minimal reproducible code.

All of these outputs refer to a just-started Databricks cluster with no other notebooks/jobs running. When calling
H2OContext.getOrCreate()
the total cluster memory is reported in the cell output as only ~60GB for both PySparkling and RSparkling clusters. The full memory is accessible when using a standalone H2O cluster: when running
h2o.init(max_mem_size="200g")
a cluster with ~200GB of memory is created. Since Databricks doesn't seem to set the spark.driver.memory and spark.executor.memory properties explicitly, I tried setting these to various values, but it did not change the cluster size. Because the cluster is too small for my modeling, more complex tasks (e.g., GridSearch or XGBoost with a large number of trees or a large depth) make the cluster itself unreachable. Is there a hidden property that must be set explicitly to fully utilize Databricks' RAM, or is there a bug?
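For reference, a minimal sketch of how the two clusters were started (exact imports, and whether getOrCreate() takes the SparkSession or an H2OConf argument, depend on the Sparkling Water version):

```python
import h2o
from pysparkling import H2OContext

# Sparkling Water cluster started from a Databricks Python notebook:
# the cell output reports only ~60 GB of total cluster memory.
hc = H2OContext.getOrCreate()

# Standalone H2O cluster (run separately, on another freshly started
# Databricks cluster): this honours the requested size and comes up with ~200 GB.
h2o.init(max_mem_size="200g")
```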
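And a sketch of the memory-property experiment; the property values shown are illustrative, not the exact levels I tried:

```python
from pyspark.sql import SparkSession

# Databricks notebooks predefine `spark`; this line is only for completeness.
spark = SparkSession.builder.getOrCreate()

# Properties of this kind were set in the cluster's Spark config before
# start-up (driver/executor JVM sizes cannot be changed at runtime), e.g.:
#
#   spark.driver.memory 48g
#   spark.executor.memory 48g

# Check what the running cluster actually picked up:
print(spark.conf.get("spark.driver.memory", "not set"))
print(spark.conf.get("spark.executor.memory", "not set"))

# Regardless of the values used, H2OContext.getOrCreate() still reported
# roughly the same ~60 GB of total cluster memory.
```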