h2oai / sparkling-water

Sparkling Water provides H2O functionality inside a Spark cluster
https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html
Apache License 2.0

AutoML does not work properly on a large dataset (30 million rows, 150 features) #5747

Status: Open. aniketaitawade opened this issue 3 weeks ago.


Sparkling Water Version

3.5

Issue description

Expected behavior: Sparkling Water can train individual models such as XGBoost on this data, so the AutoML API should also run on it.

Observed behavior: Sparkling Water trains individual models such as XGBoost successfully, but the AutoML API run does not complete.

Programming language used

Python

Programming language version

3.11

What environment are you running Sparkling Water on?

Cloud Managed Spark (like Databricks, AWS Glue)

Environment version info

15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)

Brief cluster specification

Runtime 15.4.x-scala2.12; 1 driver (64 GB memory, 8 cores); 7 workers (64 GB memory, 8 cores each)

Relevant log output

No error logs are available: the process keeps running for a long time without raising an error.

Code to reproduce the issue

No response