h2oai / sparkling-water

Sparkling Water provides H2O functionality inside Spark cluster
https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html
Apache License 2.0

describe an h2oframe #5710

Open joscani opened 4 months ago

joscani commented 4 months ago

Sparkling Water Version

3.44.0.3-1-2.4

Issue description

Hi. Maybe this is a simple question: how can I get a summary of an H2OFrame in Sparkling Water? I tried

my_h2o_frame.describe()

But I get: <console>:49: error: value describe is not a member of ai.h2o.sparkling.H2OFrame

Programming language used

Scala

Programming language version

Scala 2.11.12

What environment are you running Sparkling Water on?

Hadoop (YARN)

Environment version info

Spark 2.4

Brief cluster specification

280 vcores, 2 TB RAM

Relevant log output

scala> var_uni_hc.describe()
<console>:49: error: value describe is not a member of ai.h2o.sparkling.H2OFrame
       var_uni_hc.describe()

Code to reproduce the issue

val var_uni = spark.table("my_schema.var_uni").
  filter($"year" === 2024 ).
  filter($"month" === 2).
  filter($"day" === 4)

val var_uni_hc = hc.asH2OFrame(var_uni) 

var_uni_hc.describe()

krasinski commented 2 months ago

Hey @joscani, what info would you like to get? In Python you could read frame.frameId, pass it to the H2O Python client method get_frame(), and then call describe() on the result.

joscani commented 2 months ago

I would like to get the mean, variance, sd, cardinality, number of missing values, etc., but using Scala, not Python or R. In R and Python the methods work.
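
One possible workaround, sketched here under assumptions rather than taken from the Sparkling Water API: since var_uni is still a Spark DataFrame before the conversion, the statistics can be computed on the Spark side. Spark's own describe() reports count, mean, stddev, min and max, and the helper below adds the number of nulls and the number of distinct values per column. The object name FrameSummary and the method missingAndCardinality are hypothetical names introduced for illustration; only standard Spark SQL functions are used. If only the H2OFrame is at hand, converting it back with hc.asSparkFrame(var_uni_hc) first may work, but that call should be checked against the installed Sparkling Water version.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, countDistinct, when}

// Hypothetical helper, not part of Sparkling Water or H2O.
object FrameSummary {
  // Returns a single-row DataFrame with, for every column c of df:
  //   c_missing  -> number of null values in c
  //   c_distinct -> number of distinct non-null values in c
  // Assumes df has at least one column.
  def missingAndCardinality(df: DataFrame): DataFrame = {
    val aggs = df.columns.flatMap { c =>
      // Backticks keep column names containing dots from being parsed as nested fields.
      val column = col(s"`$c`")
      Seq(
        // when(...) without otherwise yields null for non-matching rows,
        // and count skips nulls, so this counts only the null entries of c.
        count(when(column.isNull, 1)).alias(s"${c}_missing"),
        countDistinct(column).alias(s"${c}_distinct")
      )
    }
    df.agg(aggs.head, aggs.tail: _*)
  }
}
```

In spark-shell this could be used as var_uni.describe().show() followed by FrameSummary.missingAndCardinality(var_uni).show(). The variance is the square of the reported stddev, or it can be requested directly with the variance function from org.apache.spark.sql.functions. On wide or very large tables, approx_count_distinct is a cheaper substitute for countDistinct if an approximate cardinality is acceptable.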