swoop-inc / spark-alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
https://swoop-inc.github.io/spark-alchemy/
Apache License 2.0
187 stars, 34 forks

Error while using HLL Functions in Spark: net/agkn/hll/serialization/IHLLMetadata #24

Closed (by vis-8757, 3 years ago)

vis-8757 commented 3 years ago

Hello Developer Community,

We are using EMR 6.1, which ships with Spark 3.0 by default. We installed spark-alchemy 1.0.1 (the Scala 2.12 build) on it with the following command:

wget -O /home/hadoop/spark-alchemy_2.12-1.0.1.jar https://repo1.maven.org/maven2/com/swoop/spark-alchemy_2.12/1.0.1/spark-alchemy_2.12-1.0.1.jar

After that, we registered the HLL functions in a Zeppelin notebook, without error, using the following paragraph:

%spark
com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration.registerFunctions(spark)

However, when we ran the query below against our data column containing HLL sketches, we received an error:

%sql
select hll_cardinality(hll_merge(exposure_hll)) from table1

ERROR:
Error happens in sql: select hll_cardinality(hll_merge(exposure_hll)) from table1
net/agkn/hll/serialization/IHLLMetadata; line 1 pos 23
set zeppelin.spark.sql.stacktrace = true to see full stacktrace

I am not sure what is going wrong here and would be really grateful for any guidance.

Thanks!

pidge commented 3 years ago

Hey, I can't really help without more details. I suggest doing what the error message says: set zeppelin.spark.sql.stacktrace = true in your Spark config to get the full stack trace.
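Alongside the stack trace, a quick check from a %spark paragraph can show whether the class named in the error is visible to the driver at all. The following is only a minimal sketch and not part of spark-alchemy: ClasspathCheck and isClassLoadable are hypothetical helper names, and the idea that the message points at a class that could not be loaded is an assumption until the full stack trace confirms it.

```scala
// Hypothetical diagnostic helper; not part of spark-alchemy or Spark.
object ClasspathCheck {

  // Returns true if the named class can be located by the current
  // thread's context class loader, false if it cannot be found.
  def isClassLoadable(className: String): Boolean =
    try {
      // initialize = false: only locate the class, do not run static initializers.
      Class.forName(className, false, Thread.currentThread().getContextClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      // Class named in the error message (slashes replaced by dots).
      "net.agkn.hll.serialization.IHLLMetadata",
      // Class used in this thread to register the HLL functions.
      "com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration"
    )
    candidates.foreach { name =>
      println(s"$name loadable: ${isClassLoadable(name)}")
    }
  }
}
```

In a notebook, calling ClasspathCheck.main(Array.empty) prints one line per class. Note that this only inspects the driver's classpath; executors may see a different set of jars.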