swoop-inc / spark-alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
https://swoop-inc.github.io/spark-alchemy/
Apache License 2.0

Error while registering HLL Functions: "object swoop is not a member of package com" #20

Closed vis-8757 closed 3 years ago

vis-8757 commented 3 years ago

Hello developers! I need to use the HLL functions in my SQL code. I am using Zeppelin/Jupyter notebooks and running my SQL code in a PySpark setting. To register the HLL functions in the Spark environment, I have been using the command below:

com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration.registerFunctions(spark)

Everything was working fine until April 2021, but starting in May this command gives the following error:

:24: error: object swoop is not a member of package com
com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration.registerFunctions(spark)

Please help me figure out how to register the HLL functions if this command no longer works. This is the standard command to register HLL functions in Spark, and it's what I see on the web. I have no idea why it suddenly started failing.

[screenshot: spark error]
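For reference, this is roughly how I register and then use the functions in a Scala cell (a minimal sketch, assuming the spark-alchemy JAR is on the classpath; the "events" table and "user_id"/"day" columns are just placeholders):

// `spark` is the SparkSession that Zeppelin / spark-shell provides out of the box
com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration.registerFunctions(spark)

// approximate distinct users per day with HLL sketches
spark.sql("""
  SELECT day, hll_cardinality(hll_init_agg(user_id)) AS approx_users
  FROM events
  GROUP BY day
""").show()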

MrPowers commented 3 years ago

@vis-8757 - thanks for reporting this. Can you please let us know which versions of PySpark & spark-alchemy you're using? Some old versions of spark-alchemy won't work with Spark 3.

Your error message is what I'd expect if spark-alchemy wasn't attached to the cluster, so can you also double check to make sure the library was attached when you ran that code? Thanks!
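One quick way to check from a Scala cell (just a sketch) is to list the JARs attached to the SparkContext and look for spark-alchemy:

// prints any attached JARs whose path mentions spark-alchemy
spark.sparkContext.listJars().filter(_.contains("spark-alchemy")).foreach(println)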

vis-8757 commented 3 years ago

Hello @MrPowers - thank you for the quick reply! Also, sorry for the delay in my response. I was using EMR 5.3 & 5.24. The Spark version is 2.4.3. I'm not sure how to check the version of spark-alchemy, but I believe it's most probably spark-alchemy 2.12-1.0.1.
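(If it helps, one way I could check which build is on the cluster would be to look in the default Spark and Zeppelin jars directories, assuming that's where the JAR ended up:)

ls /usr/lib/spark/jars/ /usr/lib/zeppelin/lib/ | grep -i alchemy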

My cluster admins informed me that they have never manually installed any packages on the cluster, so all the packages must be getting installed by default from the AWS side. But everything was working fine until 30th April.

We thought something might be wrong from the AWS side so we also contacted AWS Tech support. Please find below their response:

-------------------------------------------------------------------------------------------------------------------------------

I launched 2 EMR clusters, 5.30 and 5.24, with Zeppelin and Spark enabled. Then I ran the following command in Zeppelin and spark-shell on both clusters:

com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration.registerFunctions(spark)

However, I received the error - error: object swoop is not a member of package com

To fix the error, I downloaded the JAR from the swoop spark-alchemy repo onto the EMR cluster and copied it to the Spark jars and Zeppelin jars directories.

wget https://repo1.maven.org/maven2/com/swoop/spark-alchemy_2.12/1.0.1/spark-alchemy_2.12-1.0.1.jar

To copy into the Spark jars directory:

sudo cp spark-alchemy_2.12-1.0.1.jar /usr/lib/spark/jars/

To copy into the Zeppelin jars directory:

sudo cp spark-alchemy_2.12-1.0.1.jar /usr/lib/zeppelin/lib/

After this, I ran the command again and was able to get past the above error. However, I received another error:

========================================================================
java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
  at com.swoop.alchemy.spark.expressions.NativeFunctionRegistration.expression(NativeFunctionRegistration.scala:42)
  at com.swoop.alchemy.spark.expressions.NativeFunctionRegistration.expression$(NativeFunctionRegistration.scala:29)
  at com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration$.expression(HLLFunctionRegistration.scala:6)
  at com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration$.<init>(HLLFunctionRegistration.scala:9)
  at com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration$.<clinit>(HLLFunctionRegistration.scala)
  ... 47 elided
========================================================================

On viewing spark-alchemy_2.12-1.0.1.jar, I could see that it contains "NativeFunctionRegistration.class".

On looking at the class, it appears the method 'scala.Predef$.refArrayOps' is missing, or there may be some additional dependency required by spark-alchemy.

------------------------------------------------------------------------------------------------------------------------

I hope this gives you a better picture of the problem at hand. I would be really grateful for any and all help! Thanks!

MrPowers commented 3 years ago

@vis-8757 - Spark 2.4 requires Scala 2.11. Looks like you're trying to use the spark-alchemy JAR that was compiled with Scala 2.12, which is what should be used with Spark 3 applications.

The Bintray shutdown is probably why things stopped working for you; we had to migrate the JAR files to Maven Central as a result.

Sorry this has been breaking 😑 A lot of factors were outside our control. Let me know if there is anything else I can do to help.
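For reference, the Scala 2.12 / Spark 3 builds are now published to Maven Central, so on a Spark 3 cluster you can pull them in directly instead of copying JARs around, e.g. (using the coordinates from the Maven URL above):

spark-shell --packages com.swoop:spark-alchemy_2.12:1.0.1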

vis-8757 commented 3 years ago

@MrPowers Thank you for your useful input!

We are going to either install the Scala 2.11 build of spark-alchemy or upgrade to Spark 3, and then try running our code again. We will let you know about the progress. I'm leaving this issue open for now.

Thanks !

vis-8757 commented 3 years ago

Hi @MrPowers, an update on the issue:

I just found that we have an automated setup that installs spark-alchemy every time the EMR cluster is switched on. The link/address we were using to install spark-alchemy was:

https://dl.bintray.com/swoop-inc/maven/com/swoop/spark-alchemy_2.11/0.3.28/spark-alchemy_2.11-0.3.28.jar

So, as you can see, we were using spark-alchemy 2.11-0.3.28 on Spark 2.4.2 and Spark 2.4.3. Sorry for the wrong info in my earlier comment where I said we were most probably using spark-alchemy 2.12-1.0.1.

Now the problem is that when we open the above Bintray link, it gives a "Forbidden" message, suggesting that something has changed (probably some directory or JAR location). So could you please help us with the correct link for spark-alchemy 2.11-0.3.28?

I'm wondering if that's what lies at the heart of the issue.

Thank you !

MrPowers commented 3 years ago

@vis-8757 - You're getting this error because JFrog shut down Bintray.

I'll check internally and see if we have the bandwidth to re-release a Scala 2.11 JAR file directly to Maven. We've moved on to Spark 3 / Scala 2.12 and I'm not sure we'll have the capacity for legacy support. You can try to build a JAR file yourself - that might be your best option.
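If you go the build-it-yourself route, something along these lines should work (a rough sketch, assuming sbt is installed and the 0.3.x sources still build against Scala 2.11):

git clone https://github.com/swoop-inc/spark-alchemy.git
cd spark-alchemy
# check out whichever tag/commit corresponds to the 0.3.28 release, then cross-build for Scala 2.11:
sbt ++2.11.12 package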