samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0
70 stars 28 forks source link

runDMLQuery on AWS Glue throws an exception because of Jersey libraries conflict #67

Closed cheptsov closed 5 years ago

cheptsov commented 5 years ago

When using the runDMLQuery method on AWS Glue, it throws an exception:

18/10/30 23:30:49 WARN ServletHandler: Error for /api/v1/applications/application_1540940741469_0002
java.lang.NoSuchMethodError: javax.ws.rs.core.Application.getProperties()Ljava/util/Map;
at org.glassfish.jersey.server.ApplicationHandler.<init>(ApplicationHandler.java:331)
at org.glassfish.jersey.servlet.WebComponent.<init>(WebComponent.java:392)
at org.glassfish.jersey.servlet.ServletContainer.init(ServletContainer.java:177)
at org.glassfish.jersey.servlet.ServletContainer.init(ServletContainer.java:369)
at javax.servlet.GenericServlet.init(GenericServlet.java:158)
at org.spark_project.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:640)
at org.spark_project.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:496)
at org.spark_project.jetty.servlet.ServletHolder.ensureInstance(ServletHolder.java:788)
at org.spark_project.jetty.servlet.ServletHolder.prepare(ServletHolder.java:773)
at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:578)
at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:461)
at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.spark_project.jetty.server.Server.handle(Server.java:524)
at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)
at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)

In addition to the exception the job freezes for quite a while. I believe this is caused by the conflict of Jersey libraries in the classpath. Any ideas how this could be either solved or workaround-ed? Thanks in advance.

samelamin commented 5 years ago

Probably best to create an uber jar to include all the dependencies

http://queirozf.com/entries/creating-scala-fat-jars-for-spark-on-sbt-with-sbt-assembly-plugin