locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethodBase #546

Closed jornfranke closed 3 years ago

jornfranke commented 3 years ago

pyrasterframes: 0.9.0, Spark: 2.4.0

Source to reproduce

from pyrasterframes.utils import create_rf_spark_session
spark = create_rf_spark_session()
df = spark.read.raster("hdfs://..../U2018_CLC2018_V2020_20u1.tif")

Error

Py4JJavaError: An error occurred while calling o113.load.
: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethodBase
    at org.locationtech.rasterframes.experimental.datasource.awspds.L8CatalogDataSource$.<init>(L8CatalogDataSource.scala:50)
    at org.locationtech.rasterframes.experimental.datasource.awspds.L8CatalogDataSource$.<clinit>(L8CatalogDataSource.scala)
    at org.locationtech.rasterframes.experimental.datasource.awspds.L8CatalogDataSource.shortName(L8CatalogDataSource.scala:38)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$18.apply(DataSource.scala:624)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$18.apply(DataSource.scala:624)
    at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
    at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:624)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethodBase
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 27 more

This is just the local master, and the class seems rather outdated. I am also not sure why it uses awspds, since the file is a GeoTIFF on HDFS.
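One way to narrow down an error like this is to check whether any jar on the classpath actually contains the class named in the exception. The following is a minimal sketch using only the Python standard library; the function name `jars_containing` and the idea of pointing it at a jar directory (for example the `jars` folder of a Spark installation) are assumptions for illustration, not part of RasterFrames or Spark.

```python
import zipfile
from pathlib import Path

# The class named in the NoClassDefFoundError, written as a path inside a jar.
MISSING_CLASS = "org/apache/commons/httpclient/HttpMethodBase.class"


def jars_containing(class_entry, jar_dir):
    """Return the sorted names of the jars directly under jar_dir that contain class_entry."""
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid zip archives.
            continue
    return hits
```

Calling `jars_containing(MISSING_CLASS, some_directory)` with the directory that holds the jars in use returns an empty list if none of them provides the class, which would be consistent with the `ClassNotFoundException` above.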

jornfranke commented 3 years ago

Found it. Some experimental code that is also in the standard library uses an extremely outdated httpclient: https://github.com/locationtech/rasterframes/blob/develop/experimental/src/main/scala/org/locationtech/rasterframes/experimental/datasource/DownloadSupport.scala

You can find the jar on Maven Central here: https://mvnrepository.com/artifact/commons-httpclient/commons-httpclient

It was replaced more than 10 years ago by the following: https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient

I recommend changing this in the source code and including the new httpclient in the build files. The old one is also likely to fail soon in any case, as it supports only outdated HTTP protocols.

As a workaround, I include the old httpclient (shaded, but I guess it does not matter) in the fat jar/uber jar.

metasim commented 3 years ago

@jornfranke Thanks for the tip on httpclient being outdated. That experimental code is a hack anyway, and shouldn't be included in the official distribution.... I'd take rasterframes-experimental out of your classpath, or use the assembly generated as a part of the pyrasterframes distribution.

jornfranke commented 2 years ago

It is a bit weird: I can see your comments in my email inbox, but not on GitHub. Removing experimental from the classpath does not help, as originally explained. I created my own fat jar of rasterframes and shaded the old HTTP client. I can share a build.sbt for this tomorrow.

On Thu, Aug 12, 2021 at 7:43 PM Rafael Bergamin @.***> wrote:

Some other weird stuff. Running spark.read.raster(uri) again gave another output, as follows:

Py4JJavaError: An error occurred while calling o142.load.
: java.lang.NoClassDefFoundError: Could not initialize class org.locationtech.rasterframes.experimental.datasource.awspds.L8CatalogDataSource$
    at org.locationtech.rasterframes.experimental.datasource.awspds.L8CatalogDataSource.shortName(L8CatalogDataSource.scala:38)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$18.apply(DataSource.scala:630)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$18.apply(DataSource.scala:630)
    at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
    at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:745)


abhishekkrbaliase commented 2 years ago

I just passed the httpclient jar to Spark using --jars, and it stopped giving me these errors.

Thus, I deleted the comment.
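The workaround of passing the jar to Spark can be sketched as a small helper that assembles a `spark-submit` command line. This is only an illustration: the helper name `spark_submit_command`, the script name, and the jar path are hypothetical, and the only Spark-specific detail relied on is that `--jars` accepts a comma-separated list of jar paths.

```python
import shlex


def spark_submit_command(app, extra_jars):
    """Build a spark-submit argument list that ships extra jars via --jars."""
    cmd = ["spark-submit"]
    if extra_jars:
        # --jars takes a single comma-separated list of jar paths.
        cmd += ["--jars", ",".join(extra_jars)]
    cmd.append(app)
    return cmd


# Hypothetical script and jar path, for illustration only.
cmd = spark_submit_command(
    "read_raster.py",
    ["/opt/jars/commons-httpclient-3.1.jar"],
)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be handed to `subprocess.run` directly. Whether this resolves the error in a given setup depends on which RasterFrames jars are already on the classpath.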
