locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

spark-submit python script with pyrasterframes failed #591

Status: Open. Opened by ngulyaev 1 year ago.

ngulyaev commented 1 year ago

spark-submit failed with the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.locationtech.rasterframes.py.PyRFContext.
: java.lang.NoSuchMethodError: 'shapeless.DefaultSymbolicLabelling shapeless.DefaultSymbolicLabelling$.instance(shapeless.HList)'
    at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder(StandardEncoders.scala:68)
    at org.locationtech.rasterframes.encoders.StandardEncoders.spatialKeyEncoder$(StandardEncoders.scala:68)
    at org.locationtech.rasterframes.package$.spatialKeyEncoder$lzycompute(package.scala:39)
    at org.locationtech.rasterframes.package$.spatialKeyEncoder(package.scala:39)
    at org.locationtech.rasterframes.StandardColumns.$init$(StandardColumns.scala:42)
    at org.locationtech.rasterframes.package$.<init>(package.scala:39)
    at org.locationtech.rasterframes.package$.<clinit>(package.scala)
    at org.locationtech.rasterframes.py.PyRFContext.<init>(PyRFContext.scala:49)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)
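A `NoSuchMethodError` at runtime generally means that the version of a class loaded at runtime (here `shapeless.DefaultSymbolicLabelling`) differs from the one the calling code was compiled against. One way to see which shapeless JARs a local Spark installation bundles is to scan `$SPARK_HOME/jars`. The sketch below is a hypothetical diagnostic helper, not part of RasterFrames. It assumes the conventional Maven file naming `shapeless_<scala>-<version>.jar`.

```python
import os
import re
from pathlib import Path

# Assumed naming convention for shapeless artifacts, e.g. "shapeless_2.12-2.3.3.jar".
SHAPELESS_JAR = re.compile(r"^shapeless_(\d+\.\d+)-(\d+(?:\.\d+)*)\.jar$")


def shapeless_versions(jar_names):
    """Return (scala_binary_version, shapeless_version) pairs found in jar_names."""
    found = []
    for name in jar_names:
        match = SHAPELESS_JAR.match(name)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


def bundled_shapeless(spark_home=None):
    """Hypothetical helper: list shapeless JARs shipped in $SPARK_HOME/jars."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        return []  # SPARK_HOME not set, nothing to scan
    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        return []  # not a standard Spark distribution layout
    return shapeless_versions(sorted(p.name for p in jars_dir.glob("*.jar")))


if __name__ == "__main__":
    print(bundled_shapeless())
```

If this prints a shapeless version different from the one RasterFrames was built against, the two copies on the classpath are a likely source of the error.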

command:

spark-submit --master local[*] --deploy-mode client \
  --packages org.locationtech.rasterframes:rasterframes_2.12:0.10.1,org.locationtech.rasterframes:pyrasterframes_2.12:0.10.1,org.locationtech.rasterframes:rasterframes-datasource_2.12:0.10.1 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.locationtech.rasterframes.util.RFKryoRegistrator \
  --conf spark.kryoserializer.buffer.max=500m \
  --py-files pyrasterframes_2.12-0.10.1-python.zip \
  sat_indices.py

code:

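from pyspark.sql import SparkSession
import pyrasterframes  # adds withKryoSerialization() and withRasterFrames() to SparkSession

spark = SparkSession.builder \
        .appName("test") \
        .withKryoSerialization() \
        .getOrCreate() \
        .withRasterFrames()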

The code fails on the "withRasterFrames()" line.

I couldn't find an example in the documentation of how to use pyrasterframes with spark-submit. I found only the pyspark shell example, but unfortunately it didn't work either. Spark version: 3.1.2, pyrasterframes: 0.10.1

pomadchin commented 1 year ago

Hey @ngulyaev, I think loading this dependency directly as a package (via --packages) won't really work because of the shapeless library version mismatch 🤔 It should be a shaded assembly JAR on the classpath instead.
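Following that suggestion, the submission would put one shaded assembly JAR on the classpath, for example with spark-submit's `--jars` option, instead of resolving the three artifacts with `--packages`. The sketch below only builds the argument list. The helper `submit_args` is hypothetical, and the file name `pyrasterframes-assembly-0.10.1.jar` is an assumed placeholder for whatever shaded JAR you build or download. The `--conf` values and the Python zip are copied from the original command.

```python
import shlex

# Kryo settings copied from the original spark-submit command in this issue.
RF_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.locationtech.rasterframes.util.RFKryoRegistrator",
    "spark.kryoserializer.buffer.max": "500m",
}


def submit_args(assembly_jar, script, py_files=None, master="local[*]"):
    """Hypothetical helper: spark-submit arguments that use a shaded
    assembly JAR via --jars rather than Maven coordinates via --packages."""
    args = ["spark-submit", "--master", master, "--deploy-mode", "client",
            "--jars", assembly_jar]
    for key, value in RF_CONF.items():
        args += ["--conf", f"{key}={value}"]
    if py_files:
        args += ["--py-files", ",".join(py_files)]
    args.append(script)  # the application script always comes last
    return args


if __name__ == "__main__":
    cmd = submit_args(
        "pyrasterframes-assembly-0.10.1.jar",  # assumed shaded JAR name
        "sat_indices.py",
        py_files=["pyrasterframes_2.12-0.10.1-python.zip"],
    )
    print(shlex.join(cmd))  # shell-quoted command line (Python 3.8+)
```

Building the list in Python keeps each `--conf` pair intact, and `shlex.join` quotes arguments such as `local[*]` so the shell does not glob them.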

pomadchin commented 1 year ago

Also, shapeless 2.3.7 can be compatible, so another option is to first update us (RasterFrames) to Spark 3.2.x and then to 3.3.x.
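To check whether a shapeless version found on the classpath is at least the 2.3.7 release named in that comment, a numeric comparison of the dotted version parts is enough. Plain string comparison would be wrong, because it would order "2.3.10" before "2.3.7". This is a hypothetical helper. It treats 2.3.7 as the threshold only because the comment names it, not because RasterFrames documents that bound.

```python
def parse_version(version):
    """Split a dotted release like '2.3.7' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def at_least(version, minimum="2.3.7"):
    """Hypothetical helper: True if version >= minimum, compared numerically."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    for found in ("2.3.3", "2.3.7", "2.3.10"):
        print(found, at_least(found))
```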