Closed CaiCaiXian closed 1 year ago
Hey @CaiCaiXian, could you post the whole file with the code? Most likely Spark forces serialization of the application parts that should not be serialized. Due to the way code is written they are serialized and it requires some adjustments.
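For context, the usual shape of this problem is a lambda passed to an RDD operation that captures the enclosing (non-serializable) class, so Spark has to ship the whole instance to the executors. A minimal sketch with hypothetical names (`Job`, `factor` are made up for illustration):

```scala
import org.apache.spark.SparkContext

// Hypothetical illustration: `Job` is not Serializable, but the first
// `map` closure references its field, so Spark tries to serialize the
// whole `Job` instance and the task fails on the cluster.
class Job(sc: SparkContext) {
  val factor = 2

  def bad(): Long =
    sc.parallelize(1 to 10).map(_ * factor).count() // closure captures `this`

  def good(): Long = {
    val f = factor                                  // copy the field to a local val
    sc.parallelize(1 to 10).map(_ * f).count()      // closure captures only `f`
  }
}
```

Copying the needed fields into local `val`s before the closure is the standard adjustment referred to above.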
I think I know the reason: I didn't use maven-assembly-plugin to package the jar, so when it was submitted to the remote Spark cluster it threw that error. I've fixed that now. But unfortunately, it threw a new error. Is it still a packaging issue?
```
Caused by: java.lang.NoClassDefFoundError: Could not initialize class geotrellis.raster.io.geotiff.TiffType$
Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2228) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2249) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2268) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2293) at org.apache.spark.rdd.RDD.count(RDD.scala:1274) at geotrellis.spark.store.GeoTiffInfoReader.readWindows(GeoTiffInfoReader.scala:87) at geotrellis.spark.store.GeoTiffInfoReader.readWindows$(GeoTiffInfoReader.scala:64) at geotrellis.spark.store.hadoop.HadoopGeoTiffInfoReader.readWindows(HadoopGeoTiffInfoReader.scala:30) at
geotrellis.spark.store.hadoop.HadoopGeoTiffRDD$.apply(HadoopGeoTiffRDD.scala:129) at geotrellis.spark.store.hadoop.HadoopGeoTiffRDD$.apply(HadoopGeoTiffRDD.scala:160) at geotrellis.spark.store.hadoop.HadoopGeoTiffRDD$.multiband(HadoopGeoTiffRDD.scala:203) at geotrellis.spark.store.hadoop.HadoopGeoTiffRDD$.spatialMultiband(HadoopGeoTiffRDD.scala:252) at geotrellis.spark.store.hadoop.HadoopSparkContextMethods.hadoopMultibandGeoTiffRDD(HadoopSparkContextMethods.scala:96) at geotrellis.spark.store.hadoop.HadoopSparkContextMethods.hadoopMultibandGeoTiffRDD$(HadoopSparkContextMethods.scala:91) at geotrellis.spark.store.hadoop.Implicits$HadoopSparkContextMethodsWrapper.hadoopMultibandGeoTiffRDD(Implicits.scala:41) at geotrellis.spark.store.hadoop.HadoopSparkContextMethods.hadoopMultibandGeoTiffRDD(HadoopSparkContextMethods.scala:83) at geotrellis.spark.store.hadoop.HadoopSparkContextMethods.hadoopMultibandGeoTiffRDD$(HadoopSparkContextMethods.scala:82) at geotrellis.spark.store.hadoop.Implicits$HadoopSparkContextMethodsWrapper.hadoopMultibandGeoTiffRDD(Implicits.scala:41) at com.cjx.geospark.RasterUtils$.pyramid(RasterUtils.scala:52) at com.cjx.geospark.RasterUtils$.main(RasterUtils.scala:103) at com.cjx.geospark.RasterUtils.main(RasterUtils.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) at com.cjx.geospark.SubmitUtil$.submitClientMode(SubmitUtil.scala:60) at com.cjx.geospark.SubmitUtil$.main(SubmitUtil.scala:10) at com.cjx.geospark.SubmitUtil.main(SubmitUtil.scala) Caused by: java.lang.NoClassDefFoundError: Could not initialize class geotrellis.raster.io.geotiff.TiffType$ at geotrellis.raster.io.geotiff.reader.GeoTiffInfo$.read(GeoTiffInfo.scala:141) at geotrellis.spark.store.hadoop.HadoopGeoTiffInfoReader.getGeoTiffInfo(HadoopGeoTiffInfoReader.scala:53) at geotrellis.spark.store.GeoTiffInfoReader.$anonfun$readWindows$1(GeoTiffInfoReader.scala:77) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223) at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:302) at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1508) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1435) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1499) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1322) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:376) at org.apache.spark.rdd.RDD.iterator(RDD.scala:327) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:136) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)
```
Here is my function:
```scala
def pyramid(inputPath: String, outputPath: String)(implicit sc: SparkContext): Unit = {
  if (StrUtil.isBlankIfStr(inputPath) || StrUtil.isBlankIfStr(outputPath)) {
    println(inputPath)
    println(outputPath)
    throw new IllegalArgumentException("check the path!")
  }
  // read RDD
  val formatInputPath = SparkFileUtil.correctPath(SparkFileUtil.getFormatPath(inputPath))
  val formatOutputPath = SparkFileUtil.correctPath(SparkFileUtil.getFormatPath(outputPath))
  // get layerName
  val layerName = FileUtil.getPrefix(new File(formatInputPath))
  val inputRDD: RDD[(ProjectedExtent, MultibandTile)] = sc.hadoopMultibandGeoTiffRDD(formatInputPath)
  // get metadata
  val (_, rasterMetaData) = CollectTileLayerMetadata.fromRDD(inputRDD, FloatingLayoutScheme(512))
  val tiledRDD: RDD[(SpatialKey, MultibandTile)] = inputRDD
    .tileToLayout(rasterMetaData.cellType, rasterMetaData.layout, Bilinear)
    .repartition(100)
  val layoutScheme = ZoomedLayoutScheme(WebMercator, tileSize = 256)
  val contextRDD = ContextRDD(tiledRDD, rasterMetaData)
  val reprojected: TileRDDReprojectMethods[SpatialKey, MultibandTile] =
    new TileRDDReprojectMethods(contextRDD)
  val (zoom, reprojectedRDD): (Int, RDD[(SpatialKey, MultibandTile)] with Metadata[TileLayerMetadata[SpatialKey]]) =
    reprojected.reproject(WebMercator, layoutScheme)
  val dirOutputPath = formatOutputPath + "/" + layerName
  val attributeStore = AttributeStore(dirOutputPath)
  val writer = LayerWriter(dirOutputPath)
  Pyramid.upLevels(reprojectedRDD, layoutScheme, zoom, Bilinear) { (rdd, z) =>
    val layerId = LayerId(layerName, z)
    if (attributeStore.layerExists(layerId)) {
      attributeStore match {
        case store: HadoopAttributeStore => new HadoopLayerManager(store).delete(layerId)
        case store: FileAttributeStore   => new FileLayerManager(store).delete(layerId)
      }
    }
    writer.write(layerId, rdd, ZCurveKeyIndexMethod)
  }
}
```
plugin:
```xml
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>3.2.0</version>
  <configuration>
    <archive>
      <manifest>
        <mainClass>com.cjx.geospark.RasterUtils</mainClass>
      </manifest>
    </archive>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```
@CaiCaiXian yea, it is a packaging issue: `Caused by: java.lang.NoClassDefFoundError: Could not initialize class geotrellis.raster.io.geotiff.TiffType$`
Most likely it fails on circe codecs derivation: https://github.com/locationtech/geotrellis/blob/master/raster/src/main/scala/geotrellis/raster/io/geotiff/TiffType.scala#L32-L40
Check shapeless / circe deps in the classpath!
There is also an example of shading / assembly merge strategy rules in the repo: https://github.com/locationtech/geotrellis/blob/master/project/Settings.scala#L512-L532
Thank you! I used maven-shade-plugin to rename the dependency cats-kernel_2.12:2.9.0, which conflicted with the cats-kernel_2.12:2.1.1 already on the remote Spark cluster. It works!!
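For anyone hitting the same conflict, the relocation can be sketched roughly like this with maven-shade-plugin (the `shaded.cats.kernel` target package and the plugin version are arbitrary choices, not from this thread):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Rename cats-kernel classes inside the fat jar so they
               cannot clash with the older cats-kernel on the cluster. -->
          <relocation>
            <pattern>cats.kernel</pattern>
            <shadedPattern>shaded.cats.kernel</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The shade plugin rewrites both the class files and all bytecode references to the relocated package, so the application uses its own copy regardless of what the cluster provides.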
By the way, do you know how to fix this problem?

```
at 'geotrellis':
"define 1 overlapping resource:
```
@CaiCaiXian the merge strategy for `.conf` files should be `merge`, I think that's what is happening.
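With maven-shade-plugin, concatenating the `reference.conf` files from all dependencies (instead of letting one overwrite the others) can be expressed with an `AppendingTransformer`; a sketch of the relevant fragment:

```xml
<configuration>
  <transformers>
    <!-- Append every reference.conf found in the dependencies into one
         file, rather than keeping only the first copy encountered. -->
    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
      <resource>reference.conf</resource>
    </transformer>
  </transformers>
</configuration>
```

This mirrors the sbt-assembly `MergeStrategy.concat` rule for `*.conf` shown in the linked Settings.scala.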
Thanks, it works! I'm sorry to have been bothering you; this may be the last question. When I write the layer it throws an error:

```
Exception in thread "main" geotrellis.store.package$LayerWriteError: Failed to write Layer(name = "3199.00-614.00", zoom = 21) at geotrellis.spark.store.hadoop.HadoopLayerWriter._write(HadoopLayerWriter.scala:122) at geotrellis.spark.store.hadoop.HadoopLayerWriter._write(HadoopLayerWriter.scala:38) at geotrellis.spark.store.LayerWriter.write(LayerWriter.scala:152) at geotrellis.spark.store.LayerWriter.write$(LayerWriter.scala:144) at geotrellis.spark.store.hadoop.HadoopLayerWriter.write(HadoopLayerWriter.scala:38) at com.cjx.geospark.RasterUtils$.$anonfun$pyramid$8(RasterUtils.scala:87) at com.cjx.geospark.RasterUtils$.$anonfun$pyramid$8$adapted(RasterUtils.scala:74) at geotrellis.spark.pyramid.Pyramid$.runLevel$1(Pyramid.scala:337) at geotrellis.spark.pyramid.Pyramid$.upLevels(Pyramid.scala:345) at geotrellis.spark.pyramid.Pyramid$.upLevels(Pyramid.scala:368) at com.cjx.geospark.RasterUtils$.pyramid(RasterUtils.scala:74) at com.cjx.geospark.RasterUtils$.main(RasterUtils.scala:103)
Caused by: java.io.InvalidClassException: geotrellis.layer.TileLayerMetadata; local class incompatible: stream classdesc serialVersionUID = 3142813742075090433, local class serialVersionUID = -468075711590230574 at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:527) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2322) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at org.apache.spark.scheduler.Task.run(Task.scala:136) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
```
Is this issue related to the JDK version?
Local JDK: 1.8.0_291; remote Spark JDK: 1.8.0_352.
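A quick diagnostic, since `InvalidClassException` reports the serialVersionUID each side computed: print the UID your local classpath derives for the class and compare it with what a cluster node derives. This is a sketch; it assumes the geotrellis jar is on the classpath where you run it.

```scala
import java.io.ObjectStreamClass

// Print the serialVersionUID that java serialization derives for
// TileLayerMetadata on this classpath. Run the same snippet on the
// driver and on an executor node: if the numbers differ, the two
// sides are loading incompatible builds of the class.
val uid = ObjectStreamClass
  .lookup(Class.forName("geotrellis.layer.TileLayerMetadata"))
  .getSerialVersionUID
println(uid)
```

The default UID is derived from the class's shape (fields, methods, interfaces) as compiled, so different library, Scala, or compiler versions can all change it even when the source looks the same.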
Hey @CaiCaiXian, took me half a year to reply; in addition to the JDK mismatch it could also be a Scala versions mismatch.
I'll close this issue for now, but don't hesitate to reopen it!
Describe the bug
When I was using hadoopMultibandGeoTiffRDD, it threw an error on the remote Spark cluster: `java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD`, but it worked fine in local[*].
To Reproduce
Provide as applicable:
Code Example:
```scala
val inputRDD: RDD[(ProjectedExtent, MultibandTile)] = sc.hadoopMultibandGeoTiffRDD(formatInputPath)
```
I use SparkSubmit to submit my jar in client mode to the remote Spark cluster.
Expected behavior
I expect it to work on the remote Spark cluster just as it does locally.
Screenshots
If applicable, add screenshots to help explain your problem.
Environment