locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0
240 stars, 46 forks

Possible cell type serialization bug. #491

Closed: metasim closed this issue 4 years ago

metasim commented 4 years ago

Triggered in a classification example when lazyTiles loading is turned off and cached() is enabled at multiple stages of ABT construction.

Still diagnosing, but here's the backtrace:

```
20/05/27 09:47:59 ERROR Executor: Exception in task 3.3 in stage 52.0 (TID 2842)
java.lang.IllegalArgumentException: Cell type is not supported
    at geotrellis.raster.CellType$.fromName(CellType.scala:436)
    at org.locationtech.rasterframes.encoders.StandardSerializers$$anonfun$1.apply(StandardSerializers.scala:344)
    at org.locationtech.rasterframes.encoders.StandardSerializers$$anonfun$1.apply(StandardSerializers.scala:344)
    at com.github.blemale.scaffeine.CacheLoaderAdapter.load(CacheLoaderAdapter.java:22)
    at com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$2(LocalLoadingCache.java:140)
    at com.github.benmanes.caffeine.cache.UnboundedLocalCache.lambda$computeIfAbsent$2(UnboundedLocalCache.java:238)
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
    at com.github.benmanes.caffeine.cache.UnboundedLocalCache.computeIfAbsent(UnboundedLocalCache.java:234)
    at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
    at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
    at com.github.blemale.scaffeine.LoadingCache.get(LoadingCache.scala:30)
    at org.locationtech.rasterframes.encoders.StandardSerializers$$anon$6.from(StandardSerializers.scala:129)
    at org.locationtech.rasterframes.encoders.StandardSerializers$$anon$6.from(StandardSerializers.scala:116)
    at org.locationtech.rasterframes.encoders.CatalystSerializer$class.fromInternalRow(CatalystSerializer.scala:47)
    at org.locationtech.rasterframes.encoders.StandardSerializers$$anon$6.fromInternalRow(StandardSerializers.scala:116)
    at org.locationtech.rasterframes.encoders.CatalystSerializer$WithFromInternalRow$.to$extension(CatalystSerializer.scala:151)
    at org.locationtech.rasterframes.encoders.CatalystSerializer$CatalystIO$$anon$2.get(CatalystSerializer.scala:126)
    at org.locationtech.rasterframes.encoders.CatalystSerializer$CatalystIO$$anon$2.get(CatalystSerializer.scala:112)
    at org.locationtech.rasterframes.model.TileDataContext$$anon$1.from(TileDataContext.scala:54)
    at org.locationtech.rasterframes.model.TileDataContext$$anon$1.from(TileDataContext.scala:43)
    at org.locationtech.rasterframes.encoders.CatalystSerializer$class.fromInternalRow(CatalystSerializer.scala:47)
    at org.locationtech.rasterframes.model.TileDataContext$$anon$1.fromInternalRow(TileDataContext.scala:43)
    at org.locationtech.rasterframes.encoders.CatalystSerializer$WithFromInternalRow$.to$extension(CatalystSerializer.scala:151)
    at org.locationtech.rasterframes.encoders.CatalystSerializer$CatalystIO$$anon$2.get(CatalystSerializer.scala:126)
    at org.locationtech.rasterframes.encoders.CatalystSerializer$CatalystIO$$anon$2.get(CatalystSerializer.scala:112)
    at org.locationtech.rasterframes.tiles.InternalRowTile.cellContext(InternalRowTile.scala:48)
    at org.locationtech.rasterframes.tiles.InternalRowTile.realizedTile$lzycompute(InternalRowTile.scala:43)
    at org.locationtech.rasterframes.tiles.InternalRowTile.realizedTile(InternalRowTile.scala:43)
    at org.apache.spark.sql.rf.TileUDT$$anonfun$deserialize$2.apply(TileUDT.scala:60)
    at org.apache.spark.sql.rf.TileUDT$$anonfun$deserialize$2.apply(TileUDT.scala:59)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.sql.rf.TileUDT.deserialize(TileUDT.scala:59)
    at astraea.earthai.rasterml.expressions.VectorizeTilesExpression.eval(VectorizeTilesExpression.scala:62)
    at org.apache.spark.sql.execution.GenerateExec$$anonfun$1$$anonfun$5.apply(GenerateExec.scala:108)
    at org.apache.spark.sql.execution.GenerateExec$$anonfun$1$$anonfun$5.apply(GenerateExec.scala:107)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:212)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

metasim commented 4 years ago

The issue was in a custom expression that did not handle proj_raster structs, not in RasterFrames itself.
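For anyone who lands here with the same exception, here is a toy Python sketch of one plausible way this class of failure can arise. It is not RasterFrames code. The row shapes, the cell type list, and every function name below are hypothetical stand-ins, and the thread does not confirm that this is the exact mechanism. The idea is that a reader written for a plain tile row is handed a wrapper row whose first field is something else, so the cell type lookup sees a value it cannot recognize.

```python
# Toy model only: plain tuples stand in for Spark rows.
# None of these names are RasterFrames or GeoTrellis APIs.

SUPPORTED_CELL_TYPES = {"uint8", "int16", "float32", "float64"}  # illustrative subset


def cell_type_from_name(name):
    """Strict name lookup that rejects anything it does not recognize."""
    if name not in SUPPORTED_CELL_TYPES:
        raise ValueError("Cell type is not supported")
    return name


def read_tile(row):
    """Positional read of a tile-shaped row: (cell_type_name, cells)."""
    cell_type_name, cells = row
    return cell_type_from_name(str(cell_type_name)), cells


def read_tile_or_proj_raster(row):
    """Unwrap a wrapper row shaped ((extent, crs), tile_row) before reading."""
    if isinstance(row[0], tuple):  # first field is a context struct, not a name
        return read_tile(row[1])
    return read_tile(row)


# Hypothetical sample rows for the two shapes.
tile_row = ("uint8", [1, 2, 3, 4])
proj_raster_row = (((0.0, 0.0, 1.0, 1.0), "EPSG:4326"), tile_row)
```

In this sketch, `read_tile(proj_raster_row)` raises `ValueError` with the same message as the backtrace, while `read_tile_or_proj_raster` accepts both shapes. The practical point is to check which struct an expression actually receives before deserializing it as a tile.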