oap-project / sql-ds-cache

Spark* plug-in for accelerating Spark* SQL performance by using cache and index at the SQL data source layer.
Apache License 2.0

The function of vmem-cache and guava-cache should not be associated with arrow. #190

Closed haojinIntel closed 3 years ago

haojinIntel commented 3 years ago

Since commit 6bc51cea2690cef9ee5f2d6ac15741492bc474af, arrow-java.jar must be added to the classpath even when using vmem-cache or guava-cache. The error message shown below occurs when arrow-java.jar is not on the classpath:


2021-08-01 11:17:20,382 WARN scheduler.TaskSetManager: Lost task 190.0 in stage 0.0 (TID 190) (vsr420 executor 6): java.lang.NoClassDefFoundError: org/apache/arrow/plasma/exceptions/PlasmaClientException
        at org.apache.spark.sql.execution.datasources.oap.filecache.FiberCacheManager.<init>(FiberCacheManager.scala:96)
        at org.apache.spark.sql.oap.OapExecutorRuntime.<init>(OapRuntime.scala:108)
        at org.apache.spark.sql.oap.OapRuntime$.init(OapRuntime.scala:153)
        at org.apache.spark.sql.oap.OapRuntime$.init(OapRuntime.scala:141)
        at org.apache.spark.sql.oap.OapRuntime$.getOrCreate(OapRuntime.scala:134)
        at org.apache.spark.sql.execution.datasources.oap.index.BTreeIndexRecordReader.getBTreeFiberCache(BTreeIndexRecordReader.scala:91)
        at org.apache.spark.sql.execution.datasources.oap.index.BTreeIndexRecordReaderV1.readBTreeFooter(BTreeIndexRecordReaderV1.scala:59)
        at org.apache.spark.sql.execution.datasources.oap.index.BTreeIndexRecordReaderV1.initializeReader(BTreeIndexRecordReaderV1.scala:50)
        at org.apache.spark.sql.execution.datasources.oap.index.BTreeIndexRecordReader.analyzeStatistics(BTreeIndexRecordReader.scala:139)
        at org.apache.spark.sql.execution.datasources.oap.index.BPlusTreeScanner.analyzeStatistics(BPlusTreeScanner.scala:57)
        at org.apache.spark.sql.execution.datasources.oap.index.IndexScanner.analysisResByStatistics(IndexScanner.scala:135)
        at org.apache.spark.sql.execution.datasources.oap.index.IndexScanner.analysisResByPolicies(IndexScanner.scala:100)
        at org.apache.spark.sql.execution.datasources.oap.index.IndexScanners.$anonfun$isIndexFileBeneficial$1(IndexScanner.scala:332)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike.map(TraversableLike.scala:238)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.spark.sql.execution.datasources.oap.index.IndexScanners.isIndexFileBeneficial(IndexScanner.scala:332)
        at org.apache.spark.sql.execution.datasources.oap.io.OapDataReaderV1.initialize(OapDataReaderWriter.scala:94)
        at org.apache.spark.sql.execution.datasources.oap.io.OapDataReaderV1.read(OapDataReaderWriter.scala:150)
        at org.apache.spark.sql.execution.datasources.oap.OptimizedOrcFileFormat.$anonfun$buildReaderWithPartitionValues$5(OptimizedOrcFileFormat.scala:145)
        at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
        at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
        at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
        at org.apache.spark.sql.execution.OapFileSourceScanExec$$anon$1.hasNext(OapFileSourceScanExec.scala:393)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
        at scala.collection.Iterator.foreach(Iterator.scala:941)
        at scala.collection.Iterator.foreach$(Iterator.scala:941)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
        at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1012)
        at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1012)
        at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2242)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.arrow.plasma.exceptions.PlasmaClientException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 48 more
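The trace shows that FiberCacheManager references org.apache.arrow.plasma.exceptions.PlasmaClientException directly, so the class fails to initialize even for cache backends that never touch Plasma. One possible direction, sketched here as a minimal Java example (the class and method names are hypothetical, not the project's actual API), is to probe for the Plasma classes reflectively instead of referencing them in code paths shared by all cache backends:

```java
// Sketch: detect the Arrow Plasma classes at runtime instead of
// referencing them directly, so vmem-cache / guava-cache code paths
// can run without arrow-java.jar on the classpath.
// PlasmaProbe and isPlasmaAvailable() are illustrative names only.
public class PlasmaProbe {
    // Returns true only if the Plasma client classes can be loaded;
    // never throws NoClassDefFoundError to the caller.
    public static boolean isPlasmaAvailable() {
        try {
            Class.forName("org.apache.arrow.plasma.exceptions.PlasmaClientException");
            return true;
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A cache manager could branch on this check and only wire up
        // the Plasma-backed cache when the classes are present.
        System.out.println("plasma available: " + isPlasmaAvailable());
    }
}
```

With a guard like this, selecting vmem-cache or guava-cache would not trigger class loading of the Plasma types, and the NoClassDefFoundError above would only surface if the Plasma-backed cache were actually requested.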
haojinIntel commented 3 years ago

@yma11 @winningsix @zhixingheyi-tian Please help track this issue. Thanks!