microsoft / hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
https://aka.ms/hyperspace
Apache License 2.0

Failed to debug Scala Test in IntelliJ #498

Open baibaichen opened 3 years ago

baibaichen commented 3 years ago

Describe the issue

Running the tests in sbt works fine, but running/debugging a Scala test in IntelliJ fails.
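For reference, this is the kind of sbt invocation that works for me (a sketch; the fully qualified suite name is taken from the stack trace below, and you may need to adjust it for the project layout):

    # run the single suite from the repository root (hedged example)
    sbt "testOnly com.microsoft.hyperspace.index.DataFrameWriterExtensionsTest"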

To Reproduce

  1. Import the project into IntelliJ as described in the README.
  2. Pick a test (e.g. DataFrameWriterExtensionsTest) and run it.

The following exception is thrown:

An exception or error caused a run to abort: 'org.apache.parquet.hadoop.ParquetOutputFormat$JobSummaryLevel org.apache.parquet.hadoop.ParquetOutputFormat.getJobSummaryLevel(org.apache.hadoop.conf.Configuration)' 
java.lang.NoSuchMethodError: 'org.apache.parquet.hadoop.ParquetOutputFormat$JobSummaryLevel org.apache.parquet.hadoop.ParquetOutputFormat.getJobSummaryLevel(org.apache.hadoop.conf.Configuration)'
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.prepareWrite(ParquetFileFormat.scala:130)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:133)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874)
    at com.microsoft.hyperspace.index.DataFrameWriterExtensionsTest.beforeAll(DataFrameWriterExtensionsTest.scala:53)
    at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
    at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
    at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
    at com.microsoft.hyperspace.index.DataFrameWriterExtensionsTest.org$scalatest$BeforeAndAfter$$super$run(DataFrameWriterExtensionsTest.scala:35)
    at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:273)
    at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:271)
    at com.microsoft.hyperspace.index.DataFrameWriterExtensionsTest.run(DataFrameWriterExtensionsTest.scala:35)
    at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
    at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1320)
    at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1314)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
    at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:993)
    at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:971)
    at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
    at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
    at org.scalatest.tools.Runner$.run(Runner.scala:798)
    at org.scalatest.tools.Runner.run(Runner.scala)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:38)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:25)

I am not familiar with sbt, but I think it could be fixed by excluding parquet-hadoop-bundle-1.8.1.jar, which is in test scope.
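For illustration, a minimal sketch of what such an exclusion could look like in build.sbt, assuming sbt's standard excludeDependencies mechanism (this is not taken from the project's actual build definition; which dependency pulls in parquet-hadoop-bundle 1.8.1 should be confirmed first, e.g. via sbt's dependency report):

    // build.sbt (hedged sketch): drop the conflicting bundle from the classpath so the
    // parquet-hadoop version that ships with Spark is the one resolved at test time.
    // The coordinates are an assumption; verify which module actually brings in
    // parquet-hadoop-bundle 1.8.1 before applying this.
    excludeDependencies += ExclusionRule("org.apache.parquet", "parquet-hadoop-bundle")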

Expected behavior

We can run and debug the tests in IntelliJ.

Environment

IntelliJ IDEA 2021.2.2 (Ultimate Edition)

sezruby commented 3 years ago

Is the test for Spark 3?

Could you try the instructions in https://github.com/microsoft/hyperspace/pull/478/files?