projectglow / glow

An open-source toolkit for large-scale genomic analysis
https://projectglow.io
Apache License 2.0

upgrade spark version from 3.1 to 3.2 #481

Closed JassAbidi closed 2 years ago

JassAbidi commented 2 years ago

This PR upgrades the Spark version from 3.1 to 3.2.

The existing tests are used to validate this patch; the code should behave exactly as before.

williambrandler commented 2 years ago

hey @JassAbidi thanks for this

The tests are failing for two reasons:

  1. Please sign off all commits with --signoff
  2. Please format code with scalafmt
[error] (core / Compile / scalafmtCheck) 13 files must be formatted
[error] (core / Test / scalafmtCheck) 1 files must be formatted
[error] Total time: 11 s, completed Feb 4, 2022 5:42:56 PM

Exited with code exit status 1
williambrandler commented 2 years ago

hey @JassAbidi we are now getting:

[error] Uncaught exception when running io.projectglow.sql.SingleFileWriterSuite: java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.<init>(ZIIIIIIZ)V
[error] sbt.ForkMain$ForkError: java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.<init>(ZIIIIIIZ)V

on the tests

which, from searching Stack Overflow, appears to relate to the netty versions pinned in build.sbt

https://github.com/projectglow/glow/blob/aa96a6c99445192dd9fe68926bf60c660477d767/build.sbt#L173-L174

I think we need to bump to

io.netty, netty-all, 4.1.68.Final ?
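If so, the change in build.sbt would look something like the sketch below (illustrative only; the surrounding dependency list in glow's build.sbt may differ, and 4.1.68.Final should be verified against the netty version Spark 3.2 itself pulls in):

```scala
// build.sbt (sketch): pin netty-all to the version Spark 3.2 expects,
// so the runtime classpath does not mix old and new netty binaries.
libraryDependencies ++= Seq(
  "io.netty" % "netty-all" % "4.1.68.Final"
)
```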

williambrandler commented 2 years ago

In the Spark 2 tests we are now getting

[error] /home/circleci/glow/core/src/main/scala/io/projectglow/sql/expressions/glueExpressions.scala:36:43: not enough arguments for constructor UnresolvedException: (tree: TreeType, function: String)org.apache.spark.sql.catalyst.analysis.UnresolvedException[TreeType].
[error] Unspecified value parameter function.
[error] override def dataType: DataType = throw new UnresolvedException("dataType")

I wonder if we should drop support for Spark 2 entirely. Please speak to the spark-core engineers to understand what to do about this.
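For context, the error comes from an API change in Spark 3.2: `UnresolvedException` lost its type parameter and its `tree` constructor argument. A sketch of the two call shapes, and of a per-version shim that could paper over them (names here are illustrative, not glow's actual shim):

```scala
// Sketch only: illustrates the Spark 3.2 API change behind the error above.
// Spark <= 3.1: the exception is generic and takes the unresolved tree node:
//   throw new UnresolvedException(this, "dataType")
// Spark 3.2: the type parameter and `tree` argument are gone:
//   throw new UnresolvedException("dataType")
import org.apache.spark.sql.catalyst.analysis.UnresolvedException
import org.apache.spark.sql.catalyst.expressions.Expression

// One implementation per Spark source set, same public signature in both.
object SparkShim {
  // Spark 3.2 variant; a Spark <= 3.1 source set would pass `tree` through as well.
  def unresolvedException(tree: Expression, name: String): Throwable =
    new UnresolvedException(name)
}
```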

bcajes commented 2 years ago

I suspect this change will fail on the installHail build step. Hail is not yet on Spark 3.2, though there is a PR for this change that seems close to completion https://github.com/hail-is/hail/pull/11410

williambrandler commented 2 years ago

We do pin the Hail tests to a specific Spark version, and we should not bump that to 3.2.0 in this PR @bcajes https://github.com/projectglow/glow/blob/master/.circleci/config.yml#L253-L262

bcajes commented 2 years ago

re: the netty errors, upgrading netty-all to 4.1.68 still results in other netty-related errors.

I think I tracked down the other problematic netty libs with sbt core/dependencyTree.

The following changes in build.sbt seem to get it to compile and pass the Scala core tests:

"io.netty" % "netty-all" % "4.1.68.Final",
"io.netty" % "netty-handler" % "4.1.68.Final",
"io.netty" % "netty-transport-native-epoll" % "4.1.68.Final",

williambrandler commented 2 years ago

> re: the netty errors, upgrading netty-all to 4.1.68 still results in other netty-related errors.
>
> I think I tracked down the other problematic netty libs with sbt core/dependencyTree.
>
> The following changes in build.sbt seem to get it to compile and pass the Scala core tests:
>
> "io.netty" % "netty-all" % "4.1.68.Final",
> "io.netty" % "netty-handler" % "4.1.68.Final",
> "io.netty" % "netty-transport-native-epoll" % "4.1.68.Final",

Thanks @bcajes! I tried bumping netty-all by itself and was banging my head against the computer until I saw your comment. OK we're getting closer.

Spark 2 tests are failing due to a backwards-incompatible API change; adding a shim (e.g. glow/SparkShim.scala at master · projectglow/glow) will fix this
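One common way to wire such a shim in sbt is a version-specific source directory per Spark release, each providing its own SparkShim with the same public signature. A sketch, assuming the Spark version is passed in as a system property (the property name and directory layout here are hypothetical, not glow's actual build setup):

```scala
// build.sbt (sketch): compile core/src/main/spark-2.4/ or core/src/main/spark-3.2/
// alongside the shared sources, depending on the Spark version being built.
Compile / unmanagedSourceDirectories += {
  val sparkVer = sys.props.getOrElse("spark.version", "3.2.0") // hypothetical property
  val major    = sparkVer.split('.').take(2).mkString(".")     // e.g. "3.2"
  (Compile / sourceDirectory).value / s"spark-$major"
}
```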

williambrandler commented 2 years ago

Spark 3 tests are now passing with Hail on 3.1.2 @bcajes; next step is to move the code into a shim so the Spark 2 tests also pass

williambrandler commented 2 years ago

Closing this; we will use this pull request instead for glow on Spark 3.2: https://github.com/projectglow/glow/pull/501/files