microsoft / sql-spark-connector

Apache Spark Connector for SQL Server and Azure SQL
Apache License 2.0

Upgrade to Spark 3.3.0 #197

Closed · moredatapls closed this 1 year ago

moredatapls commented 2 years ago

Fixes #191

ghost commented 2 years ago

CLA assistant check
All CLA requirements met.

moredatapls commented 2 years ago

If anyone finds this PR and wants support for Spark 3.3.0: head over to https://github.com/solytic/sql-spark-connector/releases/tag/v1.4.0 and use the build that we created at Solytic, since Microsoft seems not to be very active here.

arihar268 commented 1 year ago

Even after the fix, I am facing the issue below with Spark 3.3 and Scala 2.12.15. I included the dependent libraries in build.sbt and am running on a Databricks runtime 11.3 LTS cluster.

java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(Ljava/sql/ResultSet;Lorg/apache/spark/sql/jdbc/JdbcDialect;Z)Lorg/apache/spark/sql/types/StructType;
    at com.microsoft.sqlserver.jdbc.spark.BulkCopyUtils$.matchSchemas(BulkCopyUtils.scala:306)
    at com.microsoft.sqlserver.jdbc.spark.BulkCopyUtils$.getColMetaData(BulkCopyUtils.scala:267)
    at com.microsoft.sqlserver.jdbc.spark.Connector.write(Connector.scala:79)
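
This error typically indicates a binary mismatch: the connector jar was compiled against Spark 3.2's JdbcUtils.getSchema, while Spark 3.3 appears to have added a fourth Boolean parameter (isTimestampNTZ in the 3.3 sources; worth verifying against the exact Spark version on the cluster). A minimal sketch of the call site as it would have to be compiled for 3.3:

import java.sql.ResultSet
import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types.StructType

// Sketch only. In Spark 3.3 getSchema appears to take an extra Boolean
// (isTimestampNTZ in the 3.3 sources; verify on your cluster). A connector
// jar compiled against the old three-argument descriptor still references
// that descriptor in its bytecode, so the JVM throws NoSuchMethodError at
// the call site in BulkCopyUtils even though default arguments exist at the
// source level.
def readTableSchema(rs: ResultSet, dialect: JdbcDialect): StructType =
  JdbcUtils.getSchema(rs, dialect, alwaysNullable = true, isTimestampNTZ = false)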

Here is what I provided in build.sbt:

name := "spark-mssql-connector"

organization := "com.microsoft.sqlserver.jdbc.spark"

version := "1.0.0"

scalaVersion := "2.12.15"

val sparkVersion = "3.3.0"

javacOptions ++= Seq("-source", "1.8", "-target", "1.8", "-Xlint")

initialize := {
  val _ = initialize.value
  val javaVersion = sys.props("java.specification.version")
  if (javaVersion != "1.8")
    sys.error("Java 1.8 is required for this project. Found " + javaVersion + " instead")
}

scalacOptions := Seq("-deprecation", "-unchecked", "-Dscalac.patmat.analysisBudget=1024", "-Xfuture")

libraryDependencies ++= Seq(
  "com.microsoft.sqlserver" % "mssql-jdbc" % "8.4.1.jre8",
  "org.apache.spark" %% "spark-parent" % "3.3.0" % "provided",
  "org.scala-lang.modules" %% "scala-parser-combinators" % "1.1.2",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-yarn" % sparkVersion,
  "org.scala-lang" % "scala-library" % "2.12.11" % "test",
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-catalyst" % sparkVersion % "provided",
  "org.scalactic" %% "scalactic" % "3.2.6" % "test",
  "org.scalatest" %% "scalatest" % "3.2.6" % "test",
  "com.novocode" % "junit-interface" % "0.11"
)

scalacOptions := Seq("-unchecked", "-deprecation", "evicted")

assemblyJarName in assembly := s"${name.value}${scalaBinaryVersion.value}-${sparkVersion}${version.value}.jar"

// Exclude scala-library from this fat jar. The scala library is already there in the spark package.
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _ => MergeStrategy.first
}

Am I missing anything?
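
A quick way to confirm the mismatch on the cluster itself, independent of the build file (a diagnostic sketch using plain Java reflection, runnable in a notebook cell or spark-shell):

// Diagnostic sketch: list the getSchema overloads that the running Spark
// actually exposes. If none matches the (ResultSet, JdbcDialect, Boolean)
// descriptor in the stack trace above, the jar was compiled against a
// different Spark version than the one the cluster runs.
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.getClass
  .getMethods
  .filter(_.getName == "getSchema")
  .foreach(m => println(m.toGenericString))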

moredatapls commented 1 year ago

@arihar268 the project only contains a pom.xml, not a build.sbt. Where did you find this file?

If you want to upgrade the Spark and Scala versions, you need to do it there. See also the changed file in the PR: https://github.com/microsoft/sql-spark-connector/pull/197/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8

shivsood commented 1 year ago

@moredatapls thanks for the PR. Please consider splitting this into 2 PRs:

1. one for the JDBC connection change,
2. one for adding the sparkfun-based test framework / docker change to spin up the SQL server.

Also, did you run the regression tests on Spark 3.3.0 with this? If so, could you please add the test results here. Thanks again for the work.

@luxu-ms as FYI.

arihar268 commented 1 year ago

@moredatapls, I am using the same code that is in the PR. I have also built the jar with pom.xml and am facing the same issue in Databricks with runtime 11.3 LTS.

tfabritz commented 1 year ago

I can see the same error that @arihar268 is seeing when using the build provided over at Solytic:

java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(Ljava/sql/ResultSet;Lorg/apache/spark/sql/jdbc/JdbcDialect;Z)Lorg/apache/spark/sql/types/StructType;

It only happens when using truncate as the write mode.

Could this be because StructType is no longer imported in src/main/scala/com/microsoft/sqlserver/jdbc/spark/utils/BulkCopyUtils.scala?
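
For context, a minimal write that exercises this code path (a sketch: the URL, table name, and credentials are placeholders, df is just sample data, and the format and option names follow the connector's README):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Sample data; any DataFrame written with truncate hits the same path.
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Placeholder connection string.
val jdbcUrl = "jdbc:sqlserver://<server>.database.windows.net;databaseName=<db>"

// Writing with truncate makes the connector read the existing table schema
// via JdbcUtils.getSchema, which is where the NoSuchMethodError surfaces on
// Spark 3.3 clusters.
df.write
  .format("com.microsoft.sqlserver.jdbc.spark")
  .mode("overwrite")
  .option("truncate", "true")
  .option("url", jdbcUrl)
  .option("dbtable", "dbo.my_table")   // placeholder table name
  .option("user", "<user>")            // placeholder credentials
  .option("password", "<password>")
  .save()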

dimtsi commented 1 year ago

Is there any update on this? It is also a blocker for me and my colleagues to move to DBR 11.3.

syedhassaanahmed commented 1 year ago

It's been 7 months since Spark 3.3 was released. @shivsood @luxu1-ms any update on getting this PR merged?

doctorvanmartin commented 1 year ago

Please! This is mandatory for DBR 11.3 and Unity Catalog! Thank you for your work!

admo1 commented 1 year ago

Any update on this?

moredatapls commented 1 year ago

@luxu1-ms @shivsood please check again

luxu1-ms commented 1 year ago

> @luxu1-ms @shivsood please check again

@moredatapls Sorry, I closed this PR by accident. Thank you so much for the contribution and the updates. I reviewed this PR and left one comment. Please let me know if you have any opinions; then this PR will be good to go! We plan to have a new release very soon.

moredatapls commented 1 year ago

> Please ensure test runs and attach results to PR

@shivsood what do you mean by this? Doesn't one of you need to trigger the tests in CI? I can't do it.

shivsood commented 1 year ago

Merging. Tests passed per @luxu1-ms.

hmayer1980 commented 1 year ago

And when can we expect to see a final version on Maven? https://mvnrepository.com/artifact/com.microsoft.azure/spark-mssql-connector_2.12

But I just get an error that it does not exist in Databricks. Currently I only see a 1.3.0-BETA version, and only this BETA version works from Maven so far.
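
For anyone pulling this from a build file in the meantime, the beta is the only version those coordinates resolve (an sbt sketch; the version string comes from the mvnrepository page linked above):

// 1.3.0-BETA is the newest version visible on Maven Central while this
// thread was open; no Spark 3.3 compatible release had been published yet.
libraryDependencies += "com.microsoft.azure" %% "spark-mssql-connector" % "1.3.0-BETA"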

smialkow commented 1 year ago

Is there any update on the final release of 1.3? I am not even considering using a beta version for production.