microsoft / sql-spark-connector

Apache Spark Connector for SQL Server and Azure SQL
Apache License 2.0

1.3.0-BETA Issue: PKIX path building failed #219

Closed bharadwaj-v closed 1 year ago

bharadwaj-v commented 1 year ago

Environment details:
Databricks 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12)
Spark SQL connector: com.microsoft.azure:spark-mssql-connector_2.12:1.3.0-BETA

We are planning to switch to 1.3.0-BETA on Databricks 12.2 LTS, and during testing we hit the issue below while reading data from SQL Server. We did not have this issue on 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12) with the same com.microsoft.azure:spark-mssql-connector_2.12:1.3.0-BETA connector.

Py4JJavaError: An error occurred while calling o522.load.
: com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption. Error: "PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target".
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:3806)
    at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1906)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:3329)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2950)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2790)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1663)
    at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1064)
    at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
    at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProviderBase.create(ConnectionProvider.scala:102)
    at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1(JdbcDialects.scala:123)
    at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1$adapted(JdbcDialects.scala:119)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:63)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:241)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:39)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:382)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:378)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:334)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:334)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:306)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
    at java.lang.Thread.run(Thread.java:750)
Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:348)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:291)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:286)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:654)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:473)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:369)
    at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:377)
    at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:444)
    at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:422)
    at sun.security.ssl.TransportContext.dispatch(TransportContext.java:182)
    at sun.security.ssl.SSLTransport.decode(SSLTransport.java:156)
    at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1423)
    at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1329)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:444)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:415)
    at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1795)
    ... 31 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:456)
    at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:323)
    at sun.security.validator.Validator.validate(Validator.java:271)
    at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:315)
    at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:234)
    at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:110)
    at com.microsoft.sqlserver.jdbc.HostNameOverrideX509TrustManager.checkServerTrusted(SQLServerTrustManager.java:86)
    at sun.security.ssl.AbstractTrustManagerWrapper.checkServerTrusted(SSLContextImpl.java:1258)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:638)
    ... 43 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
    at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
    at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
    at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:451)
    ... 51 more

luxu1-ms commented 1 year ago

I tested on DBR 12.2 but did not see any issues. Could you provide a repro if the error persists?

bharadwaj-v commented 1 year ago

Hi @luxu1-ms

I am still seeing the same error. Below is the code snippet we use to read data from SQL Server in Databricks.

df = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("url", f"jdbc:sqlserver://{sqlHost}:{port};databaseName={databaseName}") \
    .option("dbtable", f"{dbtable}") \
    .option("user", f"{user}") \
    .option("password", f"{password}") \
    .load()

Databricks 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12)
Spark SQL connector: com.microsoft.azure:spark-mssql-connector_2.12:1.3.0-BETA
Target: SQL Server Managed Instance

luxu1-ms commented 1 year ago

It seems to be a driver issue. Could you try without .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")?

bharadwaj-v commented 1 year ago

I tried it without the driver option and the issue persists. One more observation: I think the issue is specific to Managed Instance. There is no issue with Azure SQL Database, where the above code works well.

luxu1-ms commented 1 year ago

Thank you for the information. The connector is primarily for DBC, and the committer group does not provide support for Managed Instance.

bharadwaj-v commented 1 year ago

@luxu1-ms We did not have this issue with Managed Instance on previous versions of Spark or sql-spark-connector. The same com.microsoft.azure:spark-mssql-connector_2.12:1.3.0-BETA version worked perfectly fine on 11.3 LTS for both Azure SQL Database and Managed Instance. What other workarounds do we have for this?

luxu1-ms commented 1 year ago

I am not aware of any workarounds. From what you mentioned, I would recommend contacting Databricks. It seems they changed some of the settings for DBR 12.2 LTS.
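
Note: the thread does not confirm a fix. As a diagnostic sketch only, the TLS behavior of the Microsoft JDBC driver can be made explicit in the connection URL through its documented `encrypt` and `trustServerCertificate` connection properties. The helper name `build_sqlserver_url` and the host and database values below are hypothetical, and whether this changes the outcome on DBR 12.2 LTS with Managed Instance is an assumption, not something reported above.

```python
def build_sqlserver_url(host, port, database, **props):
    """Build a jdbc:sqlserver:// URL (hypothetical helper, not part of the connector).

    Extra keyword arguments are appended as ;key=value connection properties.
    Booleans are lowercased to match the driver's true/false spelling.
    """
    parts = [f"jdbc:sqlserver://{host}:{port}", f"databaseName={database}"]
    for key, value in props.items():
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{key}={value}")
    return ";".join(parts)


# Placeholder host and database, for illustration only.
url = build_sqlserver_url(
    "myinstance.example.net",
    1433,
    "mydb",
    encrypt=True,
    # Skips certificate chain validation: use for diagnosis, not production.
    trustServerCertificate=True,
)
print(url)
```

The resulting string would be passed as the `url` option in the read snippet from earlier in the thread. If the read succeeds only with `trustServerCertificate=true`, that would suggest the cluster's Java truststore does not contain the CA that signed the Managed Instance certificate, which is a matter for the runtime configuration rather than the connector.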