microsoft / sql-spark-connector

Apache Spark Connector for SQL Server and Azure SQL
Apache License 2.0

TCP/IP connection to the host whatever.database.windows.net, port 1433 has failed. Error: "null. Verify the connection properties. #241

Closed dbeavon closed 10 months ago

dbeavon commented 11 months ago

I have an intermittent issue using this connector in a Spark environment in Synapse.

The exception appears to be coming out of the JDBC driver, but I'm hoping this community may be able to recognize the problem (now or in the future), especially if there are factors in my scenario that are Spark-specific.

The error message looks like this:

com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host whatever.database.windows.net, port 1433 has failed. Error: "null. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".

It happens when bulk-copying data into an Azure SQL database. Here is the full stack trace as well:

2023-10-01 21:56:52,857 ERROR Executor [Executor task launch worker for task 3.0 in stage 212.0 (TID 23861)]: Exception in task 3.0 in stage 212.0 (TID 23861)
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host whatever.database.windows.net, port 1433 has failed. Error: "null. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:234)
    at com.microsoft.sqlserver.jdbc.SQLServerException.ConvertConnectExceptionToSQLServerException(SQLServerException.java:285)
    at com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2462)
    at com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:668)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2695)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2362)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2213)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1276)
    at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:861)
    at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
    at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.create(ConnectionProvider.scala:77)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:62)
    at com.microsoft.sqlserver.jdbc.spark.BulkCopyUtils$.savePartition(BulkCopyUtils.scala:50)
    at com.microsoft.sqlserver.jdbc.spark.SingleInstanceWriteStrategies$.$anonfun$write$2(BestEffortSingleInstanceStrategy.scala:43)
    at com.microsoft.sqlserver.jdbc.spark.SingleInstanceWriteStrategies$.$anonfun$write$2$adapted(BestEffortSingleInstanceStrategy.scala:42)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1027)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1027)
    at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2349)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
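
For context, the write that hits this path is an ordinary bulk-copy through the connector. A minimal sketch of that kind of write is below; the storage path, database name, table name, and credential variables are placeholders rather than my actual workload.

// Sketch of the kind of write that triggers the failure; option values are placeholders.
val df = spark.read.parquet("abfss://staging@somestorage.dfs.core.windows.net/batch")

df.write
  .format("com.microsoft.sqlserver.jdbc.spark")
  .mode("append")
  .option("url", "jdbc:sqlserver://whatever.database.windows.net:1433;databaseName=MyDatabase")
  .option("dbtable", "dbo.MyTargetTable")
  .option("user", sqlUser)
  .option("password", sqlPassword)
  .save()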

It is an intermittent problem and happens only once in a hundred batches (or so), but I would still like to understand the root cause and fix it permanently. I find it odd that the failure comes out of a method called "findSocket". Is my Spark node (a Synapse VM) exhausting all the TCP/IP sockets that might otherwise be used for connecting to SQL?

The Synapse Spark cluster itself may be one of the factors in this problem. My nodes run on Synapse's proprietary network technology (a "managed VNet"), and I've had networking problems with the Synapse Spark offering in the past. Ideally I would find a way to proactively monitor whatever network resource is being exhausted before it produces this failure.
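
One diagnostic I may try is dumping the kernel's socket counters right before a write, to see whether connections are piling up in TIME_WAIT. This is only a rough sketch and assumes a Linux node; the helper name is mine.

import scala.io.Source

// Diagnostic sketch: report in-use / TIME_WAIT TCP socket counts on a Linux node.
// Calling this inside foreachPartition would report the executor rather than the driver.
def tcpSocketStats(): String =
  Source.fromFile("/proc/net/sockstat").getLines()
    .find(_.startsWith("TCP:"))
    .getOrElse("TCP: <not found>")

println(s"[socket-stats] ${java.net.InetAddress.getLocalHost.getHostName}: ${tcpSocketStats()}")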

Any tips would be very much appreciated.

dbeavon commented 11 months ago

I believe this is a known bug that may have been fixed in a newer version of the JDBC driver.

I found what looks like the same issue reported by someone else and tracked down the fix: https://github.com/microsoft/mssql-jdbc/issues/1831

The fix might be in the "[12.4.0] - A Stable Release" changelog entry.

This fix appears to have been introduced as recently as June 2023.
I'm not that familiar with the Java ecosystem. When the README for this project says it supports version 8.4.1 of the driver, does that mean we should avoid all the more recent ones? I found the following JDBC driver on Maven:

https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc/12.4.0.jre8

... I'm wondering whether this is a driver I can use, albeit at my own risk. Why does the README not mention more recent versions of the JDBC driver? The one it does mention is many years old. Any information would be greatly appreciated.
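
If it turns out to be acceptable, I assume I would pull the newer driver in alongside the connector with something like the sketch below. The connector coordinate and versions are just an illustration, and I have not verified that this connector is actually compatible with the 12.x driver.

import org.apache.spark.sql.SparkSession

// Sketch only: pin a newer mssql-jdbc next to the connector via spark.jars.packages.
// Whether the connector actually works against 12.4.0 is exactly my open question.
val spark = SparkSession.builder()
  .appName("bulk-copy-test")
  .config("spark.jars.packages",
    "com.microsoft.azure:spark-mssql-connector_2.12:1.2.0," +
    "com.microsoft.sqlserver:mssql-jdbc:12.4.0.jre8")
  .getOrCreate()

// Quick check of which driver version actually ended up on the classpath.
val drv = new com.microsoft.sqlserver.jdbc.SQLServerDriver()
println(s"mssql-jdbc version: ${drv.getMajorVersion}.${drv.getMinorVersion}")

In Synapse the packages would presumably have to come from a pool or workspace library setting rather than the session builder, but the idea is the same.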

dbeavon commented 10 months ago

I might just close this now. I will probably try to find another path forward.

dbeavon commented 10 months ago

Closing for now. Will investigate workarounds.
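
For anyone else who lands here: the interim workaround I'm leaning toward is a crude retry around the write, roughly like the sketch below. The attempt count and backoff are arbitrary, and the connection options are placeholders for whatever the real job uses.

import org.apache.spark.sql.DataFrame

// Crude retry sketch around the bulk-copy write; numbers and options are placeholders.
// If the last attempt fails, the exception propagates as before.
def writeWithRetry(df: DataFrame, jdbcUrl: String, maxAttempts: Int = 3): Unit = {
  var attempt = 1
  var done = false
  while (!done) {
    try {
      df.write
        .format("com.microsoft.sqlserver.jdbc.spark")
        .mode("append")
        .option("url", jdbcUrl)
        .option("dbtable", "dbo.MyTargetTable")
        .save()
      done = true
    } catch {
      case e: Exception if attempt < maxAttempts =>
        println(s"Write attempt $attempt failed (${e.getMessage}); retrying")
        Thread.sleep(30000L * attempt)  // simple linear backoff
        attempt += 1
    }
  }
}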