microsoft / sql-spark-connector

Apache Spark Connector for SQL Server and Azure SQL
Apache License 2.0
274 stars, 116 forks

sql-spark-connector slower than older azure-sqldb-spark connector #206

Closed: kkhambadkone closed this issue 1 year ago

kkhambadkone commented 1 year ago

I ran the same test on the same 8.5 GB file with the old connector (https://github.com/Azure/azure-sqldb-spark) and with this one (sql-spark-connector). I used the very same parameters for both: bulk copy, batchsize 150000, tablelock true. With the old connector (Spark 2.4.0, Scala code) the job finished in 16m; with this connector (Spark 3.3.1, PySpark code) the job took around 38m. What could be the reason?
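For readers trying to reproduce the comparison, a write with the settings described above might look roughly like the sketch below. It is not the reporter's actual code. The helper names (`bulk_write_options`, `write_with_connector`), the append mode, and the example URL, table, and credentials are assumptions made for illustration; the format name and the `batchsize` / `tableLock` option keys follow the connector's README.

```python
# Data source name registered by sql-spark-connector (per its README).
CONNECTOR_FORMAT = "com.microsoft.sqlserver.jdbc.spark"


def bulk_write_options(url, table, user, password,
                       batch_size=150000, table_lock=True):
    """Build the writer options used in the test described above.

    Hypothetical helper: only batchsize=150000 and tableLock=true come
    from the issue; everything else is a placeholder.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "batchsize": str(batch_size),                    # rows per bulk-copy batch
        "tableLock": "true" if table_lock else "false",  # take a table lock during the load
    }


def write_with_connector(df, options, mode="append"):
    """Write a Spark DataFrame to SQL Server through the connector.

    `df` is an existing pyspark.sql.DataFrame; `mode` is an assumption,
    since the issue does not say which save mode was used.
    """
    df.write.format(CONNECTOR_FORMAT).mode(mode).options(**options).save()


# Example with placeholder values (not from the issue):
opts = bulk_write_options(
    url="jdbc:sqlserver://example-host:1433;databaseName=exampledb",
    table="dbo.example_table",
    user="example_user",
    password="example_password",
)
```

Calling `write_with_connector(df, opts)` on a DataFrame loaded from the input file would then perform the write. Timing that call separately from the file read can help show whether the extra time is spent in Spark or on the SQL Server side.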

shivsood commented 1 year ago

I don't see a reason why their performance should be very different (assuming minimal additional cost from the Spark framework). Investigate load and response time on the SQL Server side to find where the bottleneck may be. Also note that the old connector does not use the Spark JDBC interfaces and is thus not compatible, and matching the performance of the old connector is not our goal. S