microsoft / sql-spark-connector

Apache Spark Connector for SQL Server and Azure SQL
Apache License 2.0

writing with mode "append" to an existing table only rolls back faulty rows w/o "NO_DUPLICATES" #236

Open m-freitag opened 1 year ago

m-freitag commented 1 year ago

Using the latest beta version of the connector, an "append" write to an existing table does not properly roll back on error: only the faulty rows are rolled back, while all rows before the faulty row remain committed. Setting reliabilityLevel to "NO_DUPLICATES" avoids the partial commit; however, the error message raised in that case is very hard to trace.

Take an arbitrary HEAP table with some constraint (e.g. a primary key) or a non-nullable column: the following write causes all rows up to the faulty row (e.g. one containing a NULL) to be inserted anyway:

# "url" and "table" are placeholders for the actual JDBC URL and target table
df.write \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("append") \
    .option("url", "url") \
    .option("dbtable", "table") \
    .option("schemaCheckEnabled", False) \
    .option("tableLock", True) \
    .option("BatchSize", 1000) \
    .save()
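For comparison, a minimal sketch of the same write with the connector's documented reliabilityLevel option set to "NO_DUPLICATES" (the mode the report says does roll back correctly). The "url" and "table" values remain placeholders from the report; the helper function name is mine, not part of the connector:

```python
# Connector options for the append write, mirroring the snippet above,
# plus reliabilityLevel. "BEST_EFFORT" is the connector's default;
# "NO_DUPLICATES" trades some throughput for exactly-once semantics,
# which is what prevents the partial commit described in this issue.
options = {
    "url": "url",          # placeholder JDBC URL
    "dbtable": "table",    # placeholder target table
    "schemaCheckEnabled": "false",
    "tableLock": "true",
    "batchsize": "1000",
    "reliabilityLevel": "NO_DUPLICATES",
}

def append_no_duplicates(df):
    # Same write as in the report, with the options map above applied.
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("append") \
        .options(**options) \
        .save()
```

With this setting the faulty batch fails as a whole instead of leaving rows behind, but as noted above, the resulting error message is difficult to interpret.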