microsoft / mssql-jdbc

The Microsoft JDBC Driver for SQL Server is a Type 4 JDBC driver that provides database connectivity with SQL Server through the standard JDBC application program interfaces (APIs).
MIT License
1.06k stars 424 forks source link

For bulk insert, failure in later batch isn't reported to the caller #2347

Closed hlms closed 6 months ago

hlms commented 7 months ago

Driver version

Provide the JDBC driver version: mssql-jdbc.12.4.2.jre11.jar

SQL Server version

Provide the output of executing SELECT @@VERSION on your target SQL Server:

Microsoft Azure SQL Edge Developer (RTM) - 16.0.5100.7245 (X64)   Jun 13 2023 12:34:33   Copyright (C) 2022 Microsoft Corporation  Linux (Ubuntu 20.04.6 LTS) <X64>

SQL server is running inside docker 4.27.2.

Client Operating System

Provide the Operating System the client application is running on: Windows 11 enterprise

JAVA/JVM version

Provide the JAVA/JVM version (e.g. java version "1.8.0").

sh-5.2# java -version
openjdk version "11.0.22" 2024-01-16 LTS
OpenJDK Runtime Environment Corretto-11.0.22.7.1 (build 11.0.22+7-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.22.7.1 (build 11.0.22+7-LTS, mixed mode)

Table schema

Provide the table schema to repro the issue:

CREATE TABLE ST_Orders ([OrderID$] BIGINT,[CustomerID$] NVARCHAR(5),[EmployeeID$] BIGINT,[OrderDate$] DATE,[RequiredDate$] DATE,[ShippedDate$] DATE,[ShipVia$] BIGINT,[Freight$] FLOAT,[ShipName$] NVARCHAR(34),[ShipAddress$] NVARCHAR(46),[ShipCity$] NVARCHAR(15),[ShipRegion$] NVARCHAR(13),[ShipPostalCode$] NVARCHAR(9),[ShipCountry$] NVARCHAR(11),[DW_DM_ORDERS_ORDERS_CKSUM$] INTEGER)

CREATE CLUSTERED COLUMNSTORE INDEX [CS_ST_Orders] ON ST_Orders

ALTER TABLE [ST_Orders]
ADD UNIQUE (OrderID$);

INSERT INTO [ST_Orders] (OrderID$)
VALUES (11065);

Problem description

With bulk insert command execution through jdbc driver, when batchsize is used, if insertion for any batch (probably second batch onwards) fails, the jdbc driver can't detect the failure. It responds as success to the caller.

Steps to reproduce the issue

  1. As mentioned above, create a table, add unique constraint, and add one row manually. (FYI: With the help of unique constraint and already added data, we would get bulk insert fail.)
  2. Use the following files: Format.txt Orders.txt Notice that in Orders.txt, the last record has 11065 as OrderID$, so bulk insert would fail due to unique constraint. 11065 was already added manually above. That's why.
  3. Use following code:
    
    String bulkInsertQuery = "BULK INSERT ST_Orders FROM '/.../data/Orders.txt' WITH (FORMATFILE='/.../data/Format.txt',FIRSTROW=1,ROWS_PER_BATCH=830,BATCHSIZE=400,DATAFILETYPE='widechar')";

// Notice that aforementioned query has BATCHSIZE=400, and the number of records are 830 in the file. So it will execute in three batches: 400 + 400 + 30.

st.executeLargeUpdate(bulkInsertQuery);


## Expected behavior
A clear and concise description of what you expected to happen:
- Insertion of the first two batches should succeed.
- Insertion of the third batch should fail.
- The execution should throw exception for the failure.

## Actual behavior
Output of what you actually see.
The execution doesn't throw any exception even though the third batch fails. (First two batches are inserted as expected.)

## Error message/stack trace
No exception or error message. That's the issue.

## Any other details that can be helpful
Add any other context about the problem here.
When we attempt to do the same in SQL Server management studio, it does display the errors:
<img width="960" alt="bulk-insert-failure-in-studio-github" src="https://github.com/microsoft/mssql-jdbc/assets/10277817/7a6ee78d-46b0-4fe7-91d3-a3a0ace11d3f">
### Output:
```sql
(400 rows affected)

(400 rows affected)
Msg 2627, Level 14, State 1, Line 19
Violation of UNIQUE KEY constraint 'UQ__ST_Order__E65CA761A9B383C1'. Cannot insert duplicate key in object 'S_bcabbbbf_7118_48dc_a0e0_bbd075ce01de.ST_Orders'. The duplicate key value is (11065).
The statement has been terminated.

So looks like the driver just reads the first message of response and NOT the further messages (response from SQL server).

I haven't yet tried failing the first batch itself.

JDBC trace logs

Provide the JDBC driver trace logs. Instructions can be found here: https://docs.microsoft.com/sql/connect/jdbc/tracing-driver-operation

hlms commented 7 months ago

Adding some more detail:

st.executeLargeUpdate(bulkInsertQuery) returns a long value.

Attempted to check the behaviour of this when the first batch fails, second batch fails and when the third batch fails.

First batch fails

When the first batch itself fails, the statement throws exception as expected:

2024-03-14 14:18:22 2024-03-14 08:48:22,912+0000 [toe=acgubk80vow677] [Pool Worker - 55] [,,] ERROR - Violation of UNIQUE KEY constraint 'UQ__ST_Order__E65CA761D1FED3F3'. Cannot insert duplicate key in object 'S_bcabbbbf_7118_48dc_a0e0_bbd075ce01de.ST_Orders'. The duplicate key value is (11065). com.microsoft.sqlserver.jdbc.SQLServerException: Violation of UNIQUE KEY constraint 'UQ__ST_Order__E65CA761D1FED3F3'. Cannot insert duplicate key in object 'S_bcabbbbf_7118_48dc_a0e0_bbd075ce01de.ST_Orders'. The duplicate key value is (11065).
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:259) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:304) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:137) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1691) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:920) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:814) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7675) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4137) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:272) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:246) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeLargeUpdate(SQLServerStatement.java:774) ~[mssql-jdbc-12.4.2.jre11.jar:?]

When the second batch or third batch fails

st.executeLargeUpdate(bulkInsertQuery) returns 400 (size of first batch). No exception is thrown.

Successful case

Question

When does jdbc driver return control to the caller / java application?

Jeffery-Wasty commented 7 months ago

Hi @hlms,

I'm not familiar with expected behavior of the driver in this scenario. We'll investigate further and get back to you with an answer/resolution.

Jeffery-Wasty commented 6 months ago

This is actually a common scenario and we have wiki/MS doc pages for it: https://docs.microsoft.com/en-us/sql/connect/jdbc/parsing-the-results?view=sql-server-ver16 https://github.com/microsoft/mssql-jdbc/wiki/Handling-SQLExceptions

You can also see previous issues - #367, #399, #826, #937, #995, #1171

Therefore, this is expected driver behavior. Please let us know if you have any further questions, if not we will be closing the issue.

hlms commented 6 months ago

I'm yet to go through the provided links in detail.

What about introducing a new argument (e.g. RAISE_FAILURES) for Bulk insert (similar to BATCHSIZE argument)? So that if users want bulk insert to surface failure in insertion of any of the batches, then based on this parameter, the driver would raise failures.

The point is - SQL server raises the errors. Shouldn't the driver pass the response as-is instead of swallowing something?