microsoft / mssql-jdbc

The Microsoft JDBC Driver for SQL Server is a Type 4 JDBC driver that provides database connectivity with SQL Server through the standard JDBC application program interfaces (APIs).
MIT License

Bulk copy: fails with combination sendStringParametersAsUnicode=false #1784

Open JDBC-0 opened 2 years ago

JDBC-0 commented 2 years ago

Driver version

10.2.0

SQL Server version

Microsoft SQL Server 2019 (RTM-GDR) (KB4583458) - 15.0.2080.9 (X64) Nov 6 2020 16:50:01 Copyright (C) 2019 Microsoft Corporation Developer Edition (64-bit) on Windows Server 2019 Standard 10.0 (Build 17763: ) (Hypervisor)

Client Operating System

Microsoft Windows [Version 10.0.19044.1566]

JAVA/JVM version

11.0.4

Table schema

MyColumnWith3Chars VARCHAR(3), ....

The statement is plain vanilla

"INSERT INTO amt.dbo.MyFancyTable VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"

Problem description

Prepared INSERT statements fail with "Received an invalid column length from the bcp client" for no apparent reason.

Expected behavior

I debugged the source values: a single string (for a single VARCHAR column) that always has exactly 3 characters. The destination table column is VARCHAR(3), so this should work (and it does work when bulk copy is not used, i.e. with our normal table containing DATE/DATETIME columns that bulk copy does not support).

Actual behavior

The error below occurs.

Error message/stack trace

Vom bcp-Client wurde eine ungültige Spaltenlänge für die Spalten-ID 1 empfangen. errorCode=0, message=Vom bcp-Client wurde eine ungültige Spaltenlänge für die Spalten-ID 1 empfangen., SQLState=null, cause=null java.sql.BatchUpdateException: Vom bcp-Client wurde eine ungültige Spaltenlänge für die Spalten-ID 1 empfangen. at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2073)

(Translation: The bcp client received an invalid column length for column ID 1.)

Any other details that can be helpful

If I increase the table column to VARCHAR(11), the error vanishes, but nothing is saved anyway. I debugged the values and triple-checked with a colleague that all values have exactly 3 characters; not a single one has more or less. No idea what else to try. Thinking about collation: we have the connection parameter "sendStringParametersAsUnicode=false" set, so there should not even be a doubling to 6 bytes for 3 characters.

Again: if we revert the data types of our date/time columns, everything (including this first VARCHAR(3) column) works without error and the data is saved. (I guess bulk copy then does not kick in, because of the unsupported DATE/DATETIME columns.)

JDBC trace logs

I enabled driver logging and found this:

164803674 INFO r.ccyCodeRef_I: >>>EUR<<< 164803674 FINER ENTRY 1 EUR [com.microsoft.sqlserver.jdbc.Statement] 164803674 FINER RETURN [com.microsoft.sqlserver.jdbc.Statement]

My output shows "EUR" ... I have no idea if "ENTRY 1 EUR" might be a wrong value as it has exactly 11 characters (what a coincidence ...).

VeryVerySpicy commented 2 years ago

Hello @JDBC-0,

Thanks for reaching out.

I don't suppose you are able to provide the stack trace in English? And perhaps a small Java app reproducing the problem?

We will investigate regardless, but those would greatly expedite the process.

JDBC-0 commented 2 years ago

I wrote a minimal test to reproduce this:

A) create the table:

```sql
USE [pubs]
GO

/** Object: Table [dbo].[test_MH_JDBC]    Script Date: 04.04.2022 11:01:06 **/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[test_MH_JDBC](
    [Text] varchar NULL
) ON [PRIMARY]
GO
```

B) and here is the Java application:

```java
package tst;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

/**

}//Test
```

The result is:

Error: java.sql.BatchUpdateException: Vom bcp-Client wurde eine ungültige Spaltenlänge für die Spalten-ID 1 empfangen. java.sql.BatchUpdateException: Vom bcp-Client wurde eine ungültige Spaltenlänge für die Spalten-ID 1 empfangen. at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2073) at tst.Test.testJDBCBulkInsert(Test.java:55) at tst.Test.main(Test.java:25)

(Roughly translated: the bcp client received an invalid column length for column ID 1)

Our environment:

JDBC-0 commented 2 years ago

... and I have already found the problem: if I remove the URL parameter

sendStringParametersAsUnicode=false;

then the error does not occur! But we have to set sendStringParametersAsUnicode=false because we still use the older data types (VARCHAR, not NVARCHAR) and the standard Windows encoding / collation on the database server (Cp1252, Latin1_General_CI_AS).

I also tested it with the same table but NVARCHAR datatypes: same error occurs.

We have used sendStringParametersAsUnicode=false since our jTDS driver days years ago; the reason might have been to "fix" implicit character conversions from the Java UTF encoding (we use -Dfile.encoding=UTF-8 for various reasons). I have no idea whether this parameter can be omitted (or set to 'true') with SQL Server 2019 and Java 11.

I read here that the JDBC 4 method setNString() is available. The above test code no longer shows the error if I use setNString() instead of setString():

ps.setNString(1, "TST");

This is not quite correct, as the table column is still VARCHAR and not NVARCHAR. And I don't know whether a general change from setString() to setNString() would have other side effects in our application. (It would also mean changing thousands of lines of code.) But without the useBulkCopyForBatchInsert=true parameter the code runs fine, so it has something to do with the combination of the bulk copy feature and character encoding / collation:

A) sendStringParametersAsUnicode=false; useBulkCopyForBatchInsert=false ... works
B) sendStringParametersAsUnicode=true; useBulkCopyForBatchInsert=true ... works
C) sendStringParametersAsUnicode=false; useBulkCopyForBatchInsert=true ... fails
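For reference, the three tested combinations can be sketched as JDBC URLs. This is only an illustration: host, port, and database name below are placeholders, not values from this issue.

```java
public class UrlCombos {
    // Hypothetical helper: builds a JDBC URL for one combination of the two
    // properties discussed above. Host/port/database are placeholder values.
    static String url(boolean sendUnicode, boolean useBulkCopy) {
        return "jdbc:sqlserver://myhost:1433;databaseName=pubs;encrypt=false"
                + ";sendStringParametersAsUnicode=" + sendUnicode
                + ";useBulkCopyForBatchInsert=" + useBulkCopy;
    }

    public static void main(String[] args) {
        System.out.println("A (works): " + url(false, false));
        System.out.println("B (works): " + url(true, true));
        System.out.println("C (fails): " + url(false, true));
    }
}
```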

One reason might be that bulk copy ignores the parameter value for sendStringParametersAsUnicode?

JDBC-0 commented 2 years ago

I played with the driver-specific API com.microsoft.sqlserver.jdbc.SQLServerBulkCopy (we want to stay driver independent, so normally we use only the standard JDBC API), and this kind of usage seems to work (i.e. no error message, and the data is inserted):

```java
/**
 * Test: insert table data via plain INSERT statements to trigger the MS SQL JDBC driver's bulk insert feature.
 *
 * Ensure mssql-jdbc.jre11.jar is on the classpath and mssql-jdbc_auth-10.2.0.x64.dll on the library path (e.g. -Djava.library.path=lib).
 */
public static void testJDBCBulkInsert(final String dbServerName, final int dbServerPort) throws Exception {
    final Logger msJDBCLogger = Logger.getLogger("com.microsoft.sqlserver.jdbc"); //name of the Microsoft JDBC logger
    msJDBCLogger.setLevel(Level.FINEST); //log all driver traces

    //driver logging to System.out:
    ConsoleHandler handler = new ConsoleHandler();
    handler.setFormatter(new SimpleFormatter());
    handler.setLevel(Level.FINEST);
    msJDBCLogger.addHandler(handler);

    String urlJDBC = "jdbc:sqlserver://"+dbServerName+":"+dbServerPort+";databaseName=amc;integratedSecurity=true;encrypt=false"
            +";sendStringParametersAsUnicode=false"
            +";useBulkCopyForBatchInsert=true"; //fails with sendStringParametersAsUnicode=false !

    Connection c = DriverManager.getConnection(urlJDBC, null);
    String nameOfTable = "pubs.dbo.test_MH_JDBC";

    boolean useDriverAPI = true; //toggle between driver-specific and standard JDBC API
    if (useDriverAPI) {
        //MS SQL Server JDBC driver specific API:
        com.microsoft.sqlserver.jdbc.SQLServerBulkCopy bulk = new com.microsoft.sqlserver.jdbc.SQLServerBulkCopy(c); //with destination connection
        bulk.setDestinationTableName(nameOfTable);

        Statement s = c.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
        ResultSet rs = s.executeQuery("SELECT * FROM "+nameOfTable+" WHERE 1 = 0"); //force ResultSet without data
        rs.moveToInsertRow();
        rs.updateString(1, "Ö-x");
        rs.insertRow();

        rs.moveToInsertRow();
        rs.updateString(1, "Ö-2");
        rs.insertRow();

        bulk.writeToServer(rs);
        bulk.close();
    } else { //standard JDBC API:
        String sql = "INSERT INTO "+nameOfTable+" VALUES (?)";
        PreparedStatement ps = c.prepareStatement(sql);

        ps.clearBatch();

        ps.setString(1, "AÄß");
        //ps.setNString(1, "T1ä");  //one way to 'fix' the problem

        ps.addBatch();
        System.out.println("********************************************* executeBatch() ...");
        ps.executeBatch();
    }
}//testJDBCBulkInsert()
```

As you can see, it's total overkill to first issue an artificial statement, with a database round trip, just to obtain a ResultSet to add data to. I just wanted to show that with the same JDBC URL parameters it works when using the driver's concrete SQLServerBulkCopy object instead of the standard JDBC PreparedStatement (which would be preferable).

JDBC-0 commented 2 years ago

So, with the workaround to create a separate connection for such bulk inserts with the parameters

...;sendStringParametersAsUnicode=true;useBulkCopyForBatchInsert=true;...

it works. BUT: it is slower ... considerably slower! For table data with about 10,000 rows it's almost identical, but for larger data like several 100,000 rows it is about 10% slower than the connection without bulk copy enabled!

VeryVerySpicy commented 2 years ago

Hi @JDBC-0, Thanks for reaching out, we will be taking a look at this shortly.

VeryVerySpicy commented 2 years ago

Hi @JDBC-0,

After quite some debugging, the issue currently stands as follows.

When using setString with sendStringParametersAsUnicode=false, the driver will attempt to insert the given data as varchar, which is one byte per character. However, this string data is in a multibyte character format and gets mangled between setString and executeBatch.

Using either sendStringParametersAsUnicode=true OR setNString will instead insert the data as nvarchar, which is 2 bytes per character and has enough space to correctly store the string data.

As such, unless you can ascertain that your data is 1 byte per character, you should use either sendStringParametersAsUnicode=true or setNString.
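The byte-length difference can be illustrated in plain Java, independent of the driver. A minimal sketch (the sample string is an illustration, not taken from the issue): the same 3-character string occupies 3 bytes in the single-byte windows-1252 code page, 6 bytes in UTF-16LE (the encoding behind nvarchar), and 5 bytes in UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLengths {
    public static void main(String[] args) {
        String s = "AÄß"; // 3 characters, two of them outside ASCII
        // Single-byte windows-1252 (the code page of Latin1_General_CI_AS): 1 byte per character
        int cp1252 = s.getBytes(Charset.forName("windows-1252")).length; // 3
        // UTF-16LE, the encoding SQL Server uses for nvarchar: 2 bytes per character
        int utf16 = s.getBytes(StandardCharsets.UTF_16LE).length;        // 6
        // UTF-8 (the reporter's -Dfile.encoding): variable width, Ä and ß take 2 bytes each
        int utf8 = s.getBytes(StandardCharsets.UTF_8).length;            // 5
        System.out.println(cp1252 + " " + utf16 + " " + utf8);
    }
}
```

If the byte count of the encoded data and the declared varchar column length disagree, a mismatch like this would be consistent with the "invalid column length" error, though the exact failure path inside the driver is what this issue is about.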

JDBC-0 commented 2 years ago

Of course we can't change all our code from setString to setNString just because of this broken bulk feature. I am not really sure what this parameter actually does; with jTDS we simply found that we had to set it to false for all strings to work. I guess it has something to do with the collation of the database, but how are they related? Are there collations that require sendStringParametersAsUnicode=false? Our tables use VARCHAR, not NVARCHAR, so reading your explanation I don't understand why bulk copy behaves this way.

VeryVerySpicy commented 2 years ago

I am unsure whether there are any encodings that cannot be sent as Unicode. I can't immediately think of why a single-byte charset could not be sent with 2 bytes per character, but it is not my field of expertise.

Regardless I have a proposed fix to this issue in https://github.com/microsoft/mssql-jdbc/pull/1816, just have to run tests and discuss this with the team before finalizing on the solution.

Could you try and see if the fix works for you @JDBC-0?

VeryVerySpicy commented 2 years ago

Just an update in case you are still following this issue @JDBC-0.

The PR is currently still in progress. It does fix your issue however the team has discussed it and we need quite a bit more testing to confirm it does not disturb other behavior.

We will likely not have time to finish this in our current sprint and it will likely be revisited in a couple of weeks.

JDBC-0 commented 2 years ago

Thanks for the feedback, this is good news.

lilgreenbird commented 2 years ago

Hi @JDBC-0, I am unable to reproduce the issue using your repro code and schema. When I run your repro code I get:

Error: java.sql.BatchUpdateException: The given value of type VARCHAR(3) from the data source cannot be converted to type varchar(1) of the specified target column Text. java.sql.BatchUpdateException: The given value of type VARCHAR(3) from the data source cannot be converted to type varchar(1) of the specified target column Text. at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2138) at github1784.testJDBCBulkInsert(github1784.java:59) at github1784.main(github1784.java:24)

Also, I am unsure what the issue is. If you want to send Unicode characters, you need to NOT set sendStringParametersAsUnicode to false.

lilgreenbird commented 2 years ago

Looking at this issue more, it involves quite a bit more work than originally thought. We will have to budget more time to investigate how the sendStringParametersAsUnicode property works in conjunction with bulk copy, and/or the possibility of adding an option to specify the encoding. I have added this issue to our backlog; it will be considered when we do planning for the next semester. In the meantime, please use sendStringParametersAsUnicode=true to send Unicode characters, or useBulkCopyForBatchInsert=false, to work around the issue. Thanks

JDBC-0 commented 2 years ago

Thanks for the feedback. We will test whether we can omit the sendStringParametersAsUnicode parameter now: we have used it for years, but we now have a new JDBC driver (this mssql-jdbc, formerly jTDS) and a new SQL Server (now 2019, formerly 2008). But our database is still old, with collation Latin1_General_CI_AS, and we use Java 11 (UTF-8, Unicode) to access it. So I guess it won't work, as the Unicode characters of Java are incompatible with the database collation.

(For the test code: perhaps GitHub swallowed / changed some characters when the code was pasted. It's just a simple VARCHAR column, e.g. VARCHAR(3). The comment above with the test code seems to have swallowed the "(3)" in the table declaration.)

lilgreenbird commented 2 years ago

Thanks, I have marked this as an enhancement to specify the encoding. This will be considered along with other feature requests and bug fixes when we do planning for the next semester.

JDBC-0 commented 2 years ago

I noticed a comment in our configuration file saying "set to false to avoid performance hits due to UTF-8 conversions". The official driver documentation also says:

"For optimal performance with the CHAR, VARCHAR, and LONGVARCHAR JDBC data types, an application should set the sendStringParametersAsUnicode property to "false" and use the setString, ...."

As we use the old VARCHAR data type in the database and the setString() JDBC methods, I guess setting sendStringParametersAsUnicode=true would hurt performance, as Java strings are Unicode while the VARCHAR type is not?

Jeffery-Wasty commented 2 years ago

When setting sendStringParametersAsUnicode to true, there is an implicit conversion from VARCHAR to NVARCHAR; this causes a noticeable performance impact. I found this blog post that demonstrates it well:

https://sqlwizardblog.wordpress.com/2018/07/29/performance-impact-of-jdbc-connectionstring-property-sendstringparametersasunicode/

Just an update on this issue: this is still an issue which requires a thorough understanding of sendStringParametersAsUnicode, as well as BulkCopy. In the meantime, continue to use the workarounds mentioned above.

JDBC-0 commented 8 hours ago

Any news here? After we finally upgraded to driver 12.8.1, we still try to use useBulkCopyForBatchInsert=true, but while fiddling around with the URL parameters we get either

"errorCode=4022, message=Massenladedaten wurden erwartet, aber nicht gesendet. Der Batch wird beendet., SQLState=S0001, cause=null" (Translation: Bulk load data was expected but not sent. The batch will be terminated.)

or

"Vom bcp-Client wurde eine ungültige Spaltenlänge für die Spalten-ID 1 empfangen." (Translation: The bcp client received an invalid column length for column ID 1.)

(It would be nice to be able to tell the driver which language to use for its messages; we can't change the system default on the VM, as it has to be German for other things.)