microsoft / mssql-jdbc

The Microsoft JDBC Driver for SQL Server is a Type 4 JDBC driver that provides database connectivity with SQL Server through the standard JDBC application program interfaces (APIs).
MIT License
1.06k stars 428 forks source link

xa.commit with one-phase-optimization fails XAER_PROTO after MS JDBC driver indicates XA_READONLY for other transaction branch when using SSTRANSTIGHTLYCPLD #1551

Open njr-11 opened 3 years ago

njr-11 commented 3 years ago

Driver version

9.2.1.0

SQL Server version

15.00.4123

Client Operating System

Mac OS X (10.15.7; amd64) (en_US)

JAVA/JVM version

java.version = 11.0.2 java.runtime = OpenJDK Runtime Environment (11.0.2+9)

Table schema

CREATE TABLE MYTABLE (ID SMALLINT NOT NULL PRIMARY KEY, STRVAL NVARCHAR(40))

Problem description

I'm working on adding support for tightly/loosely coupled transaction branches to OpenLiberty, which we have working for a couple of other JDBC drivers. Unfortunately tightly coupled is not working with the Microsoft SQL Server JDBC Driver due to a bug.

When using the SQLServerXAResource.SSTRANSTIGHTLYCPLD xa start flag for tightly coupled transaction branches, where our test case makes updates on both transaction branches, the JDBC driver's XAResource.prepare votes XA_RDONLY from the second transaction branch. Although it seems odd to indicate read only when an update was performed, this is probably okay to do if your driver implementation is internally consolidating the updates to one of the branches. In response to the XA_RDONLY on that branch, the Liberty transaction manager issues XAResource.commit with the one-phase optimization for the remaining transaction branch. The problem is that the Microsoft JDBC driver fails the one-phase optimized commit with -6 (XAER_PROTO), making the SSTRANSTIGHTLYCPLD path busted.

  1. Expected behaviour: The Microsoft JDBC driver should either return XA_OK from XAResource.prepare in above scenario OR if you do return XA_RDONLY then Microsoft JDBC driver should honor the one-phase optimized commit on the second branch.

  2. Actual behaviour: The Microsoft JDBC driver returns XA_RDONLY from XAResource.prepare for a branch upon which updates were made, and, having voted XA_RDONLY on that branch, subsequently rejects XAResource.commit(onephase) on the other branch.

  3. Error message/stack trace:

    javax.transaction.xa.XAException: The function COMMIT: failed. The status is: -6. Error: "*** SQL_XA DTC_ERROR Context: xa_commit_entry, StatusCode=-6 (0xFFFFFFFA)"
        at com.microsoft.sqlserver.jdbc.SQLServerXAResource.DTC_XA_Interface(SQLServerXAResource.java:662)
        at com.microsoft.sqlserver.jdbc.SQLServerXAResource.commit(SQLServerXAResource.java:807)
        ...
  4. Any other details that can be helpful: see comments interspersed within the following section

JDBC trace logs

OpenLiberty trace captured the exact sequence of JDBC operations leading up to the error.

==> XAConnection219c32a1.getXAResource();
<== XAResource5aadfc27

==> Connection5b41f93c.getTransactionIsolation();
<== 4

==> Connection5b41f93c.getTransactionIsolation();
<== 4

==> Connection5b41f93c.clearWarnings();
<== clearWarnings

==> XAResource5aadfc27.start({XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54), data(00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e000000010000000000000000000000000001)}, 32768);
<== start

==> Connection5b41f93c.createStatement(1003, 1007);
<== Statementa0c1b556

==> Statementa0c1b556.executeUpdate("INSERT INTO MYTABLE VALUES (32, 'thirty-two')");
<== 1

==> XAConnectionef960d1a.getXAResource();
<== XAResource7b5716db

==> Connectiona36df11c.getTransactionIsolation();
<== 4

==> Connectiona36df11c.getTransactionIsolation();
<== 4

==> Connectiona36df11c.clearWarnings();
<== clearWarnings

==> XAResource7b5716db.start({XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54), data(00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e000000010000000000000000000000000002)}, 32768);
<== start

==> Connectiona36df11c.createStatement(1003, 1007);
<== Statement2eb0b291

==> Statement2eb0b291.executeUpdate("UPDATE MYTABLE SET STRVAL='XXXII' WHERE ID=32");
<== 1

==> Statement2eb0b291.close();
<== close

==> Statementa0c1b556.close();
<== close

==> XAResource7b5716db.end({XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54), data(00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e000000010000000000000000000000000002)}, 67108864);
<== end

==> XAResource5aadfc27.end({XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54), data(00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e000000010000000000000000000000000001)}, 67108864);
<== end

==> XAResource7b5716db.prepare({XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54), data(00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e000000010000000000000000000000000002)});
<== 3

The value of 3 above is XAResource.XA_RDONLY. Even though we wrote data in this transaction branch, the Microsoft JDBC Driver votes read only. Maybe the Microsoft driver has really done both of the updates under the same branch? In that case, it is probably fine, but the XA_RDONLY leads the transaction manager into believing that it should go ahead and commit the other transaction branch using the one-phase optimization -- because there is only one resource left and it hasn't prepared yet -- but the JDBC driver fails this with -6 (XAER_PROTO), which I suspect is the bug here:

==> XAResource5aadfc27.commit({XidImpl: formatId(57415344), gtrid_length(36), bqual_length(54), data(00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e000000010000000000000000000000000001)}, true);
<== javax.transaction.xa.XAException: The function COMMIT: failed. The status is: -6. Error: "*** SQL_XA DTC_ERROR Context: xa_commit_entry, StatusCode=-6 (0xFFFFFFFA)"
        at com.microsoft.sqlserver.jdbc.SQLServerXAResource.DTC_XA_Interface(SQLServerXAResource.java:662)
        at com.microsoft.sqlserver.jdbc.SQLServerXAResource.commit(SQLServerXAResource.java:807)
        ...

Reproduction code

I have created an automated test in OpenLiberty, named testTransactionBranchesTightlyCoupled, that reproduces the failure: https://github.com/OpenLiberty/open-liberty/pull/16350/commits/24e0d49caeb54b139b4a3f67d29cb45d94189b0d In order to merge it, I temporarily disabled it by issuing a rollback rather than a commit, but you can easily switch it back on by changing,

// TODO switch to commit once Microsoft bug is fixed
tran.rollback();

to

tran.commit();

Command to run the test (after building OpenLiberty & also requires docker),

./gradlew com.ibm.ws.jdbc_fat_sqlserver:buildandrun -Dfat.test.class.name=com.ibm.ws.jdbc.fat.sqlserver.SQLServerTest -Dfat.test.method.name=testTransactionBranchesTightlyCoupled

While we could in theory write a standalone test based on the above, it would not be using a real transaction manager, so running the test within OpenLiberty seems preferable.

lilgreenbird commented 3 years ago

hi @njr-11

Do you have a simple java repro that will trigger this error?

njr-11 commented 3 years ago

hi @njr-11

Do you have a simple java repro that will trigger this error?

@lilgreenbird I don't, because this requires a transaction manager to run against. I can certainly take the captured JDBC commands and create a simple reproduceable out of it, and it will demonstrate the JDBC driver bug. But without an actual transaction manager, I'll need to guess at code paths that might occur if the driver were to make it past the error. Here is an attempt at that:

package example;

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

import javax.sql.XAConnection;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

import com.microsoft.sqlserver.jdbc.SQLServerXADataSource;
import com.microsoft.sqlserver.jdbc.SQLServerXAResource;

public class TightBranchCouplingTest {
    public static void main(String[] args) throws Exception {
        if (args.length < 5) {
            System.out.println("usage: java example.TightBranchCouplingTest [serverName] [portNumber] [databaseName] [user] [password]");
            return;
        }

        String server = args[0];
        int port = Integer.parseInt(args[1]);
        String database = args[2];
        String user = args[3];
        String password = args[4];

        SQLServerXADataSource ds = new SQLServerXADataSource();
        ds.setServerName(server);
        ds.setPortNumber(port);
        ds.setDatabaseName(database);

        XAConnection XAConnection219c32a1 = ds.getXAConnection(user, password);
        Connection Connection5b41f93c = XAConnection219c32a1.getConnection();
        Connection5b41f93c.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);

        XAConnection XAConnectionef960d1a = ds.getXAConnection(user, password);
        Connection Connectiona36df11c = XAConnectionef960d1a.getConnection();
        Connectiona36df11c.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);

        // clean up after previous run of test
        Statement stmt = Connection5b41f93c.createStatement();
        try {
            stmt.execute("DROP TABLE MYTABLE");
        } catch (SQLException x) {
        }

        stmt.executeUpdate("CREATE TABLE MYTABLE (ID SMALLINT NOT NULL PRIMARY KEY, STRVAL NVARCHAR(40))");
        stmt.close();

        XAResource XAResource5aadfc27 = XAConnection219c32a1.getXAResource();
        Connection5b41f93c.getTransactionIsolation();
        Connection5b41f93c.getTransactionIsolation();
        Connection5b41f93c.clearWarnings();

        Xid xid1 = new MyXid(57415344,
                             "00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e",
                             "00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e000000010000000000000000000000000001");
        XAResource5aadfc27.start(xid1, SQLServerXAResource.SSTRANSTIGHTLYCPLD /* 32768 */);

        Statement Statementa0c1b556 = Connection5b41f93c.createStatement(1003, 1007);
        Statementa0c1b556.executeUpdate("INSERT INTO MYTABLE VALUES (32, 'thirty-two')");

        XAResource XAResource7b5716db = XAConnectionef960d1a.getXAResource();
        Connectiona36df11c.getTransactionIsolation();
        Connectiona36df11c.getTransactionIsolation();
        Connectiona36df11c.clearWarnings();

        Xid xid2 = new MyXid(57415344,
                             "00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e",
                             "00000178661629ec000000011858a7b8ff430098135991e26fa8884c901b8a930d74016e000000010000000000000000000000000002");
        XAResource7b5716db.start(xid2, SQLServerXAResource.SSTRANSTIGHTLYCPLD /* 32768 */);

        Statement Statement2eb0b291 = Connectiona36df11c.createStatement(1003, 1007);
        Statement2eb0b291.executeUpdate("UPDATE MYTABLE SET STRVAL='XXXII' WHERE ID=32");

        Statement2eb0b291.close();
        Statementa0c1b556.close();

        XAResource7b5716db.end(xid2, XAResource.TMSUCCESS /* 67108864 */);
        XAResource5aadfc27.end(xid1, XAResource.TMSUCCESS /* 67108864 */);

        int vote2 = XAResource7b5716db.prepare(xid2);
        if (vote2 == XAResource.XA_OK /* 0 */) {
            // This path is not reached with current behavior of the JDBC driver,
            // so I'm guessing what a transaction manager might do here,
            int vote1 = XAResource5aadfc27.prepare(xid1);
            if (vote1 == XAResource.XA_OK)
                XAResource5aadfc27.commit(xid1, false);
            XAResource7b5716db.commit(xid2, false);
        } else if (vote2 == XAResource.XA_RDONLY /* 3 */) {
            // JDBC driver fails here:
            XAResource5aadfc27.commit(xid1, true /* one-phase optimization */);
        }
    }
}

class MyXid implements Xid {
    byte branchQualifier[];
    int formatId;
    byte globalTranId[];

    public MyXid(int formatId, String gtrid, String bqual) {
        this.formatId = formatId;

        globalTranId = new byte[gtrid.length() / 2];
        for (int i = 0; i < gtrid.length() / 2; i++)
            globalTranId[i] = (byte) Integer.parseInt(gtrid.substring(i * 2, i * 2 + 2), 16);

        branchQualifier = new byte[bqual.length() / 2];
        for (int i = 0; i < bqual.length() / 2; i++)
            branchQualifier[i] = (byte) Integer.parseInt(bqual.substring(i * 2, i * 2 + 2), 16);
    }

    @Override
    public byte[] getBranchQualifier() {
        return branchQualifier;
    }

    @Override
    public int getFormatId() {
        return formatId;
    }

    @Override
    public byte[] getGlobalTransactionId() {
        return globalTranId;
    }
}
lilgreenbird commented 3 years ago

hi @njr-11

Thanks, we will investigate, apologies this might take a bit of time as we are short staff right now.

Can you please clarify to help with our investigation. The driver has code internally bundled in the form of a DLL to handle the XA transactions, but starting with SQL Server 2017 with CU16 and in SQL Server 2019, the functionality is built into the server so the driver DLL is no longer necessary it's best to use the builtin components on the server. Can you please tell us if you are using the driver's sqljdbc_xa.dll or if you just used the stored procedure sp_sqljdbc_xa_install to enable the XA components on the server? (see https://docs.microsoft.com/sql/connect/jdbc/understanding-xa-transactions#configuration-instructions)

Also, you mentioned that you have things working with other JDBC drivers do you mean that everything works using the exact same code you have above? If so, may I ask which JDBC drivers you were using?

njr-11 commented 3 years ago

@lilgreenbird we are running this against SQL Server version 15.00.4013 which used the sp_sqljdbc_xa_install stored procedure for XA enablement (not the dll from the JDBC driver).

That's a great question about the other JDBC drivers. I just looked in the logs for both Oracle and DB2 for when we run the tightly coupled transaction branches test against them and they both follow the pattern of returning XA_RDONLY from prepare and then allowing the xa.commit with one-phase optimization. So the pattern for those drivers is:

xa.start branch 1 xa.start branch 2 xa.end branch 2 xa.end branch 1 xa.prepare branch 2 : returns XA_RDONLY (3) xa.commit(one-phase optimization) branch 1

So it's the same as what the Microsoft SQL Server JDBC driver is doing, except that the Microsoft JDBC driver gets the -6 error when attempting the final xa.commit with one-phase optimization on the first transaction branch.

lilgreenbird commented 3 years ago

hi @njr-11

When you said that you have it working for a couple of other JDBC drivers I thought you meant with SQL Server didn't realize it's with Oracle and DB2. So actually this had never worked on SQL Server for you?

Distributed transactions are handled by the SQL Server. Prior to SQL Server 2017 CU6 it used the Microsoft Distributed Transaction Coordinator (MDTC) and the driver used the msqlsrvxa1 library interface to communicate to the DTC. Starting with SQL Server 2017 CU6 and SQL Server 2019 this functionality is built into the server which can be enabled with sp_sqljdbc_xa_install as you have done. You can see here the driver merely calls the XA functions and returns the result the driver doesn't do any interpretation of the result it just returns the return value from the call . So in this case looks like the server is not returning the result as expected. Unfortunately our team is only responsible for the driver code in this repo any issues with the builtin XA distributed transaction would have to be filed with SQL Server.

njr-11 commented 3 years ago

@lilgreenbird given what you said it makes sense for this to be a bug in the Microsoft SQL Server database rather than the JDBC driver.

Could you get this bug reported to the database team (I can see from your profile that you work for Microsoft) to give them the opportunity to fix it? And then yes after that I can close out this issue in the JDBC driver repo because it doesn't make sense to track it here given that. Thanks for investigating!

lilgreenbird commented 3 years ago

hi @njr-11

Unfortunately our team is only responsible for code in this repo. For other MS products you will have to go through MS support, who will then file and RFC with the appropriate team.

njr-11 commented 3 years ago

@lilgreenbird this bug was discovered during development of open source software aiming to integrate with a variety of databases/drivers. Many users of this open source software will choose to run with Microsoft SQL Server in production under support contracts, however, I am not a user of Microsoft SQL Server in production myself. This bug is reported out of courtesy to give Microsoft a chance to fix it in advance before real customers do run into it. I'm not suggesting that you fix it yourself. I'm suggesting that you should notify the Microsoft SQL Server database development team to give them the opportunity to fix it.

lilgreenbird commented 3 years ago

hi @njr-11 I understand, but Microsoft is a big company in order to report issues one has to go through support who will then route you to the appropriate team to handle. We have to do the same when we find issues we do not have a direct line to get to the SQL Server team.

vwarchu commented 3 years ago

Both points well taken. I can reach out to MS Support and determine suggested approaches to file an issue that doesn't exactly have an owner.

David-Engel commented 3 years ago

@njr-11 Thanks for reporting the bug. We are going to forward the issue on to the SQL engine team, but before we do, can you confirm that it still exists against the latest SQL 2019 CU? Version 15.00.4013 is CU2 from Feb 2020. Can you try against the latest, CU10, released this month, just to make sure it hasn't already been fixed?

Thanks, David

njr-11 commented 3 years ago

@David-Engel good idea. I moved up to the latest versions, Database product version : 15.00.4123 JDBC driver version : 9.2.1.0 and confirmed that the same error is still occurring.