trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.53k stars 3.03k forks source link

Query failure when multiple Statements form single JDBC Connection used at the same time (or within explicit transaction): Query already begun #10788

Open findepi opened 2 years ago

findepi commented 2 years ago

steps to reproduce 1

  1. create a JDBC connection (TrinoConnection)
  2. in threads
    1. create two statements (TrinoStatement)
    2. execute non-SELECT queries over Hive catalog

steps to reproduce 2

https://github.com/trinodb/trino/issues/10788#issuecomment-1311544594 (current version, as of 2022-11-14 15:06 CET)

observed

for example: https://github.com/trinodb/trino/blob/8e9da982f95a531b52a72c9cda39d96b00d126ce/plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/SemiTransactionalHiveMetastore.java#L1267-L1271

notes

This is because io.trino.jdbc.TrinoConnection#transactionId is a shared mutable state. It's populated for non-SELECT queries here https://github.com/trinodb/trino/blob/a40038879399a6c4aedfb97fe9e5cd2bf1d05e21/client/trino-jdbc/src/main/java/io/trino/jdbc/TrinoStatement.java#L275

Thus, the second statement, may see the implicit, autocommit transaction created for the first statement.

findepi commented 2 years ago

I think JDBC requires Connection to be thread-safe, but couldn't find this in writing.

java.sql.Connection#abort mentons threads but that's quite specific method

The "JDBC™ 4.1 Specification" says

Each Connection object can create multiple Statement objects that may be used concurrently by the program. This is demonstrated in CODE EXAMPLE 13-2.

however it's followed by a code example that has no threads.

Internet wisdom (https://stackoverflow.com/questions/1531073/is-java-sql-connection-thread-safe#comment1386784_1531103) suggests some other databases have problems with thread-safety in JDBC.

TrinoConnection makes heavy use of atomic, trying to provide the thread-safety.

findepi commented 2 years ago

cc @aalbu @leniartek

electrum commented 2 years ago

I don't think the word "concurrently" in the specification there means "at the same time by multiple threads", since the specification doesn't mention threads anywhere else. Other drives like Oracle will synchronize access to the connection.

findepi commented 2 years ago

ince the specification doesn't mention threads anywhere else.

this i noticed, hence comment above.

Other drives like Oracle will synchronize access to the connection.

is this in support of what actually?

that the Connection should be thread-safe?

Current implementation of Connection looks thread-safe, and this dates back to the first commit introducing the class. It could be interpreted as if it was OK to share Connection between threads, but that leads to the problem described here. If the Connection is not supposed to be thread-safe, can we remove would-be-unnecessary use of AtomicReferences and ConcurrentHashMaps in the TrinoConnection?

ElapsedSoul commented 2 years ago

It seems get the same case in python client.

      File "/tmp/dolphinscheduler/exec/process/82/1442/6451842/18620689/py_1442_6451842_18620689.command", line 70, in Conca_X
        X=cur.fetchall()
      File "/opt/applications/conda/lib/python3.9/site-packages/trino/dbapi.py", line 558, in fetchall
        return list(self.genall())
      File "/opt/applications/conda/lib/python3.9/site-packages/trino/client.py", line 509, in __iter__
        rows = self._query.fetch()
      File "/opt/applications/conda/lib/python3.9/site-packages/trino/client.py", line 677, in fetch
        status = self._request.process(response)
      File "/opt/applications/conda/lib/python3.9/site-packages/trino/client.py", line 440, in process
        raise self._process_error(response["error"], response.get("id"))
    trino.exceptions.TrinoQueryError: TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="Query already begun: Optional[20220712_234817_33615_a6hf9] while starting query 20220712_234819_33617_a6hf9", query_id=20220712_234819_33617_a6hf9)
findepi commented 2 years ago

@ElapsedSoul since this is client-specific problem, please file a separate bug report at https://github.com/trinodb/trino-python-client

toreluntang commented 2 years ago

I get the same error using Java with the JDBC Driver version 402. It happens fairly consistently when executing the follow code:

import java.sql.*;

public class main {
    public static void main(String[] args) throws SQLException {
        // loop to make it easier to see the error 
        for (int i = 0; i < 10; i++) {
            System.out.println("attempts #" + i);
            connectToTrinoHiveDocker();
        }
    }

    private static void connectToTrinoHiveDocker() throws SQLException {
        // Running Trino locally with hive as described here: https://github.com/bitsondatadev/trino-getting-started/tree/main/hive/trino-minio
        String url = "jdbc:trino://localhost:8443/minio?SSL=true&SSLVerification=NONE";
        Connection conn = DriverManager.getConnection(url, "testuser", "testuser");
        // autoCommit=True works, but false (which i need) causes failure
        conn.setAutoCommit(false);

        // This seems to execute a query that causes the follow preparedstmt to error with "query already begun"
        ResultSet result = conn.getMetaData().getSchemas();

        PreparedStatement stmt = conn.prepareStatement("select * from minio.tiny.customer limit 2");
        ResultSet rs = stmt.executeQuery();

        conn.close();
    }
}

Are we not expected to use autoCommit=false and executing several select statements? The same code works when connecting to postgres, mysql and oracle (not using trino, since the Hive connector on trino is the only that allows autoCommit=false)

findepi commented 2 years ago

@toreluntang can you check whether this is still reproducible when you wrap ResultSet result = conn.getMetaData().getSchemas() with try-with-resources?

toreluntang commented 2 years ago

Sorry if I misunderstood, feel free to correct me but: If i alter the example above and use this instead:

        try (ResultSet result = conn.getMetaData().getSchemas()) {
            System.out.println(result);
        } catch (Exception e) {
            e.printStackTrace();
        }

Another error shows up when i create the prepared statement for select * from ... Exception in thread "main" java.sql.SQLException: Query failed (#20221113_071435_00199_tks2h): Current transaction is aborted, commands ignored until end of transaction block image

findepi commented 2 years ago

Current transaction is aborted, commands ignored until end of transaction block

to fix this one, you'd need to consume the result set fully

try (ResultSet result = conn.getMetaData().getSchemas()) {
  while (result.next()) { }
}

not sure whether this should be this way though ...