vapor / postgres-nio

🐘 Non-blocking, event-driven Swift client for PostgreSQL.
https://api.vapor.codes/postgresnio/documentation/postgresnio/
MIT License
317 stars 72 forks source link

Actually fix #347 #353

Closed fabianfett closed 1 year ago

fabianfett commented 1 year ago

In #351 we only landed a simple bandaid that didn't fix the underlying issue. Let's fix the real issue inside the state machine now.

cc @trasch can you very this please?

Fixes #347

codecov-commenter commented 1 year ago

Codecov Report

Merging #353 (32aeeae) into main (1516e0c) will increase coverage by 0.63%. The diff coverage is 80.00%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #353 +/- ## ========================================== + Coverage 41.20% 41.83% +0.63% ========================================== Files 117 117 Lines 9657 9668 +11 ========================================== + Hits 3979 4045 +66 + Misses 5678 5623 -55 ``` | [Impacted Files](https://codecov.io/gh/vapor/postgres-nio/pull/353?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=vapor) | Coverage Δ | | |---|---|---| | [...ces/PostgresNIO/Deprecated/PostgresData+UInt.swift](https://codecov.io/gh/vapor/postgres-nio/pull/353?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=vapor#diff-U291cmNlcy9Qb3N0Z3Jlc05JTy9EZXByZWNhdGVkL1Bvc3RncmVzRGF0YStVSW50LnN3aWZ0) | `0.00% <ø> (ø)` | | | [...urces/PostgresNIO/New/PostgresChannelHandler.swift](https://codecov.io/gh/vapor/postgres-nio/pull/353?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=vapor#diff-U291cmNlcy9Qb3N0Z3Jlc05JTy9OZXcvUG9zdGdyZXNDaGFubmVsSGFuZGxlci5zd2lmdA==) | `62.20% <ø> (+0.13%)` | :arrow_up: | | [...nection State Machine/ConnectionStateMachine.swift](https://codecov.io/gh/vapor/postgres-nio/pull/353?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=vapor#diff-U291cmNlcy9Qb3N0Z3Jlc05JTy9OZXcvQ29ubmVjdGlvbiBTdGF0ZSBNYWNoaW5lL0Nvbm5lY3Rpb25TdGF0ZU1hY2hpbmUuc3dpZnQ=) | `59.21% <77.77%> (+1.57%)` | :arrow_up: | | [...tion State Machine/ExtendedQueryStateMachine.swift](https://codecov.io/gh/vapor/postgres-nio/pull/353?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=vapor#diff-U291cmNlcy9Qb3N0Z3Jlc05JTy9OZXcvQ29ubmVjdGlvbiBTdGF0ZSBNYWNoaW5lL0V4dGVuZGVkUXVlcnlTdGF0ZU1hY2hpbmUuc3dpZnQ=) | `77.28% <83.33%> (+8.45%)` | :arrow_up: | ... and [1 file with indirect coverage changes](https://codecov.io/gh/vapor/postgres-nio/pull/353/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=vapor)
trasch commented 1 year ago

@fabianfett No crash so far. 👍

But cancelled queries now return PSQLError(backing: PostgresNIO.PSQLError.(unknown context at $11882746c).Backing)which is not what I would have expected... Is this maybe some unwanted side effect of the change?

fabianfett commented 1 year ago

@trasch What did you get before?

trasch commented 1 year ago

What did you get before?

@fabianfett Nothing because it crashed before it could return an error...

But my question was stupid anyway, I didn't properly look at the PSQLError struct. What I should have asked is: the error code is now .server, but there is also .queryCancelled which is the code that I would have expected if the query gets cancelled by the server. This information is also contained in serverInfo?[.sqlState] with the not so helpful number 57014 which I can of course convert to a PostgresError.Code.queryCanceled (with one l...), but I'm not sure if this is how it should be? Or maybe do I misunderstand how PSQLError is intended to be used?

gwynne commented 1 year ago

It would be nice if the PSQLError.Backing had an even infinitesimally useful description 🙂

trasch commented 1 year ago

@fabianfett I've changed the test case for this issue a bit to show two problems that I still have:

  1. Cancelling a task correctly throws a CancellationError - but the query is still running on the connection which blocks. Not sure if this isn't intentional though, I know that PostgreSQL can be somewhat finicky when clients want to kill queries...
  2. It crashes when you cancel the Task and the server kills the query at the same time - just uncomment the line that sets the statement timeout
    func testCancelTaskWhileLongRunningQuery() async throws {
        let eventLoopGroup = MultiThreadedEventLoopGroup(numberOfThreads: 1)
        defer { XCTAssertNoThrow(try eventLoopGroup.syncShutdownGracefully()) }
        let eventLoop = eventLoopGroup.next()

        let start = 1
        let end = 1000000

        let connectionClosesAfterCancel = self.expectation(description: "connection closes after cancel")

        Task {
            try await withTestConnection(on: eventLoop) { connection -> () in
                // Uncommenting this will also crash:
//                try await connection.query("SET statement_timeout=1000;", logger: .psqlTest)

                let longRunningQuery = Task {
                    let rows = try await connection.query("SELECT generate_series(\(start), \(end));", logger: .psqlTest)
                    var counter = 0
                    do {
                        for try await element in rows.decode(Int.self, context: .default) {
                            XCTAssertEqual(element, counter + 1)
                            counter += 1
                        }
                        XCTFail("Expected to get cancelled while reading the query")
                    } catch is CancellationError {
                        // Expected
                    } catch {
                        XCTFail("Unexpected error: \(error)")
                    }

                    XCTAssertFalse(connection.isClosed, "Connection should survive!")
                    print("longRunningQuery done")
                }

                Task {
                    let delay = UInt64(0.2 * 1_000_000_000)
                    try await Task<Never, Never>.sleep(nanoseconds: delay)

                    print("Cancel!")
                    longRunningQuery.cancel()
                }

                try await longRunningQuery.value
            }

            connectionClosesAfterCancel.fulfill()
        }

        await fulfillment(of: [connectionClosesAfterCancel], timeout: 1.0)
    }
fabianfett commented 1 year ago

@trasch Can you give this another spin?!

trasch commented 1 year ago

Can you give this another spin?!

@fabianfett Awesome, short test worked, I will do a longer test run tomorrow

fabianfett commented 1 year ago

The CI failures here are caused by this issue: https://github.com/apple/swift-nio/pull/2415