vapor / postgres-nio

🐘 Non-blocking, event-driven Swift client for PostgreSQL.
https://api.vapor.codes/postgresnio/documentation/postgresnio/
MIT License
304 stars 70 forks source link

Constantly retrying/never failing request to unreachable database #489

Open abidon opened 2 weeks ago

abidon commented 2 weeks ago

Describe the issue

When using PostgresClient, making requests to a IP:port that isn't serving a postgres database will never throw an error and retry forever

Vapor version

Not using vapor, but PostgresKit is v2.13.5

Operating system and version

macOS 15.0 beta 1

Swift version

Swift 6 (from Xcode 16 beta 1)

Steps to reproduce

  1. Create an empty Swift package
  2. Add PostgresKit as a dependency to the package and the executableTarget
  3. Write the following code in the main.swift file
import PostgresKit

let config = PostgresClient.Configuration(
    host: "localhost",
    port: 5432,
    username: "my_username",
    password: "my_password",
    database: "my_database",
    tls: .disable
)

let client = PostgresClient(configuration: config)

try await withThrowingTaskGroup(of: Void.self) { taskGroup in
    taskGroup.addTask {
        await client.run() // !important
    }

    try await client.withConnection { connection in
        let db = connection.sql()

        let row = try await db.raw("SELECT 1 as healthy").first()

        print(try row?.decode(column: "healthy", as: Int.self))
    }
}
  1. Run the project, make sure there is no Postgres server running and replying to localhost:5432

Outcome

The db.raw call never returns, and the following log can be seen repeatedly in the console:

nw_endpoint_flow_failed_with_error [C37.1.2 127.0.0.1:5432 in_progress socket-flow (satisfied (Path is satisfied), viable, interface: lo0)] already failing, returning
nw_endpoint_flow_failed_with_error [C37.1.2 127.0.0.1:5432 cancelled socket-flow ((null))] already failing, returning

The expected behaviour would be that after a given configurable number of retries, the method would throw an error. As of now, there seem to be no way to detect a failing connection to the database.

Additional notes

My computer is an M1 Ultra. I run the latest macOS and Xcode beta and sadly I can't test with a stable version of macOS.

MahdiBM commented 2 weeks ago

Probably related: https://github.com/vapor/postgres-nio/pull/487

abidon commented 2 weeks ago

@MahdiBM I've tried to pin postgres-nio to the merge commit of the PR mentioned above, and can confirm the issue is still here.

Here is the Package.swift file (the main.swift above remain unchanged):

import PackageDescription

let package = Package(
    name: "PostgresClientRepro",
    platforms: [.macOS(.v13)],
    dependencies: [
        .package(url: "https://github.com/vapor/postgres-kit.git", .upToNextMajor(from: "2.13.5")),
        .package(url: "https://github.com/vapor/postgres-nio.git", branch: "f55caa7"),
    ],
    targets: [
        .executableTarget(
            name: "PostgresClientRepro",
            dependencies: [
                .product(name: "PostgresKit", package: "postgres-kit"),
                .product(name: "PostgresNIO", package: "postgres-nio"),
            ]
        ),
    ]
)

While writing this comment the query still hasn't returned (430+ seconds).

MahdiBM commented 2 weeks ago

branch: "f55caa7"

Does it even work with that?!

Can you try "main" instead now that the PR is merged?

MahdiBM commented 2 weeks ago

You could use revision: "SHA_HERE" but I hadn't seen using branch: with a SHA ... not sure if it's just me not knowing branch also accepts SHAs, or it's incorrect.

fabianfett commented 2 weeks ago

@abidon So there is a couple of things to unpack here:

  1. We don't enforce any timeouts in PostgresClient and we consider this a feature for PostgresNIO right now. Instead we hope that Swift introduces a timeout API like the one below. You can enforce timeouts yourself here:
func timeout<Clock: _Concurrency.Clock, Instant: InstantProtocol, Success: Sendable>(
    deadline: Instant,
    clock: Clock,
    _ closure: @escaping @Sendable () async throws -> Success
) async throws -> Success where Clock.Instant == Instant {

    let result = await withTaskGroup(of: TimeoutResult<Success>.self, returning: Result<Success, any Error>.self) { taskGroup in
        taskGroup.addTask {
            do {
                try await clock.sleep(until: deadline, tolerance: nil)
                return .deadlineHit
            } catch {
                return .deadlineCancelled
            }
        }

        taskGroup.addTask {
            do {
                let success = try await closure()
                return .workFinished(.success(success))
            } catch let error {
                return .workFinished(.failure(error))
            }
        }

        var r: Swift.Result<Success, any Error>?
        while let taskResult = await taskGroup.next() {
            switch taskResult {
            case .deadlineCancelled:
                continue // loop

            case .deadlineHit:
                taskGroup.cancelAll()

            case .workFinished(let result):
                taskGroup.cancelAll()
                r = result
            }
        }
        return r!
    }

    return try result.get()
}

Then you can use it like this:

let clock = ContiniousClock()
try await timeout(deadline: clock.now + .seconds(10), clock: clock) {
    let rows = try await client.query("SELECT *")
    for try await row in rows {

    }
}
  1. One issue in PostgresNIO that we currently have however is, that the above API approach would tell you that the request failed, since it got cancelled. So we should consider alternative error handling for timeouts.

Hope this helps.

abidon commented 2 weeks ago

Can you try "main" instead now that the PR is merged?

@MahdiBM: It was indeed the last commit hash on main, the one of the PR merge. And yeah, I always used branch with both tags, branches or commit hashes.

You could use revision: "SHA_HERE"

In fact I wasn't aware of revision 😅 Looks like we both learnt one thing 😊
Semantically speaking, revision sounds better, the fact SHA work with branch might just be a happy circumstance.


@fabianfett Thanks for the explanation and the code excerpt!

That being said, I find it slightly weird to overcome this specific problem with a timeout. The connection is not hanging, it's not even established.

Having to wait – let's say – 5 seconds even if we know that the connection cannot be made after a couple of milliseconds is just putting unnecessary pressure on the backend with long-awaiting requests.

On the other end, putting a deliberately small timeout to detect failing connections faster could cause false positives with timeouts occurring on a properly established connection.

Question: In addition to user-implemented timeouts (thanks to the code excerpt you provided), would it be reasonable to have something like a configurable maxAttempts: UInt? on PostgresClient.Configuration (which would default to nil to not break the actual behaviour of infinitely retrying) ?

From my understanding of the code, this maxAttempts could then be used here.

If you think this could be a good solution, I could attempt a PR. Any additional guidance and/or API design/naming would be appreciated.