lovetodream / oracle-nio

Non-blocking, event-driven Swift client for Oracle DBs. Does not require OCI or ODPI.
https://swiftpackageindex.com/lovetodream/oracle-nio/documentation/oraclenio
Apache License 2.0
29 stars 4 forks source link

Failed ping results in a crash #21

Closed iliasaz closed 4 months ago

iliasaz commented 5 months ago

Hi,

There seems to be an issue with handling a failed ping that leads to a crash. Here are the details from my app log. There was no activity on the connection apart from the ping pong.

[ DEBUG ] run ping pong [oraclesql_connection_id: 0] (OracleNIO/OracleClientMetrics.swift:52)
..... << a bit later>> .....
[ DEBUG ] Channel error caught. [oraclesql_connection_id: 0, oraclesql_error: uncleanShutdown] (OracleNIO/OracleChannelHandler.swift:97)
[ DEBUG ] Cleaning up and closing connection. [oraclesql_connection_id: 0, oraclesql_error: OracleSQLError(code: connectionError, underlying: NIOSSL.NIOSSLError.uncleanShutdown)] (OracleNIO/OracleChannelHandler.swift:669)
[ DEBUG ] Connection closed [oraclesql_connection_id: 0] (OracleNIO/OracleClientMetrics.swift:72)

*** Signal 4: Backtracing from 0x5620a8d339bb... done ***

*** Swift runtime failure: arithmetic overflow ***

Thread 3 "NIO-SGLTN-0-#2" crashed:

 0 [inlined] [system]      0x00005620a8d339bb Swift runtime failure: arithmetic overflow in App at /build/<compiler-generated>
 1                         0x00005620a8d339bb PoolStateMachine.ConnectionGroup.connectionClosed(_:) + 1915 in App at /build/.build/checkouts/postgres-nio/Sources/ConnectionPoolModule/PoolStateMachine+ConnectionGroup.swift:575:38
 2 [ra]                    0x00005620a8d58059 PoolStateMachine.connectionClosed(_:) + 1032 in App at /build/.build/checkouts/postgres-nio/Sources/ConnectionPoolModule/PoolStateMachine.swift:404:59
 3 [ra]                    0x00005620a8d21161 closure #1 in ConnectionPool.connectionDidClose(_:error:) + 176 in App at /build/.build/checkouts/postgres-nio/Sources/ConnectionPoolModule/ConnectionPool.swift:302:32
 4 [ra] [thunk]            0x00005620a8d25dc1 partial apply for closure #1 in ConnectionPool.connectionDidClose(_:error:) + 80 in App at /build/<compiler-generated>
 5 [ra] [thunk]            0x00005620a8d25487 partial apply for closure #1 in ConnectionPool.modifyStateAndRunActions(_:) + 22 in App at /build/.build/checkouts/postgres-nio/Sources/ConnectionPoolModule/ConnectionPool.swift:356:13
 6 [ra]                    0x00005620a8d2de4c closure #1 in LockStorage.withLockedValue<A>(_:) + 59 in App at /build/.build/checkouts/postgres-nio/Sources/ConnectionPoolModule/NIOLock.swift:176:24
 7 [ra] [thunk]            0x00005620a8d2de99 partial apply for closure #1 in LockStorage.withLockedValue<A>(_:) + 24 in App at /build/<compiler-generated>
 8 [ra]                    0x00005620a8f7dbd9 ManagedBuffer.withUnsafeMutablePointers<A>(_:) + 56 in App
 9 [ra]                    0x00005620a8d2de01 LockStorage.withLockPrimitive<A>(_:) + 48 in App
10 [ra]                    0x00005620a8d2ddc7 LockStorage.withLockedValue<A>(_:) + 22 in App
11 [ra]                    0x00005620a8d2e1e2 NIOLockedValueBox.withLockedValue<A>(_:) + 17 in App at /build/.build/checkouts/postgres-nio/Sources/ConnectionPoolModule/NIOLockedValueBox.swift:42:34
12 [ra]                    0x00005620a8d1edcf ConnectionPool.modifyStateAndRunActions(_:) + 606 in App at /build/.build/checkouts/postgres-nio/Sources/ConnectionPoolModule/ConnectionPool.swift:355:37
13 [ra]                    0x00005620a8d21096 ConnectionPool.connectionDidClose(_:error:) + 357 in App at /build/.build/checkouts/postgres-nio/Sources/ConnectionPoolModule/ConnectionPool.swift:301:14
14 [ra] [thunk]            0x00005620a8d2948c partial apply for closure #1 in closure #1 in closure #1 in ConnectionPool.makeConnection<A>(for:in:) + 75 in App at /build/.build/checkouts/postgres-nio/Sources/ConnectionPoolModule/ConnectionPool.swift:434:30
15 [ra] [thunk]            0x00005620a883a053 partial apply for closure #1 in OracleConnection.onClose(_:) + 18 in App at /build/.build/checkouts/oracle-nio/Sources/OracleNIO/Pool/OracleClient.swift:236:46
16 [ra]                    0x00005620a7d56196 specialized closure #1 in EventLoopFuture._publicWhenComplete(_:) + 85 in App
17 [ra]                    0x00005620a84bcfcd CallbackList._run() + 2796 in App at /build/.build/checkouts/swift-nio/Sources/NIOCore/EventLoopFuture.swift:89:27
18 [ra] [system]           0x00005620a7d810c5 specialized EventLoopPromise._resolve(value:) + 100 in App at /build/<compiler-generated>
19 [ra] [inlined] [system] 0x00005620a8640fa8 specialized EventLoopPromise.succeed(_:) in App at /build/<compiler-generated>
20 [ra]                    0x00005620a8640fa8 closure #4 in BaseSocketChannel.close0(error:mode:promise:) + 103 in App at /build/.build/checkouts/swift-nio/Sources/NIOPosix/BaseSocketChannel.swift:897:31
21 [ra] [inlined]          0x00005620a86a32a7 closure #1 in SelectableEventLoop.run(_:) in App at /build/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:468:17
22 [ra] [inlined]          0x00005620a86a32a7 specialized withAutoReleasePool<A>(_:) in App at /build/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:30:16
23 [ra] [inlined]          0x00005620a86a32a7 SelectableEventLoop.run(_:) in App at /build/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:465:9
24 [ra]                    0x00005620a86a32a7 SelectableEventLoop.runLoop(selfIdentifier:) + 806 in App at /build/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:628:22
25 [ra]                    0x00005620a86a1055 SelectableEventLoop.run() + 452 in App at /build/.build/checkouts/swift-nio/Sources/NIOPosix/SelectableEventLoop.swift:718:33
26 [ra] [inlined]          0x00005620a86782d6 static MultiThreadedEventLoopGroup.runTheLoop(thread:parentGroup:canEventLoopBeShutdownIndividually:selectorFactory:initializer:metricsDelegate:_:) in App at /build/.build/checkouts/swift-nio/Sources/NIOPosix/MultiThreadedEventLoopGroup.swift:95:22
27 [ra]                    0x00005620a86782d6 closure #1 in static MultiThreadedEventLoopGroup.setupThreadAndEventLoop(name:parentGroup:selectorFactory:initializer:metricsDelegate:) + 341 in App at /build/.build/checkouts/swift-nio/Sources/NIOPosix/MultiThreadedEventLoopGroup.swift:114:41
28 [ra] [thunk]            0x00005620a867b715 partial apply for closure #1 in static MultiThreadedEventLoopGroup.setupThreadAndEventLoop(name:parentGroup:selectorFactory:initializer:metricsDelegate:) + 52 in App at /build/<compiler-generated>
29 [ra] [thunk]            0x00005620a867e1cf thunk for @escaping @callee_guaranteed (@guaranteed NIOThread) -> () + 14 in App at /build/<compiler-generated>
30 [ra]                    0x00005620a86bfa6d closure #1 in static ThreadOpsPosix.run(handle:args:detachThread:) + 380 in App at /build/.build/checkouts/swift-nio/Sources/NIOPosix/ThreadPosix.swift:116:13

Thanks!

lovetodream commented 5 months ago

Thanks for reporting this, can you provide a repro? And are using the latest version of oracle-nio (just to make sure 😀)

iliasaz commented 5 months ago

I don't have a solid repro at the moment. It'd been running fine for about 2 weeks before it crashed all of a sudden. I'll see if I can reproduce it in an isolated standalone setting. And yes - it was the latest version as of 2 weeks back.

lovetodream commented 4 months ago

Did you find a repro? If not, are you still facing the issue in your app?

iliasaz commented 4 months ago

Yes, I did create a reproducible test case. Here is it.

func loadTLSConfigurationFromWallet(walletPath: String, walletPassword: String) throws -> TLSConfiguration? {
    let privateKey = try NIOSSLPrivateKey(file: walletPath, format: .pem) { completion in
        completion(walletPassword.utf8)
    }

    // This was the missing piece in my config. It's needed to validate the full certificate path.
    let certificate = try NIOSSLCertificate(file: ProcessInfo.processInfo.environment["ORA_WALLET"] ?? "ewallet.pem", format: .pem)

    var tlsConfig = TLSConfiguration.makeClientConfiguration()
    tlsConfig.privateKey = NIOSSLPrivateKeySource.privateKey(privateKey)
    tlsConfig.certificateChain = [.certificate(certificate)]

    return tlsConfig
}

func createOracleClient() throws -> OracleClient? {
    var logger = Logger(label: "oracle-logger")
    logger.logLevel = .debug

    // get TLS config
    guard let tlsConfig = try loadTLSConfigurationFromWallet(
        walletPath: ProcessInfo.processInfo.environment["ORA_WALLET"] ?? "ewallet.pem",
        walletPassword: ProcessInfo.processInfo.environment["ORA_WALLET_PASSWORD"] ?? "password"
    ) else {
        logger.debug("got nil TLS Configuration")
        return nil
    }
    logger.debug("got TLS configuration")
    let config = try OracleConnection.Configuration(
        host: ProcessInfo.processInfo.environment["ORA_HOST"] ?? "localhost",
        port: ProcessInfo.processInfo.environment["ORA_PORT"].flatMap(Int.init(_:)) ?? 1521,
        service: .serviceName(ProcessInfo.processInfo.environment["ORA_SERVICE"] ?? "orcl"),
        username: ProcessInfo.processInfo.environment["ORA_USERNAME"] ?? "scott",
        password: ProcessInfo.processInfo.environment["ORA_PASSWORD"] ?? "tiger",
        tls: .require(.init(configuration: tlsConfig)) // important!
    )
    logger.debug("created config: \(config)")

    // setting pool options
    var options = OracleClient.Options()
    options.minimumConnections = 1
    options.maximumConnections = 10

    // create and return a client object that has a connection pool in it
    let client = OracleClient(configuration: config, options: options, backgroundLogger: logger)

    logger.debug("created OracleClient")
    return client
}

class Pool {
    var oraClient: OracleClient

    init() throws {
        do {
            if let oraClient = try createOracleClient() {
                self.oraClient = oraClient
                // start Oracle connection pool and the client task
                Task {
                    await self.oraClient.run()
                }
                print("started conn pool")
            } else {
                fatalError("Oracle Client not configured")
            }
        } catch {
            fatalError("Error configuring Oracle Client: \(error)")
        }
    }

    func testSQL() async throws {
        try await oraClient.withConnection { conn in
            print("got connection")
            let rows = try await conn.query("SELECT count(*) FROM user_objects")
            for try await row in rows {
                print(row)
            }
            print("connection is working")
        }
    }

    func runloop() async throws {
        while( true ) {
            let secondsWait: Int64 = 30
            try await Task.sleep(for: Duration(secondsComponent: secondsWait, attosecondsComponent: 0))
            print("mark \(secondsWait) wait")
        }
    }
}

let pool = try Pool()
try await pool.testSQL()
try await pool.runloop()

First set the env variables with the proper DB credentials, wallet, and etc. Then run the program. It will connect to the DB and execute a simple query. Then it just waits and marks 30 second intervals. You'll also see ping pong traffic in the log.

2024-05-28T14:24:28-0700 debug oracle-logger : [oranio_test_cmd] got TLS configuration
2024-05-28T14:24:28-0700 debug oracle-logger : [oranio_test_cmd] created config: Configuration(options: OracleNIO.OracleConnection.Configuration.Options(connectTimeout: NIOCore.TimeAmount(nanoseconds: 10000000000), tlsServerName: nil), host: "adb.us-ashburn-1.oraclecloud.com", port: 1522, tls: OracleNIO.OracleConnection.Configuration.TLS(base: OracleNIO.OracleConnection.Configuration.TLS.Base.require(NIOSSL.NIOSSLContext)), proxyUser: nil, authenticationMethod: (Function), service: OracleNIO.OracleServiceMethod.serviceName("ge045b813707f2d_omoprwd1m_tp.adb.oraclecloud.com"), mode: DEFAULT, disableOOB: false, debugJDWP: nil, customTimezone: nil, _connectionIDPrefix: "", connectionID: "", _programName: "oranio-test-cmd", _machineName: "mbp3ora.local", pid: 21900, _processUsername: "ilia", _terminalName: "unknown", purity: DEFAULT, serverType: nil, drcpEnabled: false, cclass: nil)
2024-05-28T14:24:28-0700 debug oracle-logger : [oranio_test_cmd] created OracleClient
started conn pool
2024-05-28T14:24:28-0700 debug oracle-logger : oraclesql_connection_id=0 [OracleNIO] Creating new connection
got connection
OracleRow(lookupTable: ["COUNT(*)": 0], data: OracleNIO.DataRow(columnCount: 1, bytes: ByteBuffer { readerIndex: 0, writerIndex: 3, readableBytes: 3, capacity: 4, storageCapacity: 4, slice: _ByteBufferSlice { 0..<4 }, storage: 0x0000600001ab4440 (4 bytes) }
readable bytes (max 1k): [ 02 c1 02 ]), columns: [OracleNIO.DescribeInfo.Column(name: "COUNT(*)", dataType: OracleNIO.OracleDataType(key: 2, number: OracleNIO.OracleDataTypeNumber(backing: OracleNIO.OracleDataTypeNumber.Backing.number), name: "DB_TYPE_NUMBER", oracleName: "NUMBER", _oracleType: Optional(NUMBER), defaultSize: 0, csfrm: 0, bufferSizeFactor: 22), dataTypeSize: 0, precision: 0, scale: 0, bufferSize: 22, nullsAllowed: true)])
connection is working
mark 30 wait
2024-05-28T14:24:59-0700 debug oracle-logger : oraclesql_connection_id=0 [OracleNIO] run ping pong

Now simulate 100% packet loss using Network Link Conditioner or any other tool. Here is what I see in the log after enabling the total packet loss. The first ping pong packet shows as a success although the network is already dysfunctional, which is a bug. Then the crash occurs.

mark 30 wait
2024-05-28T14:27:30-0700 debug oracle-logger : oraclesql_connection_id=0 [OracleNIO] run ping pong

nw_socket_handle_socket_event [C1.1.1:1] Socket SO_ERROR [60: Operation timed out]
nw_read_request_report [C1] Receive failed with error "Operation timed out"
nw_protocol_socket_reset_linger [C1.1.1:1] setsockopt SO_LINGER failed [22: Invalid argument]
2024-05-28T14:28:04-0700 debug oracle-logger : oraclesql_connection_id=0 oraclesql_error=POSIXErrorCode(rawValue: 60): Operation timed out [OracleNIO] Channel error caught.
2024-05-28T14:28:04-0700 debug oracle-logger : oraclesql_connection_id=0 oraclesql_error=OracleSQLError(code: connectionError, underlying: POSIXErrorCode(rawValue: 60): Operation timed out) [OracleNIO] Cleaning up and closing connection.
nw_flow_service_writes [C1 147.154.27.228:1522 failed parent-flow (satisfied (Path is satisfied), interface: en0[802.11], ipv4, dns, uses wifi)] No output handler
2024-05-28T14:28:04-0700 debug oracle-logger : oraclesql_connection_id=0 oraclesql_error=POSIXErrorCode(rawValue: 60): Operation timed out [OracleNIO] Channel error caught.
2024-05-28T14:28:04-0700 debug oracle-logger : oraclesql_connection_id=0 oraclesql_error=uncleanShutdown [OracleNIO] Channel error caught.
2024-05-28T14:28:04-0700 debug oracle-logger : oraclesql_connection_id=0 [OracleNIO] Connection closed

Here is Xcode debug screenshot at the point of the crash.

Screenshot 2024-05-28 at 2 29 48 PM

Hope this is helpful.

Thanks!

lovetodream commented 4 months ago

Thanks for the repro, it's extremely helpful. I've identified the cause, there's some testing to do and then I'll release a fix in the upcoming days.

lovetodream commented 4 months ago

The issue should be fixed with the latest release. Thanks again for reporting and taking the time to find a repro.