open-telemetry / opentelemetry-swift

OpenTelemetry API for Swift
https://opentelemetry.io/docs/instrumentation/swift/
Apache License 2.0
207 stars 124 forks source link

Crash in the URLSessionInstrumentation when it's used with Alamofire #530

Closed ypopovych closed 3 months ago

ypopovych commented 4 months ago

Hello,

I found a crash in the URLSessionInstrumentation when it is used with Alamofire.

It happens in this line https://github.com/open-telemetry/opentelemetry-swift/blob/34c45693bb4520f33709f400df81e3d929d8f967/Sources/Instrumentation/URLSession/URLSessionInstrumentation.swift#L611 It crashes with Cannot set task delegate after resumption exception.

I prepared a test repository, where it can be replicated: https://github.com/ypopovych/Alamofire-OpenTelemetry I added URLSessionInstrumentation to their iOS test target.

Crash happens in the DataStreamConcurrencyTests testThatDataStreamTaskCanBeCancelledAfterStreamTurnsOffAutomaticCancellation test.

To run Alamofire tests you will need their firewalk test server to be installed and running. You can install it with brew install alamofire/alamofire/firewalk command and run it from the command line.

atreat commented 4 months ago

Did some investigation on this. It looks like it may be caused by this code where Alamofire is calling resume just before a request is cancelled. I'm not sure why this is unique to the concurrent stream task case. https://github.com/ypopovych/Alamofire-OpenTelemetry/blob/4bfccf2e96083e2a47c6f3b554699488368f80d7/Source/Core/Request.swift#L683-L684

// In Request.cancel()

task.resume()
task.cancel()

Nevertheless, it should be possible to guard against this by expanding the logic where the FakeDelegate is set to first check the state of the task.

if task.state != .running && task.delegate == nil {
    task.delegate = FakeDelegate()
}

I need to do some more testing to make sure that's the most correct check in this scenario

atreat commented 4 months ago

After looking at this a bit more, I see some interesting behavior with this guard statement:

guard Task.basePriority != nil else {
    // If not inside a Task basePriority is nil
    return
}

In this test, this urlSessionTaskWillResume is called twice. Once when the task is initially resumed, then again in Request.cancel() as I mention above.

The first time through this guard statement blocks execution and the return happens. This means the FakeDelegate is not properly set, and the task state moves to running.

The second time through this guard statement does not block and execution continues. This means the task.delegate is nil and its state has been moved to running.

@nachoBonafonte Was wondering your thoughts if we moved the setting of task.delegate above the Task.basePriority guard statement?

nachoBonafonte commented 3 months ago

Yes I think checking task.state when assigning the delegate is the way to go, as it is only needed for the async case and the only way to check its the async is when Task.basePriority != nil

atreat commented 3 months ago

@ypopovych this PR with a fix has been merged. Let us know if you see any other issues

ypopovych commented 3 months ago

@atreat thanks! Will check it in our test env

ypopovych commented 3 months ago

checked in our test env - fixed! Works fine. Thank you!

When can we expect release with this fix?