shareup / phoenix-apple

Phoenix channels client used in our https://new.space/app iOS app
3 stars 0 forks source link

PhoenixTests crash with internal URLSession error #10

Open atdrendel opened 4 years ago

atdrendel commented 4 years ago
Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46

Unsolved thread on Developer Forums

2020-06-04T22:25:54.0091290Z $$ creating join timer seconds(10) 1
2020-06-04T22:25:54.0091380Z socket input success(Phoenix.WebSocket.Message.string("[19,19,\"room:lobby\",\"phx_reply\",{\"response\":{},\"status\":\"ok\"}]"))
2020-06-04T22:25:54.0091540Z channel input channelMessage(Phoenix.IncomingMessage(joinRef: Optional(Phoenix.Ref(rawValue: 19)), ref: Optional(Phoenix.Ref(rawValue: 19)), topic: "room:lobby", event: Phoenix.PhxEvent.reply, payload: ["status": ok, "response": {
2020-06-04T22:25:54.0091680Z }]))
2020-06-04T22:25:54.0091830Z socket sending OutgoingMessage(joinRef: Optional(Phoenix.Ref(rawValue: 19)), ref: Phoenix.Ref(rawValue: 20), topic: "room:lobby", event: Phoenix.PhxEvent.custom("repeat"), payload: ["amount": 5, "echo": "This should be repeated"], sentAt: Dispatch.DispatchTime(rawValue: 519325202881))
2020-06-04T22:25:54.0091970Z socket sending data 75 bytes
2020-06-04T22:25:54.0092100Z socket input success(Phoenix.WebSocket.Message.string("[null,null,\"room:lobby\",\"repeated\",{\"echo\":\"This should be repeated\",\"n\":1}]"))
2020-06-04T22:25:54.0092260Z channel input channelMessage(Phoenix.IncomingMessage(joinRef: nil, ref: nil, topic: "room:lobby", event: Phoenix.PhxEvent.custom("repeated"), payload: ["echo": This should be repeated, "n": 1]))
2020-06-04T22:25:54.0092460Z socket input success(Phoenix.WebSocket.Message.string("[null,null,\"room:lobby\",\"repeated\",{\"echo\":\"This should be repeated\",\"n\":2}]"))
2020-06-04T22:25:54.0093190Z channel input channelMessage(Phoenix.IncomingMessage(joinRef: nil, ref: nil, topic: "room:lobby", event: Phoenix.PhxEvent.custom("repeated"), payload: ["n": 2, "echo": This should be repeated]))
2020-06-04T22:25:54.0093470Z socket input success(Phoenix.WebSocket.Message.string("[null,null,\"room:lobby\",\"repeated\",{\"echo\":\"This should be repeated\",\"n\":3}]"))
2020-06-04T22:25:54.0093710Z channel input channelMessage(Phoenix.IncomingMessage(joinRef: nil, ref: nil, topic: "room:lobby", event: Phoenix.PhxEvent.custom("repeated"), payload: ["n": 3, "echo": This should be repeated]))
2020-06-04T22:25:54.0093960Z socket input success(Phoenix.WebSocket.Message.string("[null,null,\"room:lobby\",\"repeated\",{\"echo\":\"This should be repeated\",\"n\":4}]"))
2020-06-04T22:25:54.0094190Z channel input channelMessage(Phoenix.IncomingMessage(joinRef: nil, ref: nil, topic: "room:lobby", event: Phoenix.PhxEvent.custom("repeated"), payload: ["echo": This should be repeated, "n": 4]))
2020-06-04T22:25:54.0095500Z Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46
2020-06-04T22:25:54.0096290Z Test Case '-[PhoenixTests.ChannelTests testReceiveMessages]' passed (0.015 seconds).
2020-06-04T22:25:54.0096960Z Test Case '-[PhoenixTests.ChannelTests testRejoinsAfterDisconnect]' started.
2020-06-04T22:25:54.0097200Z socket input success(Phoenix.WebSocket.Message.string("[null,null,\"room:lobby\",\"repeExited with signal code 4
2020-06-04T22:25:59.1240250Z ##[error]Process completed with exit code 1.
atdrendel commented 4 years ago

This has happened locally on my MacBook Pro and when using GitHub Actions.

myobie commented 4 years ago

This is the call-site where the error output happens: https://github.com/apple/swift/blob/164ebbf0e43916285a9cb19a7d68253fb7ba57b6/stdlib/public/Darwin/Foundation/URLSession.swift#L46

atdrendel commented 4 years ago

This reproduced on my MacBook Pro just now. I gathered all system state and will try to catch this tomorrow.

URLSessionError.txt

atdrendel commented 4 years ago

I haven't seen this one in a while. I'll keep it open, however, because it's always been rare.

atdrendel commented 4 years ago

This happened a few times yesterday and today. Most recently, it happened with SocketTests.testPushOntoSocket

atdrendel commented 4 years ago

It happened today. I wonder if I misread the previous crash log. Maybe it happens in SocketTests.testRemoteClosePublishesClose(), instead. At least, that's what it looks like from today's test log:

2020-06-27T14:58:06.3595270Z Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' started.
2020-06-27T14:58:06.3649020Z [info] CONNECTED TO ServerWeb.Socket in 172µs
2020-06-27T14:58:06.3658760Z   Transport: :websocket
2020-06-27T14:58:06.3658920Z   Serializer: Phoenix.Socket.V2.JSONSerializer
2020-06-27T14:58:06.3659020Z   Parameters: %{"user_id" => "59", "vsn" => "2.0.0"}
2020-06-27T14:58:06.3660420Z Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46
2020-06-27T14:58:06.3661020Z Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' passed (0.006 seconds).
2020-06-27T14:58:06.3661490Z Test Case '-[PhoenixTests.SocketTests testRemoteClosePublishesClose]' started.
2020-06-27T14:58:06.3672900Z Exited with signal code 4
2020-06-27T14:58:08.5229670Z [info] JOINED room:timeout in 3023ms
2020-06-27T14:58:08.5229920Z   Parameters: %{"join" => true, "timeout" => 3000}
2020-06-27T14:58:08.7514760Z [info] JOINED room:timeout in 3014ms
2020-06-27T14:58:08.7514960Z   Parameters: %{"join" => true, "timeout" => 3000}
2020-06-27T14:58:08.9526880Z [info] JOINED room:timeout in 3022ms
2020-06-27T14:58:08.9527180Z   Parameters: %{"join" => true, "timeout" => 3000}
2020-06-27T14:58:11.3695530Z ##[error]Process completed with exit code 1.
2020-06-27T14:58:11.3717930Z Cleaning up orphan processes
2020-06-27T14:58:11.4041660Z Terminate orphan process: pid (1580) (erl_child_setup)
2020-06-27T14:58:11.4106930Z Terminate orphan process: pid (1562) (beam.smp)
atdrendel commented 4 years ago

The crash just happened again. It again came from our friend SocketTests.testRemoteClosePublishesClose(). Interesting.

Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' passed (0.017 seconds).
Test Case '-[PhoenixTests.SocketTests testRemoteClosePublishesClose]' started.
Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46
Exited with signal code 4
atdrendel commented 4 years ago

Ok, so, maybe it's not as straightforward as we had hoped. Another crash while running the tests.

Test Case '-[PhoenixTests.ChannelTests testClosesChannelAfterReceivingOkResponseFromServer]' passed (0.015 seconds).
Test Case '-[PhoenixTests.ChannelTests testClosesChannelOnTimeoutOfLeavePush]' started.
[info] CONNECTED TO ServerWeb.Socket in 571µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"user_id" => "16", "vsn" => "2.0.0"}
[info] JOINED room:lobby in 29µs
  Parameters: %{}
Test Case '-[PhoenixTests.ChannelTests testClosesChannelOnTimeoutOfLeavePush]' passed (1.043 seconds).
Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46
Test Case '-[PhoenixTests.ChannelTests testClosingChannelSetsStateToClosed]' started.
Exited with signal code 4
myobie commented 4 years ago

@atdrendel they are both related to closing...

atdrendel commented 4 years ago

I've got Charles working correctly with our Phoenix WebSockets. I'm going to run our tests in a loop until they fail in an attempt to catch the exception and the input causing it.

Running tests in a loop:

while swift test -v; do :; done

Update

I ran swift test -v in a loop a bunch of times. It crashed/failed a few times, but I couldn't see the reasons why the runs failed. So, I'm changing my approach to using xcodebuild to run the tests.

# generate the Xcode project file
swift package generate-xcodeproj

# run the tests in a loop
while xcodebuild test -project Phoenix.xcodeproj -scheme Phoenix-Package; do :; done
atdrendel commented 4 years ago

I managed to reproduce the crash while proxying the WebSocket traffic. Based on what I saw, the traffic was normal, as were the "requests" made before it.

Here's the information I captured during the crash.

Test log

Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' started.
2020-07-04 00:51:09.298627+0200 xctest[13201:8361315] Connection 60: received failure notification
Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46
Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' passed (0.031 seconds).
2020-07-04 00:51:09.327743+0200 xctest[13201:8361356] Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46
2020-07-04 00:51:29.671 xcodebuild[13194:8362667]  IDETestOperationsObserverDebug: Writing diagnostic log for test session to:
Restarting after unexpected exit, crash, or test timeout in SocketTests/testPushOntoSocket(); summary will include totals from previous launches.

WebSocket traffic

C: [null,1,"phoenix","heartbeat",{}]
S: [null,1,"phoenix","phx_reply",{"response":{},"status":"ok"}]

Crash log

Thread 8 Crashed:: Dispatch queue: WebSocket.serialQueue
0   libswiftCore.dylib              0x00007fff7041230f _assertionFailure(_:_:file:line:flags:) + 863
1   libswiftFoundation.dylib        0x00007fff70982a4a partial apply for closure #1 in NSURLSessionWebSocketTask.receive(completionHandler:) + 458
2   libswiftFoundation.dylib        0x00007fff70930fc7 thunk for @escaping @callee_guaranteed (@guaranteed NSURLSessionWebSocketMessage?, @guaranteed Error?) -> () + 71
3   com.apple.Foundation            0x00007fff39433ac5 __NSBLOCKOPERATION_IS_CALLING_OUT_TO_A_BLOCK__ + 7
4   com.apple.Foundation            0x00007fff394339e6 -[NSBlockOperation main] + 80
5   com.apple.Foundation            0x00007fff39433981 __NSOPERATION_IS_INVOKING_MAIN__ + 17
6   com.apple.Foundation            0x00007fff39432bb3 -[NSOperation start] + 722
7   com.apple.Foundation            0x00007fff394328d9 __NSOPERATIONQUEUE_IS_STARTING_AN_OPERATION__ + 17
8   com.apple.Foundation            0x00007fff394327a9 __NSOQSchedule_f + 182
9   libdispatch.dylib               0x00007fff70d5d658 _dispatch_client_callout + 8
10  libdispatch.dylib               0x00007fff70d5fd79 _dispatch_block_invoke_direct + 245
11  libclang_rt.tsan_osx_dynamic.dylib  0x0000000102cbf30c __tsan::invoke_and_release_block(void*) + 12
12  libclang_rt.tsan_osx_dynamic.dylib  0x0000000102cbf062 __tsan::dispatch_callback_wrap(void*) + 306
13  libdispatch.dylib               0x00007fff70d5d658 _dispatch_client_callout + 8
14  libdispatch.dylib               0x00007fff70d62c44 _dispatch_lane_serial_drain + 597
15  libdispatch.dylib               0x00007fff70d635d6 _dispatch_lane_invoke + 363
16  libdispatch.dylib               0x00007fff70d6cc09 _dispatch_workloop_worker_thread + 596
17  libsystem_pthread.dylib         0x00007fff70fb7a3d _pthread_wqthread + 290
18  libsystem_pthread.dylib         0x00007fff70fb6b77 start_wqthread + 15

urlsession-crash.txt urlsession-crash-network-traffic.chlsj.zip

atdrendel commented 4 years ago

I ran the test suite against the in-progress NWConnection-backed websocket-apple branch. In order to make the tests pass consistently, I had to disable flaky tests, including testJoinRetriesWithBackoffIfTimeout(), testIsJoiningAfterJoin(), and one more I forgot (which I should have written down, of course). After doing so, though, the tests passed without an issue for over eight hours, at which point I just canceled the test run.

However, while running the tests, I saw "unexpected" messages being received when the WebSocket was not in the connecting or open states for the same tests that typically cause this crash when using URLSessionWebSocketTask.

Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' started.
onStateUpdate preparing
onStateUpdate ready
Fatal error: Unexpected message: [null,1,"phoenix","phx_reply",{"response":{},"status":"ok"}]: file /Users/atdrendel/shareup/websocket-apple/Sources/WebSocket/WebSocket.swift, line 178
2020-07-12 23:43:36.353530+0200 xctest[86982:469880] Fatal error: Unexpected message: [null,1,"phoenix","phx_reply",{"response":{},"status":"ok"}]: file /Users/atdrendel/shareup/websocket-apple/Sources/WebSocket/WebSocket.swift, line 178
Program ended with exit code: 9
myobie commented 4 years ago

@atdrendel that's great and it makes sense to me that we might receive a message during .closing. We should break out each possible case and output the variable it wraps to see if that's what is happening.