Open atdrendel opened 4 years ago
This has happened locally on my MacBook Pro and when using GitHub Actions.
This is the call-site where the error output happens: https://github.com/apple/swift/blob/164ebbf0e43916285a9cb19a7d68253fb7ba57b6/stdlib/public/Darwin/Foundation/URLSession.swift#L46
This reproduced on my MacBook Pro just now. I gathered all system state and will try to catch this tomorrow.
I haven't seen this one in a while. I'll keep it open, however, because it's always been rare.
This happened a few times yesterday and today. Most recently, it happened with SocketTests.testPushOntoSocket
It happened today. I wonder if I misread the previous crash log. Maybe it happens in SocketTests.testRemoteClosePublishesClose()
, instead. At least, that's what it looks like from today's test log:
2020-06-27T14:58:06.3595270Z Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' started.
2020-06-27T14:58:06.3649020Z [info] CONNECTED TO ServerWeb.Socket in 172µs
2020-06-27T14:58:06.3658760Z Transport: :websocket
2020-06-27T14:58:06.3658920Z Serializer: Phoenix.Socket.V2.JSONSerializer
2020-06-27T14:58:06.3659020Z Parameters: %{"user_id" => "59", "vsn" => "2.0.0"}
2020-06-27T14:58:06.3660420Z Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46
2020-06-27T14:58:06.3661020Z Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' passed (0.006 seconds).
2020-06-27T14:58:06.3661490Z Test Case '-[PhoenixTests.SocketTests testRemoteClosePublishesClose]' started.
2020-06-27T14:58:06.3672900Z Exited with signal code 4
2020-06-27T14:58:08.5229670Z [info] JOINED room:timeout in 3023ms
2020-06-27T14:58:08.5229920Z Parameters: %{"join" => true, "timeout" => 3000}
2020-06-27T14:58:08.7514760Z [info] JOINED room:timeout in 3014ms
2020-06-27T14:58:08.7514960Z Parameters: %{"join" => true, "timeout" => 3000}
2020-06-27T14:58:08.9526880Z [info] JOINED room:timeout in 3022ms
2020-06-27T14:58:08.9527180Z Parameters: %{"join" => true, "timeout" => 3000}
2020-06-27T14:58:11.3695530Z ##[error]Process completed with exit code 1.
2020-06-27T14:58:11.3717930Z Cleaning up orphan processes
2020-06-27T14:58:11.4041660Z Terminate orphan process: pid (1580) (erl_child_setup)
2020-06-27T14:58:11.4106930Z Terminate orphan process: pid (1562) (beam.smp)
The crash just happened again. It again came from our friend SocketTests.testRemoteClosePublishesClose()
. Interesting.
Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' passed (0.017 seconds).
Test Case '-[PhoenixTests.SocketTests testRemoteClosePublishesClose]' started.
Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46
Exited with signal code 4
Ok, so, maybe it's not as straightforward as we had hoped. Another crash while running the tests.
Test Case '-[PhoenixTests.ChannelTests testClosesChannelAfterReceivingOkResponseFromServer]' passed (0.015 seconds).
Test Case '-[PhoenixTests.ChannelTests testClosesChannelOnTimeoutOfLeavePush]' started.
[info] CONNECTED TO ServerWeb.Socket in 571µs
Transport: :websocket
Serializer: Phoenix.Socket.V2.JSONSerializer
Parameters: %{"user_id" => "16", "vsn" => "2.0.0"}
[info] JOINED room:lobby in 29µs
Parameters: %{}
Test Case '-[PhoenixTests.ChannelTests testClosesChannelOnTimeoutOfLeavePush]' passed (1.043 seconds).
Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46
Test Case '-[PhoenixTests.ChannelTests testClosingChannelSetsStateToClosed]' started.
Exited with signal code 4
@atdrendel they are both related to closing...
I've got Charles working correctly with our Phoenix WebSockets. I'm going to run our tests in a loop until they fail in an attempt to catch the exception and the input causing it.
Running tests in a loop:
while swift test -v; do :; done
Update
I ran swift test -v
in a loop a bunch of times. It crashed/failed a few times, but I couldn't see the reasons why the runs failed. So, I'm changing my approach to using xcodebuild
to run the tests.
# generate the Xcode project file
swift package generate-xcodeproj
# run the tests in a loop
while xcodebuild test -project Phoenix.xcodeproj -scheme Phoenix-Package; do :; done
I managed to reproduce the crash while proxying the WebSocket traffic. Based on what I saw, the traffic was normal, as were the "requests" made before it.
Here's the information I captured during the crash.
Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' started.
2020-07-04 00:51:09.298627+0200 xctest[13201:8361315] Connection 60: received failure notification
Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46
Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' passed (0.031 seconds).
2020-07-04 00:51:09.327743+0200 xctest[13201:8361356] Fatal error: Only one of message or error should be nil: file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/swiftlang_overlay_Foundation/swiftlang-1103.8.25.8/swift/stdlib/public/Darwin/Foundation/URLSession.swift, line 46
2020-07-04 00:51:29.671 xcodebuild[13194:8362667] IDETestOperationsObserverDebug: Writing diagnostic log for test session to:
Restarting after unexpected exit, crash, or test timeout in SocketTests/testPushOntoSocket(); summary will include totals from previous launches.
C: [null,1,"phoenix","heartbeat",{}]
S: [null,1,"phoenix","phx_reply",{"response":{},"status":"ok"}]
Thread 8 Crashed:: Dispatch queue: WebSocket.serialQueue
0 libswiftCore.dylib 0x00007fff7041230f _assertionFailure(_:_:file:line:flags:) + 863
1 libswiftFoundation.dylib 0x00007fff70982a4a partial apply for closure #1 in NSURLSessionWebSocketTask.receive(completionHandler:) + 458
2 libswiftFoundation.dylib 0x00007fff70930fc7 thunk for @escaping @callee_guaranteed (@guaranteed NSURLSessionWebSocketMessage?, @guaranteed Error?) -> () + 71
3 com.apple.Foundation 0x00007fff39433ac5 __NSBLOCKOPERATION_IS_CALLING_OUT_TO_A_BLOCK__ + 7
4 com.apple.Foundation 0x00007fff394339e6 -[NSBlockOperation main] + 80
5 com.apple.Foundation 0x00007fff39433981 __NSOPERATION_IS_INVOKING_MAIN__ + 17
6 com.apple.Foundation 0x00007fff39432bb3 -[NSOperation start] + 722
7 com.apple.Foundation 0x00007fff394328d9 __NSOPERATIONQUEUE_IS_STARTING_AN_OPERATION__ + 17
8 com.apple.Foundation 0x00007fff394327a9 __NSOQSchedule_f + 182
9 libdispatch.dylib 0x00007fff70d5d658 _dispatch_client_callout + 8
10 libdispatch.dylib 0x00007fff70d5fd79 _dispatch_block_invoke_direct + 245
11 libclang_rt.tsan_osx_dynamic.dylib 0x0000000102cbf30c __tsan::invoke_and_release_block(void*) + 12
12 libclang_rt.tsan_osx_dynamic.dylib 0x0000000102cbf062 __tsan::dispatch_callback_wrap(void*) + 306
13 libdispatch.dylib 0x00007fff70d5d658 _dispatch_client_callout + 8
14 libdispatch.dylib 0x00007fff70d62c44 _dispatch_lane_serial_drain + 597
15 libdispatch.dylib 0x00007fff70d635d6 _dispatch_lane_invoke + 363
16 libdispatch.dylib 0x00007fff70d6cc09 _dispatch_workloop_worker_thread + 596
17 libsystem_pthread.dylib 0x00007fff70fb7a3d _pthread_wqthread + 290
18 libsystem_pthread.dylib 0x00007fff70fb6b77 start_wqthread + 15
urlsession-crash.txt urlsession-crash-network-traffic.chlsj.zip
I ran the test suite against the in-progress NWConnection
-backed websocket-apple
branch. In order to make the tests pass consistently, I had to disable flaky tests, including testJoinRetriesWithBackoffIfTimeout()
, testIsJoiningAfterJoin()
, and one more I forgot (which I should have written down, of course). After doing so, though, the tests passed without an issue for over eight hours, at which point I just canceled the test run.
However, while running the tests, I saw "unexpected" messages being received when the WebSocket was not in the connecting or open states for the same tests that typically cause this crash when using URLSessionWebSocketTask
.
Test Case '-[PhoenixTests.SocketTests testPushOntoSocket]' started.
onStateUpdate preparing
onStateUpdate ready
Fatal error: Unexpected message: [null,1,"phoenix","phx_reply",{"response":{},"status":"ok"}]: file /Users/atdrendel/shareup/websocket-apple/Sources/WebSocket/WebSocket.swift, line 178
2020-07-12 23:43:36.353530+0200 xctest[86982:469880] Fatal error: Unexpected message: [null,1,"phoenix","phx_reply",{"response":{},"status":"ok"}]: file /Users/atdrendel/shareup/websocket-apple/Sources/WebSocket/WebSocket.swift, line 178
Program ended with exit code: 9
@atdrendel that's great and it makes sense to me that we might receive a message during .closing
. We should break out each possible case and output the variable it wraps to see if that's what is happening.
Unsolved thread on Developer Forums