@igled7 v3.1.0 includes a lot of improvements, including a complete overhaul of the flushing mechanisms. Can you try upgrading to the latest version to see if the issue persists? If it does, please add that stack trace and we'll investigate.
@jaredmixpanel we are planning to upgrade to the latest version of Mixpanel, but I don't think it will help. The crashes seem to be related to a known bug in Apple's implementation of QUIC introduced in iOS 15. They said that it was fixed in iOS 15.2, and this seems to be the case as we don't see the crash happening in that version.
We are trying to understand why this crash started to happen. Have you upgraded your servers to support HTTP/3 this week? I can see that Mixpanel supports HTTP/3.
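The version range described above (crashes on iOS 15, none observed on 15.2) could be used to tag crash reports or diagnostics. This is a minimal sketch that assumes the affected range is exactly iOS 15.0 up to, but not including, 15.2. That range is taken from this comment, not from an Apple-confirmed list, and `isInAffectedRange` is a hypothetical helper name.

```swift
import Foundation

/// Returns true when `version` is in the range this thread associates with
/// the QUIC crash: iOS 15.0 up to, but not including, 15.2.
/// The range is an assumption based on the reports above.
func isInAffectedRange(_ version: OperatingSystemVersion) -> Bool {
    guard version.majorVersion == 15 else { return false }
    return version.minorVersion < 2
}

#if os(iOS)
// Only meaningful on iOS; other platforms use unrelated version numbers.
let current = ProcessInfo.processInfo.operatingSystemVersion
print("In affected iOS range:", isInAffectedRange(current))
#endif
```

Keeping the check as a pure function over `OperatingSystemVersion` makes it easy to unit test without a device.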
@jaredmixpanel We are seeing the same crashes in our app as well. The stack trace for the crash points to networking:
Crashed: com.apple.network.connections
0 libquic.dylib 0x69938 qlog_abort_internal + 272
1 libquic.dylib 0x69924 qlog_abort_internal + 252
2 libquic.dylib 0x639e4 quic_frame_write_PADDING + 640
3 libquic.dylib 0x9c6dc _quic_packet_builder_assemble + 2048
4 libquic.dylib 0x2490 quic_packet_builder_assemble + 124
5 libquic.dylib 0x366bc quic_assemble_and_encrypt + 260
6 libquic.dylib 0x37a04 __quic_send_frames_for_key_state_block_invoke.106 + 1016
7 libnetwork.dylib 0x603310 nw_protocol_data_access_buffer + 1160
8 libquic.dylib 0x1c8c8 __quic_send_frames_for_key_state_block_invoke + 200
9 libnetwork.dylib 0xb9d4 nw_protocol_service_requested_outbound_data + 360
10 libnetwork.dylib 0x5e99fc nw_protocol_request_outbound_data + 128
11 libquic.dylib 0x22cb8 quic_send_frames_for_key_state + 1376
...
Interestingly, Mixpanel is the only thing making a network call at the same time:
com.mixpanel.3bfa18ec20196c56b5726c1d0af33dfa.network
0 libsystem_kernel.dylib 0x1540 semaphore_wait_trap + 8
1 libdispatch.dylib 0x4bf0 _dispatch_sema4_wait + 28
2 libdispatch.dylib 0x52a8 _dispatch_semaphore_wait_slow + 132
3 libswiftDispatch.dylib 0x1994 OS_dispatch_semaphore.wait(wallTimeout:) + 24
4 App 0xaed790 Flush.flushQueueInBatches(_:type:) + 156 (Flush.swift:156)
5 App 0xaec888 Flush.flushEventsQueue(_:automaticEventsEnabled:) + 89 (Flush.swift:89)
6 App 0xb1d344 closure #1 in closure #1 in MixpanelInstance.flush(completion:) + 1188 (MixpanelInstance.swift:1188)
7 App 0xb0e5f8 thunk for @escaping @callee_guaranteed () -> () + 3156540 (<compiler-generated>:3156540)
8 libdispatch.dylib 0x2914 _dispatch_call_block_and_release + 32
9 libdispatch.dylib 0x4660 _dispatch_client_callout + 20
10 libdispatch.dylib 0xbde4 _dispatch_lane_serial_drain + 672
11 libdispatch.dylib 0xc958 _dispatch_lane_invoke + 392
12 libdispatch.dylib 0x171a8 _dispatch_workloop_worker_thread + 656
13 libsystem_pthread.dylib 0x10f4 _pthread_wqthread + 288
14 libsystem_pthread.dylib 0xe94 start_wqthread + 8
@bolshedvorsky what version of our SDK are you using?
@jaredmixpanel you can find more info here. Apparently, some people started to see a similar pattern recently.
As I mentioned before, this seems to be related to Apple's buggy HTTP/3 implementation. I think you could stop this crash from happening by disabling HTTP/3 on api-eu.mixpanel.com (at least for iOS clients).
Edit: I managed to contact one of the people that reported a lot of crashes in the official Apple forum and they are using Mixpanel as well...
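To check whether a device actually negotiates HTTP/3 with a given host, `URLSessionTaskMetrics` reports the protocol used for each transaction. This is a rough sketch: `ProtocolLogger` and `isHTTP3` are hypothetical names. It only observes requests made through a `URLSession` you create with this delegate, not the SDK's internal session.

```swift
import Foundation

/// True when an ALPN name reported by URLSession denotes HTTP/3
/// ("h3", or a draft identifier such as "h3-29").
func isHTTP3(_ protocolName: String?) -> Bool {
    guard let name = protocolName else { return false }
    return name == "h3" || name.hasPrefix("h3-")
}

#if canImport(Darwin)
/// Hypothetical delegate: logs the protocol negotiated for each transaction.
final class ProtocolLogger: NSObject, URLSessionTaskDelegate {
    func urlSession(_ session: URLSession,
                    task: URLSessionTask,
                    didFinishCollecting metrics: URLSessionTaskMetrics) {
        for transaction in metrics.transactionMetrics {
            let host = transaction.request.url?.host ?? "unknown host"
            let name = transaction.networkProtocolName
            print("\(host): \(name ?? "unknown") (HTTP/3: \(isHTTP3(name)))")
        }
    }
}
#endif
```

Passing an instance as the `delegate` of a `URLSession` and issuing a request to the host in question would show whether `h3` is in use before and after a server-side change.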
@jaredmixpanel Our production builds are using 2.x.x versions. We plan to move to 3.x.x, but that requires an app update and a rollout to our entire user base.
I'm having this exact issue as well.
Hi @igled7 @bolshedvorsky @quintonpryce, are you able to reproduce it locally? What is the crash rate? We are having a hard time reproducing it using 3.1.0. We would be more comfortable disabling QUIC if we had a deterministic way to reproduce this problem, so that we could test the behavior before and after making the change.
Hi @zihejia,
This crash started to happen on the 12th of Jan at 8 PM GMT. Did your infrastructure team make any changes around that period?
If you disable HTTP/3 today, I will be able to report tomorrow whether it fixes the current situation. Given the high number of crashes that many people are seeing (for @quintonpryce it's the number 1 crash as well), I think it is worth a try.
Hi @igled7, we are using GCP, and it does seem that Google silently made QUIC traffic changes for our GLB around the time you mentioned. We will disable it and let you know.
Hi @igled7, we have disabled it, but it usually takes a little while for the change to take full effect. You can keep an eye on the crash reports from now on.
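One way to confirm from the client side that the server has stopped offering HTTP/3 is to inspect the `Alt-Svc` response header, which servers commonly use to advertise HTTP/3 (RFC 7838). This is a minimal sketch with hypothetical helper names. It assumes entries are separated by plain commas, which holds for typical values such as `h3=":443"; ma=2592000` but is not a full parser.

```swift
import Foundation

/// Extracts the protocol IDs advertised in an Alt-Svc header value.
/// Simplification: assumes no commas inside quoted authorities.
func advertisedProtocols(altSvc: String) -> [String] {
    let trimmed = altSvc.trimmingCharacters(in: .whitespaces)
    // "clear" means the server withdraws all alternative services.
    if trimmed == "clear" { return [] }
    return trimmed
        .split(separator: ",")
        .compactMap { entry -> String? in
            // Each entry looks like: protocol-id="authority"; param=value
            guard let eq = entry.firstIndex(of: "=") else { return nil }
            let id = entry[..<eq].trimmingCharacters(in: .whitespaces)
            return id.isEmpty ? nil : id
        }
}

/// True when the header advertises HTTP/3 or one of its draft versions.
func advertisesHTTP3(altSvc: String) -> Bool {
    return advertisedProtocols(altSvc: altSvc).contains { id in
        id == "h3" || id.hasPrefix("h3-")
    }
}
```

The header value itself can be read from an `HTTPURLResponse` with `value(forHTTPHeaderField: "Alt-Svc")`. Clients may cache a previously advertised alternative for the duration of its `ma` (max-age) parameter, which would be consistent with the change taking a while to show up in crash reports.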
Thanks @zihejia. I'll report back tomorrow.
Thanks @zihejia. I checked our logs, and it looks like your recent change made this crash go away. We had the same issue: the app suddenly started crashing on Jan 12th. We had a few crashes on the 19th and none on the 20th.
Hi @zihejia, it seems that the crashes have stopped for us as well.
I'm closing this issue for now. Sorry for the inconvenience.
Hi,
We started to see a high number of crashes yesterday (13 Jan). They come from different app versions, and we never had that crash before.
This is a fragment of the stack trace of the thread where the app is crashing:
It seems that when this crash happens, Mixpanel SDK (v2.8.3) is performing some work on its network thread. We noticed that all the crash stack traces follow the same pattern.
Have you made any backend changes recently that could have caused this?