segmentio / analytics-ios

The hassle-free way to integrate analytics into any iOS application.
https://segment.com/libraries/ios
MIT License
399 stars 334 forks source link

Deadlock on app launch #1008

Closed piotrtobolski closed 3 years ago

piotrtobolski commented 3 years ago

Description

Segment SDK can deadlock on app launch when using some of the integrations. Amplitude integration is one of those and I think it is one of the more popular ones.

The flaw in the integration manager module. It happens when integration initializer calls dispatch_sync with main queue. This is the case inside -[Amplitude runSynchronouslyOnMainQueue:] which is called from -[Amplitude initializeApiKey:userId:setUserId:]. This situation doesn't happen always because it is related to disk writing latency but in my app on some devices it was happening very often right after installation of the app from TestFlight. Detailed steps below.

I will also provide PR fixing this

Prerequisities

Steps

-> main queue

  1. -[SEGIntegrationsManager initWithAnalytics:]
  2. Configuration was downloaded very quickly and completion is handled on io.segment.analytics queue

-> io.segment.analytics queue

  1. configuration written to file in -[SEGIntegrationsManager setCachedSettings:]
  2. writing to file is finished but sometimes the code waits there and context gets switched
    • simulate it by putting [NSThread sleepForTimeInterval:1] after writing to file

-> main queue continues

  1. -[SEGIntegrationsManager onAppForeground:]
  2. -[SEGIntegrationsManager refreshSettings]
  3. file exists on disk because already written in 3.
  4. -[SEGIntegrationsManager setCachedSettings:]
  5. Writing to file again takes time and context gets switched

-> io.segment.analytics queue continues

  1. -[SEGIntegrationsManager updateIntegrationsWithSettings:]
  2. Amplitude integration initializer calls dispatch_sync with main queue and waits

-> main queue

  1. -[SEGIntegrationsManager updateIntegrationsWithSettings:]
  2. calls dispatch_sync with io.segment.analytics queue and waits

FIN: deadlock because both io.segment.analytics and main queue are waiting for each other

Example log

[NSThread sleepForTimeInterval:2] added after writing settings to file

00.522774 initWithAnalytics isMainThread 1
00.576702 refreshSettings isMainThread 1
00.929121 settingsRequestCompletionHandler isMainThread 0
00.929278 setCachedSettings isMainThread 0
01.080838 onAppForeground isMainThread 1
01.080872 refreshSettings isMainThread 1
01.080902 setCachedSettings isMainThread 1
02.941512 updateIntegrationsWithSettings isMainThread 0
03.007089 (Amplitude initializer) runSynchronouslyOnMainQueue isMainThread 0
03.085322 updateIntegrationsWithSettings isMainThread 1
(no further messages)
bsneed commented 3 years ago

Hi @piotrtobolski!

Thanks for reporting this; however I don't think a sleep is the proper fix here. I'm unable to reproduce locally, however from your triage of the issue, the fix should be that amplitude SDK should stop doing dispatch_sync on the main queue when it is called from the main queue. Or optionally, the amplitude segment integration should call Amplitude.init from a background thread if that is required.

Typically, using a sleep to fix a race condition just pushes said race condition elsewhere.

piotrtobolski commented 3 years ago

@bsneed please revise and reopen the issue. I think there is some misunderstanding. This is just issue report, not a solution. Sleep is used to simulate writing to disk taking long time. It isn't used for the fix but to ensure whoever is debugging it has 100% reproduction.

If you check the source changes in attached PR #1009 (It's just 3 lines) you will see that the fix is to ensure that -[SEGIntegrationsManager refreshSettings] is always called on the same (segment private) queue ensuring that shared resource access is synchronized. This solution isn't moving the race condition elsewhere but removes it completely.

I considered making a fix in Amplitude but I decided it isn't the culprit here. It is calling dispatch_sync with main only when code is executed from non-main queue and it is required to call UIKit methods. Segment SDK is the one that is calling dispatch_sync with private queue from main and blocking it. This PR fixes this. This is result of my investigation. Of course, there may be other solutions (on integrated SDKs side as well) but I think it is still worth implementing this fix because integrating with other SDKs is Segment core feature so it is important to be a good citizen and avoid blocking the main queue.

philipengberg commented 3 years ago

We still consistently see crashes on first launch of our app 💥 This does not seem to have been resolved

piotrtobolski commented 3 years ago

I am using my fork which includes fix for this issue: https://github.com/piotrtobolski/analytics-ios/tree/deadlock-fix It is running in production and I see no crashes about this anymore.

@philipengberg You can add it to cocoapods like this: pod 'Analytics', :git => 'https://github.com/piotrtobolski/analytics-ios.git', :branch => 'deadlock-fix'

I plan to update this branch every few weeks when origin has updates. Point to specific commit hash in podfile if you don't want updates.

bsneed commented 3 years ago

Hi @piotrtobolski you're right, I was misunderstanding what you were saying. Really sorry about that.

ankittlp commented 2 years ago

@bsneed https://github.com/segmentio/analytics-ios/issues/1043 Is this related happening iOS 16? Seeing for few users.