Closed RamblinWreck77 closed 2 years ago
hi @RamblinWreck77 , my bad for missing your changes and I will include it in my next PR. But I suspect the crash could be caused by something else. Do you have a full crashlog?
Here's a few of the the stack traces we have:
Seems like they're all happening when the next task in an async serial queue is being processed, maybe an incomplete capture/force unwrap on a weak object?
Might also be lock related, I know you guys use locks inside async queues and unfortunately I don't have a ton of experience with those in iOS (I try to avoid them entirely).
hi @RamblinWreck77 , looks like when the Track
is dequeued from the async queue for execution, some data(i.e. looks like an array in the last example above) inside the properties
dictionary is no longer accessible due to it possibly being released before that happens.
How frequent is the crash? When you track event with properties, for the properties dictionary, does it also contain references to some other mutable collections?
It's relatively rare, but it is also currently our # 1 and 2 most frequent crash so if we could knock it out it would help metrics. Currently impacts 0.16% of our DAU.
As for properties, no nothing is mutable. We have an event manager with a static dispatch function, so that a view controller can say something like "viewed page" with "page name" = "some page". That static func calls an async dispatch to a utility queue, with an immutable/captured copy of the event name/props. When that async queue is processed Mixpanel.mainInstance().track(event: ..., properties: ...) is called.
I wrapped our calls to mixpanel inside an async utility QoS queue so that there was no way main could be blocked by our event code.
That said, I feel like once the event name/props are "handed off" to mixpanel code everything should be "safe"... so that's weird. trackingQueue.async() inside track() captures the event
and properties
inputs so again we should be safe here... Very odd.
Also noticed this inside MixpanelInstance.swift / flush()
trackingQueue.async { [weak self, completion] in
guard let self = self else {
return
}
if let shouldFlush = self.delegate?.mixpanelWillFlush(self), !shouldFlush {
return
}
let eventQueue = self.mixpanelPersistence.loadEntitiesInBatch(type: self.persistenceTypeFromFlushType(.events))
let peopleQueue = self.mixpanelPersistence.loadEntitiesInBatch(type: self.persistenceTypeFromFlushType(.people))
let groupsQueue = self.mixpanelPersistence.loadEntitiesInBatch(type: self.persistenceTypeFromFlushType(.groups))
self.networkQueue.async { [weak self, completion] in
guard let self = self else {
return
}
self.flushQueue(eventQueue, type: .events)
self.flushQueue(peopleQueue, type: .people)
self.flushQueue(groupsQueue, type: .groups)
if let completion = completion {
DispatchQueue.main.async(execute: completion)
}
}
}
Shouldn't networkQueue capture eventQueue/peopleQueue/groupsQueue explicitly? Those appear to be intended as snapshots of the pending data to flush so I think capturing them explicitly makes sense. Probably not a huge deal, just a best practice.
I saw the force unwrap changes went in with 3.1.0. I'll try and get a prod release out soon to see if that resolves the issue!
thanks @RamblinWreck77 !
Same issue after upgrade from 2.x to 3.1.0:
Crashed: com.mixpanel.f15cc2d359522a6b1415fe97b156443c.tracking)
0 libswiftCore.dylib 0x383c88 _swift_release_dealloc + 24
1 MyApp 0x5aef40 closure #1 in MixpanelInstance.track(event:properties:) + 948 (MixpanelInstance.swift:948)
2 MyAppp 0x5a7b50 thunk for @escaping @callee_guaranteed () -> () + 4369431376 (<compiler-generated>:4369431376)
3 libdispatch.dylib 0x2a84 _dispatch_call_block_and_release + 32
4 libdispatch.dylib 0x481c _dispatch_client_callout + 20
5 libdispatch.dylib 0xc004 _dispatch_lane_serial_drain + 620
6 libdispatch.dylib 0xcc00 _dispatch_lane_invoke + 404
7 libdispatch.dylib 0x174bc _dispatch_workloop_worker_thread + 764
8 libsystem_pthread.dylib 0x37a4 _pthread_wqthread + 276
9 libsystem_pthread.dylib 0xa74c start_wqthread + 8
@zihejia No joy, I can confirm this issue remains in 3.1.2
@zihejia Sorry for the double-ping, but I have a few questions about this function starting at line 934 in MixpanelInstance.swift:
open func track(event: String?, properties: Properties? = nil) {
if hasOptedOutTracking() {
return
}
let epochInterval = Date().timeIntervalSince1970
trackingQueue.async { [weak self, event, properties, epochInterval] in
guard let self = self else {
return
}
let mixpanelIdentity = MixpanelIdentity.init(distinctID: self.distinctId,
peopleDistinctID: nil,
anonymousId: self.anonymousId,
userId: self.userId,
alias: nil,
hadPersistedDistinctId: self.hadPersistedDistinctId)
self.timedEvents = self.trackInstance.track(event: event,
properties: properties,
timedEvents: self.timedEvents,
superProperties: self.superProperties,
mixpanelIdentity: mixpanelIdentity,
epochInterval: epochInterval)
}
if MixpanelInstance.isiOSAppExtension() {
flush()
}
}
So when the customer's app calls track(), it captures the time (epochInterval) at track time
, and it goes into an async queue. Identity (mixpanelIdentity) is then resolved at dispatch time
, which of course is whenever trackingQueue.async() gets around to executing.
My questions are:
1) What happens if the users identity, such as userId, is null at track time
but defined at dispatch time
? Wouldn't that cause the customers data to incorrectly have a userId when it was really null at track time
?
2) Is the behavior in 1) intentional? It would seem to me that the user's identity at track time
is the customer's intent when dispatching an event to mixpanel, not whatever their identity is when mixpanel's .utility QoS trackingQueue gets around to broadcasting the data to mixpanel servers
3) If the above behavior is unintentional, what do you think about the following change:
open func track(event: String?, properties: Properties? = nil) {
if hasOptedOutTracking() {
return
}
// Capture these values at track-time, async dispatch
let epochInterval = Date().timeIntervalSince1970
let mixpanelIdentity = MixpanelIdentity.init(distinctID: distinctId,
peopleDistinctID: nil,
anonymousId: anonymousId,
userId: userId,
alias: nil,
hadPersistedDistinctId: hadPersistedDistinctId)
let timedEventsSnapshot = timedEvents
let superPropertiesSnapshot = superProperties
trackingQueue.async { [weak self, event, properties, epochInterval, mixpanelIdentity, timedEventsSnapshot, superPropertiesSnapshot] in
guard let self = self else {
return
}
self.timedEvents = self.trackInstance.track(event: event,
properties: properties,
timedEvents: timedEventsSnapshot,
superProperties: superPropertiesSnapshot,
mixpanelIdentity: mixpanelIdentity,
epochInterval: epochInterval)
}
if MixpanelInstance.isiOSAppExtension() {
flush()
}
}
I think this would guarantee that things like identity/super props at track time
are correctly sent to mixpanel at dispatch time
hi @RamblinWreck77 , 1) is intentional and you are right, the goal is to make things like identity/super props at track time are correctly sent to Mixpanel at dispatch time. One way to guarantee that is to make sure all changes related to mixpanelIdentity
, timedEvents
, and super properties
are protected by the same trackingQueue
. If you move them out of the trackingQueue
, it will not be the case anymore.
Hi, we're also experiencing crashes in production with a similar stack trace, see screenshots for reference. It's dropping our crash free rate ~2% per day, so it's happening quite often for us.
Is there a workaround available / in the works? Can we downgrade to a stable version that doesn't have this issue? We've removed Mixpanel from our app for now.
Thanks in advance
hi @rickvdl , thanks for bringing it up. Which SDK version are you using? and have you started to experience this crash from a particular version?
Hi @zihejia , we were using 3.1.0
I just noticed the 3.1.3
release, but couldn't find anything in the change log referring to a possible fix. Is it possible that this issue was resolved?
hi @rickvdl , we are still looking into it and the issue has not been resolved yet. But we highly recommend you upgrade to 3.1.3
because 3.1.0
has a known issue for ingesting any event or property containing %
, &
.
Thanks @zihejia as far as I can see we're not sending these in any values nor sending user input that could contain this, so I'll hold off putting the SDK back in for now. Thanks for the update, hope it can be resolved soon
hi @rickvdl , it seems the crash happens when trying to access the properties
in Track
, did you happen to have any mutable data inside the properties
? Also, does this crash only occur on certain iOS versions?
@zihejia As far as I can see we're only sending value types. Does the logic support sending nil
values? It looks like the crash is happening on both iOS 15 as well as 14.
hi @RamblinWreck77 @rickvdl . We have released v3.1.4. It has several critical fixes to prevent race conditions. Would you like to try it and see how it goes?
Thanks for the quick release @zihejia ! We'll give it a try and let you know as soon as we have some data :)
Hello, reporting back from #515 - after reverting back to 2.8.2
we no longer see those crashes. How would we go about verifying that 3.1.4
resolves those issues, as we were unable to reproduce the original crashes? I cannot test this in production 😃
hi @ifabijanovic , that's great news. 3.1.4 has a similar approach to fix race conditions as 2.8.2 does, so 3.1.4 should be good but no rush if you just reverted.
@zihejia 3.1.4
is in use by ~60% of our daily active users and so far we haven't seen any crashes, will report back after full release, but so far it appears to be fixed for us.
@zihejia After fully releasing an app version with SDK version 3.1.4
we no longer see the crash. From my side this issue can be closed. Thanks!
hi @RamblinWreck77 , do you still have crashes? Can this issue be closed? Thanks!
Same issue after upgrade from 2.x to 3.1.0:
Crashed: com.mixpanel.f15cc2d359522a6b1415fe97b156443c.tracking) 0 libswiftCore.dylib 0x383c88 _swift_release_dealloc + 24 1 MyApp 0x5aef40 closure #1 in MixpanelInstance.track(event:properties:) + 948 (MixpanelInstance.swift:948) 2 MyAppp 0x5a7b50 thunk for @escaping @callee_guaranteed () -> () + 4369431376 (<compiler-generated>:4369431376) 3 libdispatch.dylib 0x2a84 _dispatch_call_block_and_release + 32 4 libdispatch.dylib 0x481c _dispatch_client_callout + 20 5 libdispatch.dylib 0xc004 _dispatch_lane_serial_drain + 620 6 libdispatch.dylib 0xcc00 _dispatch_lane_invoke + 404 7 libdispatch.dylib 0x174bc _dispatch_workloop_worker_thread + 764 8 libsystem_pthread.dylib 0x37a4 _pthread_wqthread + 276 9 libsystem_pthread.dylib 0xa74c start_wqthread + 8
Update from me: no longer crashes.
We haven't shipped to prod in a while/since this fix went live in the mixpanel lib. We should be shipping a release within a week and I can report back then!
@RamblinWreck77, no worries, looking forward to it!
Looks fixed to me! Closing this :)
Hey team, congrats on 3.0.0! I am however still seeing an (admittedly rare) crash in prod. My automated crash reporting seems to think it's in line 34 of Track.swift. It also seems to happen extremely early in the app boot cycle, always within the first 500ms or so.
Looking at the code, I was surprised to see that
ev
is still force-unwraped. IIRC I had a PR where that was changed to a crash-safe if-let implementation. @zihejia Would you mind explaining the rationale behind keepingvar ev
force unwrapped?Current Code:
My Suggestion:
This removes any need for force unwrapping later. Also, while I can't prove this, I don't trust the force unwrap in the current code's if statement to be safe. I know that for
if (A || B)
, A should always be checked first, but I wouldn't put it past Apple/the swift compiler optimizations in release mode to occasionally check B before A. When B is a run-time crash if A wasn't checked first I feel that's just asking for trouble.This may or may not be the cause of the rare crash, but IMO it's worth changing either way. Thanks and have a good one!