segmentio / analytics-swift

The hassle-free way to add Segment analytics to your Swift app (iOS/tvOS/watchOS/macOS/Linux).
MIT License

Losing events after upgrading to 1.5.7 #324

Closed xmollv closed 3 months ago

xmollv commented 3 months ago

Describe the bug

A few days ago our team realized that some analytics were wrong, and after some investigation it seems that something changed after 1.5.5 that causes Segment to lose tracked events. Let me explain:

We had build 1.2 on the App Store using Segment 1.5.5. When we released 1.3 (and 1.4), the only change about analytics was updating Segment to 1.5.7. After doing that, we realized on Mixpanel (what we use to read Segment's data) that some events were totally wrong. We have an onboarding flow where you do steps A → B → C. To reach C, you must have gotten through A & B. Since 1.3, we were seeing that some users reached C without going through A & B, which is impossible.

We then released 1.4.1 where the only change was downgrading Segment to 1.5.5. Since then, we see all events back to normal. Not a single user has managed to skip A and/or B, which leads me to believe that in some of the latest versions of Segment there's a bug somewhere that's leading to data loss.
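For anyone who wants to check their own data for the same symptom, here is a minimal sketch of the consistency check described above: flag users who have the final funnel step recorded but are missing an earlier one. The event names (`step_a`, `step_b`, `step_c`) and the record keys (`user_id`, `event`) are assumptions for illustration, not names from our app or from any Segment or Mixpanel export format.

```python
from collections import defaultdict

# Hypothetical event names standing in for onboarding steps A, B and C.
FUNNEL = ["step_a", "step_b", "step_c"]


def users_with_impossible_funnel(events, funnel=FUNNEL):
    """Return sorted IDs of users who have the last funnel step
    but are missing at least one earlier step.

    `events` is an iterable of dicts with the (assumed) keys
    "user_id" and "event".
    """
    seen = defaultdict(set)
    for e in events:
        seen[e["user_id"]].add(e["event"])

    *earlier, last = funnel
    return sorted(
        user
        for user, names in seen.items()
        if last in names and not set(earlier) <= names
    )


events = [
    {"user_id": "u1", "event": "step_a"},
    {"user_id": "u1", "event": "step_b"},
    {"user_id": "u1", "event": "step_c"},
    {"user_id": "u2", "event": "step_c"},  # A and B missing: flagged
    {"user_id": "u3", "event": "step_a"},  # never reached C: not flagged
]
print(users_with_impossible_funnel(events))  # ['u2']
```

A non-empty result for a funnel whose steps cannot be skipped in the app suggests that events went missing somewhere between the client and the reporting tool. It does not say where they were lost.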

Here's a Mixpanel graph of the discrepancy in the data. Observe how the lines start diverging on March 23rd (the release of our 1.3 app version, which included Segment's 1.5.7 SDK) and keep diverging until April 3rd, by which point almost all of our users are on 1.4.1 (which means they're using the now downgraded version of Segment, that is, 1.5.5).

[Image: Screenshot 2024-04-04 at 10 45 50]

To Reproduce

Unclear. Seems to start happening on versions > 1.5.5 of the SDK.

Expected behavior

No data loss of tracked events.

Screenshots

See above.

Platform (please complete the following information):

Additional context

N/A

alanjcharles commented 3 months ago

Hi @xmollv would you mind reaching out to friends@segment.com with this report? They have more access to your segment event logs than we do on our side. They will be able to help with a more detailed investigation to get to the bottom of this and will escalate any outstanding issues from that investigation to us. Thanks!

bsneed commented 3 months ago

In addition to what @alanjcharles stated, I'd first try the latest. There were some previously resolved issues that could result in a situation that you're describing. Feel free to reopen and/or reply to this ticket if you have any more questions for us.

xmollv commented 3 months ago

> Hi @xmollv would you mind reaching out to friends@segment.com with this report? They have more access to your segment event logs than we do on our side. They will be able to help with a more detailed investigation to get to the bottom of this and will escalate any outstanding issues from that investigation to us. Thanks!

Cool! I've just got in touch with that email, if and when there's a 'solution' found I'll update this ticket for anyone that might have experienced the same issue.

> In addition to what @alanjcharles stated, I'd first try the latest. There were some previously resolved issues that could result in a situation that you're describing. Feel free to reopen and/or reply to this ticket if you have any more questions for us.

The thing is, we already lost 2 weeks of 'good' data on production. We can't afford to push a build with a broken analytics SDK. How confident are you that what could have been broken on > 1.5.5 is fixed on 1.5.9? If I had a way to reproduce the issue reliably I wouldn't mind testing it myself, but I don't know exactly how to reproduce the events being lost.

xmollv commented 3 months ago

@bsneed @alanjcharles I've been poking at the diffs from https://github.com/segmentio/analytics-swift/compare/1.5.5...1.5.9. I'm pretty sure the issue we are facing was introduced in this PR https://github.com/segmentio/analytics-swift/pull/304, which seems to have shipped with 1.5.6. After that, I don't see any fixes related to that refactor (besides access levels), which leads me to believe that the issue is not fixed on 1.5.9.

Of course, I might be totally wrong here since I don't know this codebase at all. Just trying to help figure out what's wrong because I really don't like not being able to be on the latest version of an SDK we use! 🙏🏼

tristan-warner-smith commented 3 months ago

Hi @bsneed @alanjcharles, we're starting to see this hit us significantly across all tracked events after upgrading from 1.5.5 to 1.5.8. I can see there were no significant changes between 1.5.8 and 1.5.9 unless the privacy policy changes were blocking network traffic entirely.

This is the only relevant change over the time period shown below. A drop of nearly 75% in this particular event case.

Can you re-open this ticket / otherwise guide us on getting this looked at as a priority?

[Image: drop-in-events]

alanjcharles commented 3 months ago

@tristan-warner-smith sorry to hear you're running into issues. We pushed 1.5.9 after reports that Apple was blocking all traffic to Segment because we had included privacy domains in the Privacy Manifest which was resulting in lost events.

I am unable to replicate losing events in my local environment on 1.5.9 at the moment. Are you noticing a particular event type is missing? Are you able to successfully send events in a dev environment? Any additional info you can provide would be greatly appreciated. Looking forward to helping you get to the bottom of this.

FYI: you can add a breakpoint here to see your events being batched/sent.

tristan-warner-smith commented 3 months ago

> @tristan-warner-smith sorry to hear you're running into issues. We pushed 1.5.9 after reports that Apple was blocking all traffic to Segment because we had included privacy domains in the Privacy Manifest which was resulting in lost events.
>
> I am unable to replicate losing events in my local environment on 1.5.9 at the moment. Are you noticing a particular event type is missing? Are you able to successfully send events in a dev environment? Any additional info you can provide would be greatly appreciated. Looking forward to helping you get to the bottom of this.
>
> FYI: you can add a breakpoint here to see your events being batched/sent.

With this in mind we're putting out a hotfix bumping from 1.5.8 to 1.5.9 to see if it resolves what we're seeing. I'll let you know if we spot any change.

xmollv commented 3 months ago

> > @tristan-warner-smith sorry to hear you're running into issues. We pushed 1.5.9 after reports that Apple was blocking all traffic to Segment because we had included privacy domains in the Privacy Manifest which was resulting in lost events. I am unable to replicate losing events in my local environment on 1.5.9 at the moment. Are you noticing a particular event type is missing? Are you able to successfully send events in a dev environment? Any additional info you can provide would be greatly appreciated. Looking forward to helping you get to the bottom of this. FYI: you can add a breakpoint here to see your events being batched/sent.
>
> With this in mind we're putting out a hotfix bumping from 1.5.8 to 1.5.9 to see if it resolves what we're seeing. I'll let you know if we spot any change.

Eager to see what comes out of this! We have started testing 1.5.9 internally, but we don't have enough data yet to tell whether it's fixed. On Production we rolled back to 1.5.5 in the last release and the data seems to be back to normal. This is how the graph from my original report looks right now, where the only change is rolling back Segment from 1.5.7 to 1.5.5.

[Image]

xmollv commented 1 month ago

I know y'all didn't make a big fuss of this, but things are still pretty broken for us. We were on 1.5.11 and I just had to roll back to 1.5.5 (again) because users were not showing up on Mixpanel (the identify calls had no properties). Running the same flows on 1.5.5 does work (we see the profiles created as expected on Mixpanel), which means that something really bad has been going on internally since 1.5.5.

Leaving this here for anyone searching online for missing events with Segment.
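If you want to check whether you are hitting the same thing, here is a rough sketch that computes, per SDK version, the share of identify events that arrived without any traits. It assumes you can export raw events as dicts with the `type`, `traits` and `context.library.version` fields. That layout is an assumption for illustration, so adjust the keys to whatever your export actually contains.

```python
from collections import Counter


def empty_identify_rate_by_version(events):
    """Map SDK version -> fraction of identify events with no traits.

    Assumes each event is a dict shaped like
    {"type": ..., "traits": {...}, "context": {"library": {"version": ...}}}.
    Events without a version are grouped under "unknown".
    """
    total = Counter()
    empty = Counter()
    for e in events:
        if e.get("type") != "identify":
            continue
        version = e.get("context", {}).get("library", {}).get("version", "unknown")
        total[version] += 1
        if not e.get("traits"):  # missing key, None, or empty dict
            empty[version] += 1
    return {v: empty[v] / total[v] for v in total}


# Made-up sample data, only to show the shape of the input and output.
sample = [
    {"type": "identify", "traits": {"plan": "pro"},
     "context": {"library": {"version": "1.5.5"}}},
    {"type": "identify", "traits": {"plan": "free"},
     "context": {"library": {"version": "1.5.5"}}},
    {"type": "identify", "traits": {},
     "context": {"library": {"version": "1.5.11"}}},
    {"type": "identify", "traits": {"plan": "pro"},
     "context": {"library": {"version": "1.5.11"}}},
    {"type": "track", "event": "step_a",
     "context": {"library": {"version": "1.5.11"}}},
]
print(empty_identify_rate_by_version(sample))
```

A clearly higher rate on newer SDK versions than on 1.5.5 would match what we saw. A similar rate across versions would point at something else in the pipeline.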

alanjcharles commented 1 month ago

Hi @xmollv would you mind reaching out to friends@segment.com with the details you've shared here? They will be better able to assist in getting to the bottom of the issue you're experiencing and will be able to escalate any bugs resulting from the investigation to us so we can prioritize them accordingly. Thanks so much!

xmollv commented 1 month ago

> Hi @xmollv would you mind reaching out to friends@segment.com with the details you've shared here? They will be better able to assist in getting to the bottom of the issue you're experiencing and will be able to escalate any bugs resulting from the investigation to us so we can prioritize them accordingly. Thanks so much!

I already did that last time, and after two weeks of back-and-forth emails it ended up as 'we don't know what's wrong, so can't reproduce it, therefore we can't fix it'. I'm not going to do it again to get the same result. I think I've already done enough; if y'all don't want to prioritize this, that's fine.

I told my team that I've pinned the dependency to 1.5.5 and it'll never be updated again until one of these two things happens: a) There's a new feature that we want to use and it's available only on a newer SDK version. b) The app stops compiling due to the SDK being so old that something broke along the way.

PS: This is the message I saw this morning when I logged into work from the person that's in charge of Analytics:

> Just checked the data we got from today and it looks perfect. I looked at all users who viewed a wall and they also triggered sign in events.

The only change between it being broken and it being perfect is this:

[Image]