pendo-io / pendo-mobile-sdk

Pendo captures product usage data, gathers user feedback, and lets you communicate in-app to onboard, educate, and guide users to value
https://www.pendo.io

Pendo iOS SDK API timeouts cause unbounded memory growth #174

Open johntmcintosh opened 2 weeks ago

johntmcintosh commented 2 weeks ago

Platform + Version iOS 17

SDK Version 3.3.1

Framework Native iOS, mostly UIKit with some SwiftUI

Describe the bug We observed a scenario that results in unbounded memory growth from the Pendo SDK when Pendo API calls time out.

This problem initially manifested to us through reports of app terminations due to high memory usage. Here's a plot showing memory usage by the app over the course of an hour with and without Pendo's API calls timing out.

[Screenshot: app memory usage over one hour, with and without Pendo API calls timing out]

When the API calls timed out 100% of the time, we would see the app's memory usage climb from our typical 100 MB to 1.6 GB, at which point the OS terminated the app.

Our app is primarily used in restricted network environments. Although we are aware of the guidance on which hostnames should be allowed, one of our customers did not have Pendo hosts allowed, resulting in all API calls to Pendo being routed to the customer's internal network rather than to your API. These calls would eventually fail with timeouts.

In my testing, after launching the app with Pendo's API blocked by a proxy, starting a session, and triggering a few events, I observe Pendo getting stuck in an endless loop of attempts to get an access token. pendo-logs.txt has a snapshot of the logs that we see repeating approximately every 30 seconds once this loop begins.

Our customers use the app continuously for 1-2 hours at a time, and during that time a significant number of screen change events are generated. My light investigation suggests that each new event starts another attempt to get an access token, and each attempt keeps looping. As more and more events are generated, these looping attempts accumulate and eventually produce the memory growth shown in the graph above.
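The accumulation described above can be illustrated with a toy model. This is only a sketch of the suspected behavior, not the SDK's actual code: `TokenFetchModel`, `recordEvent`, and `loopsAlive` are hypothetical names, and the model assumes a token request never completes while the API is blocked.

```swift
/// Hypothetical model of the suspected behavior; not Pendo SDK code.
/// Assumes a token-fetch loop never finishes while the API is unreachable.
struct TokenFetchModel {
    /// When true, an event reuses the loop that is already in flight.
    let coalesce: Bool
    /// Number of token-fetch loops that are still retrying.
    private(set) var activeLoops = 0

    init(coalesce: Bool) {
        self.coalesce = coalesce
    }

    /// Records one analytics event (for example, a screen change).
    mutating func recordEvent() {
        // With coalescing, a loop that is already running serves this event too.
        if coalesce && activeLoops > 0 { return }
        // Otherwise every event starts its own loop, which never ends.
        activeLoops += 1
    }
}

/// Returns how many loops are still alive after `events` events.
func loopsAlive(afterEvents events: Int, coalesce: Bool) -> Int {
    var model = TokenFetchModel(coalesce: coalesce)
    for _ in 0..<events {
        model.recordEvent()
    }
    return model.activeLoops
}
```

Under these assumptions the number of live loops grows linearly with the number of events unless concurrent requests are coalesced into one, which would be consistent with the steady memory climb in the plot.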

For our customers

To Reproduce Steps to reproduce the behavior:

  1. Configure a network proxy (I used Proxyman) to treat all calls to data.pendo.io with 100% packet loss.
  2. Enable PendoManager.shared.setDebugMode(true)
  3. Set up Pendo with an appKey
  4. Call pendoManager.startSession(...) with a user's details
  5. Run the app, triggering screenContentChanged() as screens change.
  6. Stop interacting with the app and watch Pendo logs continue to repeat in the console.

Expected behavior We expect the SDK to be more resilient to failing network conditions and gracefully handle 100% packet loss without resulting in unbounded memory growth.
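One common way to get this kind of resilience is a retry budget with capped exponential backoff, so that a blocked endpoint leads to a bounded number of attempts rather than an endless loop. The following is a minimal sketch under that assumption; `RetryPolicy` and its parameters are hypothetical and do not describe how the SDK is or will be implemented.

```swift
/// Hypothetical retry policy: capped exponential backoff with a fixed budget.
struct RetryPolicy {
    /// Maximum number of retries before giving up.
    let maxAttempts: Int
    /// Delay in seconds before the first retry.
    let baseDelay: Double
    /// Upper bound in seconds for any single delay.
    let maxDelay: Double

    /// Returns the delay before the given retry (1-based),
    /// or nil once the retry budget is spent.
    func delay(forRetry retry: Int) -> Double? {
        guard retry >= 1, retry <= maxAttempts else { return nil }
        var delay = min(maxDelay, baseDelay)
        // Double the delay once per earlier retry, never exceeding the cap.
        for _ in 1..<retry {
            delay = min(maxDelay, delay * 2)
        }
        return delay
    }
}
```

A caller would stop scheduling requests as soon as `delay(forRetry:)` returns nil and try again only on a later trigger, such as the next session start.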

Logs pendo-logs.txt

Sample Code n/a

MikePendo commented 2 weeks ago

Hi @johntmcintosh, just to be clear, I will try to briefly explain how the Pendo SDK handles offline storage. By default, offline storage holds up to 10 MB (FIFO), but it can be configured to a different size on our backend (Pendo mobile offline). The SDK then sends those analytics when an event-count threshold is reached, when a time trigger fires, or immediately for immediate events. The SDK should not start collecting analytics unless setup and start session both succeeded, and that depends on your connection.

Your case is slightly different: the SDK detects available connectivity, but when it tries to connect, it gets blocked by the host app. The error log kind of explains it:
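For readers unfamiliar with a size-capped FIFO store, here is a minimal in-memory sketch. It assumes "FIFO" means the oldest events are evicted first once the cap is exceeded; `BoundedEventBuffer` is a hypothetical type and says nothing about how the SDK actually persists events.

```swift
/// Hypothetical byte-capped FIFO buffer; not the SDK's storage code.
struct BoundedEventBuffer {
    /// Maximum total payload size in bytes (10 MB would be 10 * 1024 * 1024).
    let capacityBytes: Int
    /// Stored event payloads, oldest first.
    private(set) var events: [[UInt8]] = []
    /// Total size of all stored payloads.
    private(set) var usedBytes = 0

    init(capacityBytes: Int) {
        self.capacityBytes = capacityBytes
    }

    /// Appends an event, evicting the oldest events until the cap is respected.
    mutating func append(_ event: [UInt8]) {
        // An event larger than the whole buffer can never fit, so drop it.
        guard event.count <= capacityBytes else { return }
        events.append(event)
        usedBytes += event.count
        while usedBytes > capacityBytes {
            usedBytes -= events.removeFirst().count
        }
    }
}
```

With a cap like this, memory held by queued events stays bounded no matter how long the network is unavailable.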

Error: '{
    URL = "https://data.pendo.io/v2/devices/getAccessTokenSigned";
    errorDescription = "No data received or bad JWT in access token request.";
    httpErrorDescription = "An SSL error has occurred and a secure connection to the server cannot be made.";
    httpErrorUnderlyingError = "The operation couldn\U2019t be completed. (kCFErrorDomainCFNetwork error -1200.)";
    httpStatusCode = 0;
    retries = 2;
}'

The issue is httpStatusCode = 0, which is not really a valid status code, and unfortunately we don't handle it properly. As a result, the SDK starts to collect analytics and attempts to process them. Although it should still stay within the 10 MB limit, I noticed that in some cases the allocated memory wasn't released in time, and that is where you see those huge peaks in memory usage. If possible, could you share the sub ID of the account that was crashing? (I will verify the configuration for offline analytics.) Best practice would be:

We will try to release a fix that will handle the case of status code 0
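As an illustration of what handling status code 0 could look like, the sketch below treats 0 as "no HTTP response was received" (a transport failure such as the SSL error in the log above) and only allows analytics collection after a successful response. The names `TokenRequestOutcome`, `classify`, and `shouldStartCollecting` are hypothetical, and this is not the planned fix itself.

```swift
/// Hypothetical classification of an access-token response; not SDK code.
enum TokenRequestOutcome: Equatable {
    case success
    case transportFailure   // no HTTP response at all (status code 0)
    case clientError
    case serverError
    case unexpected
}

/// Maps a raw status code to an outcome. A value of 0 means the request
/// never produced an HTTP response, e.g. a timeout or a failed TLS handshake.
func classify(httpStatusCode: Int) -> TokenRequestOutcome {
    switch httpStatusCode {
    case 0:
        return .transportFailure
    case 200...299:
        return .success
    case 400...499:
        return .clientError
    case 500...599:
        return .serverError
    default:
        return .unexpected
    }
}

/// Collection starts only after the token request clearly succeeded.
func shouldStartCollecting(after outcome: TokenRequestOutcome) -> Bool {
    return outcome == .success
}
```

The point of the explicit `.transportFailure` case is that a missing response is never mistaken for a usable one.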

johntmcintosh commented 2 days ago

Hi Mike, thanks for the detailed response.

I don't think we have had any issues with "normal" offline support for our other customers, but I believe our sub id is 6036724765753344.

MikePendo commented 2 days ago

@johntmcintosh Thanks for the reply. In any case, I will start working on it soon to avoid these kinds of issues. I believe it will be addressed in 3.4.2.

sigsbeym commented 23 hours ago

We are seeing similar reports in our product, and it looks to be occurring for some users who are seeing 2-4 minute (yes, minute) response times from Pendo requests. The application deadlocks and then memory climbs until SpringBoard kills the process.

MikePendo commented 10 hours ago

@sigsbeym @johntmcintosh We will try to address it next week with a hotfix. Sorry about that. @sigsbeym, maybe you could provide an Apple crash log, just to verify it's the same issue?

sigsbeym commented 9 hours ago

@MikePendo I'll see if I can get one, but the memory pressure deadlocks weren't producing any stack traces yesterday when I was looking into this, either in our Crashlytics tooling or in the TestFlight crash reporting.

It looks like there is a fix in 3.4.0 for a memory pressure issue, so I'm going to try upgrading to that today to see if this fixes the issue we are facing.

sigsbeym commented 8 hours ago

Update: upgrading to 3.4.0 did not resolve the issue for our affected users.