LD remote api calls fail when beforeunload event has been triggered

edvinerikson commented 4 years ago

Describe the bug When LaunchDarkly attempts to open connections to the server api close to when a beforeunload event occurs, those connections / api calls fails in a unexpected way.

So far what we've seen this is mostly a issue on chromium browsers. We've seen the issue on browsers such as Chrome, Opera, Samsung mobile and other Chromium based browsers.

However, we do capture the bug on Safari/Webkit browsers too, just not at the same volume.

To reproduce TL;DR Trigger a beforeunload event and then try to refetch data from LD backend and it will fail. A minimal reproduction case: https://codesandbox.io/s/distracted-zhukovsky-uzljh

Story time.. Our main customer base is Swedish customers. In Sweden, we have a national authentication system (BankID) that any Swedish company can use as a mechanism to identify, verify and bind legal contracts with their customers.

This authentication system provides a consumer facing mobile app, this is where customers inputs their password (our app never deals with the password). Our mobile app integrates with this bankid mobile app as well as their server-side api. The flow works kind of like this:

We initiate a authentication process by executing a rest endpoint in their backend. Their backend returns back a UUID token. We forward this token to our mobile app. Our mobile app will then attempt to open the bankid mobile app using a custom scheme window.location = "bankid://initiate?token=<UUID>"

Since bankid:// is not a supported protocol in any browser or system, the browser will simply trigger a beforeunload event, but will never navigate away from the page.

However, the mobile OS (IOS, Android, etc) will intercept these calls and see if there are any native apps on the phone that has expressed interest in this specific protocol.

Since the bankid mobile app has expressed interest in this protocol, the OS will launch that app and forward the data /initiate?token=<UUID> to a handler within the app. The bankid app will then associate that token with the user that the bankid app has been associated with.

The bankid mobile app will then ask if the user agrees to sign in to the service that the token has been associated with (LeoVegas in our case). If there is a agreement, the bankid service will then mark the token as authenticated and our backend will consider the user authenticated. When all this is done, the bankid mobile app will issue a "go back" command which is going to take the user back to whatever app that initiated the authentication process. In our case that is our mobile app.

Once back in our app, we will run a call to our backend and fetch the user that got associated with the bankid token. When the user data is fetched, we use that data to construct a LDUser and pass that to identify. This call to identify fails.

Expected behavior The LD SDK should succeed fetched the data from the backend. Our own api calls work as expected so there is definitely something LD does in order to have these calls fail.

Logs Expected application/json content type but got ""

SDK version 2.16.0

OS/platform Android, Chromium Additional context I have a slight theory that using the native fetch method is what makes our own api calls succeed. To my knowledge, LaunchDarkly uses XHR. There could be a case where XHR calls fail if they are executed when the page has not been interacted with by a user and the beforeunload event has been triggered.

edvinerikson commented 4 years ago

Scanned through the code a bit just now. https://github.com/launchdarkly/js-client-sdk/blob/master/src/httpRequest.js#L18-L25 In particular emptyResult seems to be what the SDK returns in the calls I've managed to debug. So it might be the pageIsClosing condition that is triggered in our flow and not a browser bug closing the connections.

eli-darkly commented 4 years ago

It's possible that https://github.com/launchdarkly/js-client-sdk/pull/199 has resolved this, but I think it's not totally clear whether that is the same issue. Please let me know how things look with the latest release.

edvinerikson commented 4 years ago

The patch I did has had a very positive impact on this. So far the only errors we have tracked that I believe is related to this are from clients that has not yet updated the latest version. I have not discovered any errors originating from versions that includes the patch.

eli-darkly commented 4 years ago

@edvinerikson I didn't realize until yesterday that the support request I've been working on, involving some analytics stats that seemed to be out of line with the configured rollout percentages, was for your company. If I had, I would've asked you for some additional details that might've helped to clarify whether that issue is really the same as this one. Specifically: is the behavior you're describing, where the app navigates to a "bankid://" URL, something that only happens for certain users based on a feature flag— and if so, do you know whether that is the same large group of users that was being under-represented in the analytics?

If so, that semi-confirms a hunch we had, but I'm still a little confused about why the impact would be so large. The fix in https://github.com/launchdarkly/js-client-sdk/pull/199 was for a situation where the SDK would continue making all of its HTTP requests synchronously when it should have gone back to the default async behavior. That's desirable, but I would have still expected the unnecessary sync requests to work— I'm not sure why they would fail, causing identify() calls not to work and event data to be lost.

edvinerikson commented 4 years ago

We are not sure if the percentage rollout issue is caused by this. I am slightly suspecting that it might be as it's a fairly large user base that "lost connection" to launchdarkly.

The reason those calls fail is because there is a special case added in the httpRequest function that simply prevents any chromium browser above version 73 to make synchronous api calls. It's that condition that a majority of our user base got hit by.

https://github.com/launchdarkly/js-client-sdk/blob/master/src/httpRequest.js#L22

eli-darkly commented 4 years ago

@edvinerikson Ohhhhhhhh... yes, I see. The reason for that conditional behavior is that Chrome doesn't support sync HTTP when the page is closing (they expect you to use the Beacon API instead), and we were assuming that we would only be trying to make sync HTTP requests when the page is closing. That explains it. Thanks.

Note that this means our "try to send the events at the last minute if the page is closing" logic doesn't really accomplish anything in Chrome now; our goal is to use the Beacon API, but we had run into some snags with that. So the current behavior is a compromise.

edvinerikson commented 4 years ago

Would it make sense to attempt doing a asynchronous request in this scenario? IMO it's better than just aborting, but I am not sure what side-effects that could have.

eli-darkly commented 4 years ago

The possible side effect would probably just be that it wouldn't accomplish anything— the page would almost always close too fast for the request to actually happen. And you might get an error message in the console, which is probably not a problem because even if someone has monitoring set up that would cause annoying alerts when there's a console error, it probably would not get to send an alert for the same reason (the page closes too fast). So, yes, until we can get Beacon to work, it might make sense to fall back to async for this.

launchdarkly / js-client-sdk

LD remote api calls fail when beforeunload event has been triggered #198