pubnub / javascript

PubNub JavaScript SDK docs https://www.pubnub.com/docs/sdks/javascript
Other
553 stars 401 forks source link

Runaway GC in Safari\Webkit on refresh without disconnect() #398

Open WalrusSoup opened 3 months ago

WalrusSoup commented 3 months ago

Running into an issue where a simple pubnub chat causes Safari to eat up 90% CPU when any channels have been joined. I've isolated it to opening a connection to channels. As soon as I call connect, it will have this issue after refreshing. It wont happen every refresh, but it happens 90% of the time.

Safari Version: Version 17.5 (19618.2.12.11.6)

Prior to refreshing, a lot of errors pop up in the console (~10 times): Fetch API cannot load [x] due to access control checks.

This occurs even in a simple boilerplate page with nothing on it. This is from a vanilla vite project compilation, with an empty page:

Chat.init({
  publishKey: import.meta.env.VITE_PUBNUB_PUB_KEY,
  subscribeKey: import.meta.env.VITE_PUBNUB_SUB_KEY,
  userId: "1111111",
  storeUserActivityTimestamps: true,
  }).then((chat) => {
  const channel = await chat.getChannel("50")
  // connect or join
  channel.join((msg) => {
    console.log(msg);
  });
});

I do not see the errors Fetch API cannot load [x] due to access control checks when reloading via the developer console.

https://github.com/user-attachments/assets/c53f290e-e7d7-4a19-94db-1d357d112b4f

If you force the browser to disconnect before reloading, we don't see the same behavior:

window.onbeforeunload = () => {
  console.log('unloading!!!!', window.chatRef);
  window.chatRef.sdk.disconnect();
}

On the development side, since we're in a Webkit view, i cannot seem to fix it for a reload. But i can at least prevent it from happening during hot module reloads:

import.meta.hot.on('vite:beforeFullReload', () => {
  if(window.chat) {
    console.log('!!!!UNLOADING CHAT!!!!');
    window.chat.sdk.disconnect();
    console.log('!!!!SDK DISCONNECTED!!!!');
  }
});

However the webkit environment does not fire the events onbeforeunload etc when refreshing which means this bug is basically stuck.

On reload, it appears that the timers that were scheduled are simply running forever. I set up a setTimeout() to simply disconnect Pubnub and set the object to undefined and the timers still hang out forever.

If I load, then reload the window and call destroy() on the current instance:

  setTimeout(() => {
    console.log('Calling Destroy()');
    window.chatRef.sdk.destroy();
  }, 5000)

When we compare snapshots we see that Safari seems to be hanging onto the SDK and throwing partial GCs hundreds of times:

Screenshot 2024-08-02 at 9 43 24 AM

Likewise, when calling destroy() on the SDK underneath, we still hang onto memory because there is no destroy() function on the ChatSDK. I can manually clean up both and THAT appears to fix the issue. However, this does not work when reloading. This must be called before we reload the window.

  setTimeout(() => {
    console.log('Calling Destroy()');
    window.chatRef.sdk.destroy();
    delete window.chatRef;
  }, 5000)

Perhaps I am missing something here?

parfeon commented 3 months ago

@WalrusSoup thank you for sharing information about this issue. It looks like the WKWebView way of handling JS context is the reason for what is happening. There are two ways to solve it:

  1. Initialize new WKWebView instance: this will force JS contexts to be destroyed
  2. The one which you mentioned: window.onbeforeunload - this can be done differently: a. The user handles this case and destroys the current SDK client / Chat SDK instance b. If the web version of JS SDK used, it can add onbeforeunload handler and destroy itself.

For 2.b. I will need to speak with the team about whether it will affect anyone if we will destroy by default.

WalrusSoup commented 3 months ago

WKWebView does not seem to reliably trigger onbeforeunload. Should this be a bug report to webkit then since it's so easily and reliably recreated?

I managed to get it functional by hitting it from all sides in our app - but jeeze. This doesn't feel great.

if (import.meta.hot && !window.importHook) {
  import.meta.hot.on('vite:beforeFullReload', () => {
    if (window.chat) {
      // @ts-ignore
      window.chat.sdk.destroy();
    }
  });
  window.importHook = true;
}

reload via hotkey:

if (!window.refreshHook) {
  document.addEventListener('keydown', async (e) => {
    e = e || window.event;
    if (e.keyCode == 116) {
      window.chat.sdk.destroy();
      window.location.reload();
    }
  });
}

context menu needs to be replaced with one that can hook reload, so we disabled it.

document.addEventListener('contextmenu', function (e) {
    e.preventDefault();
});
fwebdev commented 4 weeks ago

Is there any progress on this?

Our Customers also run into this Issue. The Regression was introduced in Version 8.2.7. When I look at the Diff there is not much that changed. How could that cause the Error?

We had to rollback because of Customer Complaints.

tance77 commented 1 week ago

@parfeon Any updates on this?

parfeon commented 4 days ago

@WalrusSoup as I mentioned on #396 that because of suggested fix (when browser forcefully closes connection) Web Fetch API doesn't provide much information about error, and we can only assume using error message (in most of the cases). To properly handle this case, subscription timeout should be configured for PubNub instance, so it will recreate idle request on its own before it will be forcefully terminated by the browser.

Fixes from #396 have been reverted in 8.2.9 and there is even newer version available already: 8.3.0

WalrusSoup commented 3 days ago

Our Fix: We've patched this in our own project by listening for a Tauri navigation event. For browsers, right now, i would suggest listening to both pagehide and beforeunload and calling chat.sdk.destroy(true) or pubnub.destroy(true)

window.addEventListener('beforeunload', () => {
// yeet the sdk
});
window.addEventListener('pagehide', (event) => {
// yeet the sdk
});

The runaway GC still remains an issue as soon as any channel connection is opened. I suspect that we're looking at an upstream node-fetch issue.

What's even crazier is you can open the connection and refresh the page without connecting to pubnub and the page still goes into a GC spiral.

Screenshot 2024-11-18 at 4 06 39 PM

I loaded it the first time, commented out the pubnub items for the second run, then gave it a refresh. Safari still confidently holding on to a ton of functions from within the sdk.

Screenshot 2024-11-18 at 4 10 31 PM

And again it's still pretty trivial to reproduce:

import { Chat } from "../js-chat/lib/src/"

Chat.init({
    publishKey: import.meta.env.VITE_PUBNUB_PUB_KEY,
    subscribeKey: import.meta.env.VITE_PUBNUB_SUB_KEY,
    userId: "user_1111",
    subscribeRequestTimeout: 59,
}).then(async (chat) => {
  console.log('chat is init. im gonna refresh now i guess');
  const currentTimestamp = new Date().getTime();
  const secondUser = await chat.createUser(`user_${currentTimestamp}`, {
    name: `User ${currentTimestamp}`,
    profileUrl: "https://www.example.com/profile.png"
  });
  const { channel } = await chat.createDirectConversation({
    user: secondUser,
    channelId: `direct-user-${secondUser.id}-${currentTimestamp}`
  });
  channel.join((msg) => {
    console.log(msg);
  })
});

Also would like to point out - this bug will trip also from any navigation event away from the page when the back button is used.