mswjs / msw

Industry standard API mocking for JavaScript.
https://mswjs.io
MIT License
15.97k stars 519 forks source link

Request made immediately after calling stop on the service worker doesn't always passthrough properly #2323

Open spocke opened 1 month ago

spocke commented 1 month ago

Prerequisites

Environment check

Browsers

Chromium (Chrome, Brave, etc.), Safari

Reproduction repository

https://github.com/spocke/mswjs-stop-bug

Reproduction steps

  1. Clone the attached repo
  2. Install the npm dependencies using npm i
  3. Start the server using npm start
  4. Open the Safari or Chrome and go to go to http://localhost:3000
  5. Re-load the page N times might take a few tries
  6. Observe that sometimes the second image is not served (network tabs shows it in a pending state)
  7. Observe that sometimes the second image is served though the service worker but not properly served it just spins and spins forever in the network tab

Current behavior

Resources are still served though an active client rather than passed though the service worker even if you call worker.stop() sometimes that even fails to serve the request since likely because of the handlers being removed for it on the parent page side. It's likely flakey like this because the post message from the browser to the service worker takes a while to process so it can't be a fire and forget it needs to be async and send a ACK back when it has properly intercepted the message to stop intercepting requests.

Expected behavior

No requests should be processed though an active client on the service worker once you call stop on it. The stop function probably needs to be async to know when it's safe to do new requests and expect them to be passed though by https://github.com/mswjs/msw/blob/ca6cb7e8e95ad76c6fe6c44d7f4b80d36bc53478/src/mockServiceWorker.js#L109.

kettanaito commented 1 month ago

Hi, @spocke. Thanks for reporting this.

What worker.stop() does

Note that worker.stop() doesn't mean unregistering the worker. I've tried to highlight that in the docs as well:

Screenshot 2024-10-21 at 17 31 11

Your site's resources will be processed by the worker because it remains registered and active. The worker.stop() call tells the worker to not include your request handlers during that processing. It's also client-scoped, and you can have different tabs of the same app having the interception enabled and disabled at the same time.

Once you called worker.stop(), the current client's ID is removed from the internal list of clients in the worker. Any requests originating from this client are ignored (i.e. processed as-is).

But why not unregister the worker?

Registering the worker is a costly action. That's why MSW automatically unregisters the worker once you close the last active controlled client (i.e. the last tab of your app). Due to how Service Workers operate, there is no difference between having no worker and having a worker that bypasses all the network.

But what if I need to unregister the worker?

Then you can do so via:

await navigator.serviceWorker.controller.getRegistration((registration) => {
  return registration.unregister()
})

I see little reason to do that, however. Debugging may be the only reason I can think of.

kettanaito commented 1 month ago

Given the context above, can you rephrase the actual/expected behaviors so it's more clear?

spocke commented 1 month ago

Sorry for any confusion here but I'm not talking about unregister the service worker. From my understanding it remains registered but goes into an inactive state so that all requests that gets served though the worker just passes though. However the issue here is that once you post a message to the worker the stop processing requests that code assumes that the postMessage to the worker is instant that is not the case it takes a few milliseconds then it removes any local handlers for it. So it's in this limbo state for a while. So the workaround I have is to just issue a integrity check to the service worked since that is a request type of action so I can know that the service worker has received and processed the request I know that it also processed the previous in flight stop fire and forget call. The other workaround would be to call stop then wait for N time until the message has been processed by the worker but how long do you wait it's up to the browser to schedule post messages so that might take random time to finish so not ideal.

spocke commented 1 month ago

So from my understanding it does this: https://github.com/mswjs/msw/blob/020161ef720840efd8742fe9e49326abf54f0114/src/browser/setupWorker/setupWorker.ts#L185 So unregisters all local handlers and postMessage a MOCK_DEACTIVATE https://github.com/mswjs/msw/blob/ca6cb7e8e95ad76c6fe6c44d7f4b80d36bc53478/src/mockServiceWorker.js#L70 But since it's sync this will now be in a limbo state where the client has not been removed by the service worker yet but all the handlers have been removed on the parent page so there is nothing that can serve the requests anymore until the message has been processed by the worker and the client is properly removed then it goes into the pass though state in https://github.com/mswjs/msw/blob/ca6cb7e8e95ad76c6fe6c44d7f4b80d36bc53478/src/mockServiceWorker.js#L109 where it has no clients.

kettanaito commented 1 month ago

Thanks for a more detailed explanation. I think I got it now.

So the issue is the window of an unexpected behavior between calling worker.stop() and the client ID actually being removed from the internal worker state.

Making worker.stop() return a Promise will be a breaking change. Instead, I suggest checking worker.context.isMockingEnabled in the request listener on the client:

https://github.com/mswjs/msw/blob/020161ef720840efd8742fe9e49326abf54f0114/src/browser/setupWorker/start/createRequestListener.ts#L17

Basically, if we arrive at that problematic window, and the worker sends a request to the client while removing its ID from the set, the client can ignore that request event if isMockingEnabled is false.

It would be nice to design a reliable test case for this.

kettanaito commented 1 month ago

The thing with that window is that even if a request happens within it, the client doesn't remove any handlers. It will still process that request. The only layer that is affected by worker.stop() is this check:

https://github.com/mswjs/msw/blob/ca6cb7e8e95ad76c6fe6c44d7f4b80d36bc53478/src/mockServiceWorker.js#L109

But this check happens in response to any request immediately. So upon an intercepted request, there's either some client to handle it, or not. If there's a client, the worker will message it and get the result back (no matter if worker.stop() has somehow happened in parallel). If there isn't any clients, the request is ignored.

Perhaps there's something else to this issue I'm missing.

spocke commented 1 month ago

I'm pretty new to this project but the bug is very easy to reproduce with the repo I made. With that it clear that requests are not processed properly by the service worker after calling stop on it.

I see no error messages when it's not processing the request it's just spinning with pending so it's like the client is still connected but it's waiting for a response from the parent page to process the request and that never happens so it just hangs.

So here is a screenshot of a failed session the second request just spins forever: image

It's pretty basic it loads two images one while the mocking is active and one after the stop call has been made. The second request hangs sometimes especially on Safari but happens on Chrome as well.

kettanaito commented 6 days ago

@spocke, can you trace what's going on if you put a debugger inside the fetch listener in mockServiceWorker.js? Please share how far you get and whether something will seem suspicious during the second, problematic image resolution.

spocke commented 15 hours ago

Ok so I added logging in the message event handler function when the fetch event handler is called and before and after the getResponse call in handle request I also logging when the parent page/client is sending the MOCK_DEACTIVATE however only showing that for the Chrome logs since they share the same console output it easier to spot the difference.

So from that I can see that the failing ones have the fetch call in flight while the MOCK_DEACTIVATE call is made to the service worker but not yet received. So I think my assumption about the race condition was correct, the stop needs to be async and wait for the message to be fully processed by the worker until one can assume that it's properly switched off or some other workaround for unhandled in flight requests needs to be added.

Other observations Safari logs DataCloneError not sure what that is about. Also I noticed that the worker.context.requests map is not fully cleared on Safari for some reason so might be a memory leak there?

Failed Safari: image

Passed SafarI: image

Passed Chrome: image

Failed Chrome: image