smogg / auth-mutli-tab-issue

1 stars 1 forks source link

Is this an issue with browser-tabs-lock? #3

Open rishabhpoddar opened 3 years ago

rishabhpoddar commented 3 years ago

Hey, I'm the owner of the browser-tabs-lock library which is used by Auth0 for synchronising calls to the refresh API across tabs. The library is very well tested, but as mentioned in its repo, it's impossible to have actual locking semantics in a browser just using JS.

Do you have any ideas on how the lib can be further improved? Perhaps we can try and fix this issue within the lib itself since Auth0 doesn't seem to be doing much about it.

smogg commented 3 years ago

Thanks @rishabhpoddar. I updated the master branch with minimalistic scenario that points to the locks not being obtained properly between tabs. I think we are still seeing this as the main source of trouble. If you have any pointers for how to improve this non-trivial problem that'd be awesome. I'm happy to jump on a call and discuss it in more detail from the perspective of browser-tabs-lock library 🙇‍♂️


To add to the problem - I can see 403 even in a single tab. Happens rarely, but is reliably reproducible by just opening a single tab and expiring the token every 20 seconds.

rishabhpoddar commented 3 years ago

Found the issue! It's to do with how Auth0 has used my library. They acquire the lock as follows:

//...
try {
   await lock.acquireLock(GET_TOKEN_SILENTLY_LOCK_KEY, 5000)
   // assume that the lock has been acquired

   //...
} finally {
   await lock.releaseLock(GET_TOKEN_SILENTLY_LOCK_KEY);
}

However, as I had mentioned in the readme for my lib, acquireLock returns a Promise<boolean>, and only if it returns true, should the lock be considered acquired. So the code should be:

//...
while(true) {
   if (await lock.acquireLock(GET_TOKEN_SILENTLY_LOCK_KEY, 5000)) {
      try {
         // assume that the lock has been acquired

         //...
         return;
      } finally {
         await lock.releaseLock(GET_TOKEN_SILENTLY_LOCK_KEY);
      }
   }
}

I cloned their auth0-spa-js repo and made the above change, built it, replaced the <script> src with the newly built version, and even after opening 9 tabs for 10 mins, and clicking on "Start expiring button", there was no issue!

Hope this helps :)

rishabhpoddar commented 3 years ago

I have also submitted a PR to auth0-spa-js with a fix for this: https://github.com/auth0/auth0-spa-js/pull/658

smogg commented 3 years ago

@rishabhpoddar Greatly appreciated! I'm verifying the fix now <3

rishabhpoddar commented 3 years ago

While this fixes the issue you are facing, some of the tests they have written for their SDK fail: https://app.circleci.com/pipelines/github/auth0/auth0-spa-js/1465/workflows/1b7e0a38-e6ed-4afe-ad59-82ae152081a4/jobs/1882

I'm not sure why though.. But they can figure it out.

smogg commented 3 years ago

@rishabhpoddar Thanks. I was still able to produce the error with a lot of tabs open, so I don't think it goes away entirely. I'll revisit this tomorrow and verify again with a fresher mind

rishabhpoddar commented 3 years ago

That's odd. I just ran a test with 25 open tabs for 2.5 mins, and no issues: https://www.loom.com/share/f60d453e9f16490aad9584f693ceacfe

The fix in the auth0-spa-js SDK is here (Forked version).

Could you please elaborate on how you managed to reproduce the issue?

smogg commented 3 years ago

@rishabhpoddar I made the conditions more extreme by firing getTokenSilently every 100ms and opened a lot of tabs for a long time (2.5m might not do it). If you're interested, I can try to reproduce more reliably a bit later in the week. The change you proposed greatly improves things and I'm hoping will be rare occurrence (if at all) under real-life circumstances.

rishabhpoddar commented 3 years ago

Yea. Makes sense. The locking mechanism isn't perfect. It's impossible to get actual locking semantics without the browsers providing suitable APIs. But for most real life cases, this should be good to go! If you can reliably reproduce the issue, then I can tweak the code in the lib to become even more robust.

Thanks!

smogg commented 3 years ago

@rishabhpoddar actually, we see the problem even with a few tabs open - it's just less common. I'm able to reproduce even with 6 tabs open for ~hour with the same timing settings.

rishabhpoddar commented 3 years ago

Wow.. okay. Any suggestions on how this occurance could be made even more rare?

I'll try and make it more robust when I have time.

smogg commented 3 years ago

@rishabhpoddar we are currently exploring different locking approaches. Also, I'm testing an option with higher timeoutInSeconds set in Auth0 to make sure it's not a problem with lock released too early. Will continue to explore more and will keep this thread up to date.

rishabhpoddar commented 3 years ago

I saw the Lamport lock algorithm, however, I noticed that that too has failure cases where in very high loads, it may break. Referencing the flow chart in the link you had sent above, let me describe a scenario where two tabs (tab1, tab2) can both think that they have locks (ID of tab1 is t1, and ID tab2 is t2):

time0: A = t1 time1: A = t2 time2: both tabs check that B is free, and continue time3: B = t2 time4: B = t1 time5: tab2 checks that A == t2 -> true. So it has acquire the lock and does work till time11 time6: tab1 checks that A == t1 -> false. So it waits... ... time9: tab1 checks that B == t1 -> true. So it has acquire the lock -> both tab1 and tab2 have acquired the lock.


So this approach too may lead to the error you are facing. But it may make it more rare, or more frequent.. we will need to experiment