ukwa / w3act

w3act is an annotation and curation tool for building web archive collections
Apache License 2.0

W3ACT sessions expire in Firefox/Chrome #662

Closed crarugal closed 2 years ago

crarugal commented 2 years ago

Since the latest roll-out, I've been noticing a lot of sessions that expire, which wasn't the case before.

This is the message that pops up soon after browsing in QA ACT: https://www.webarchive.org.uk/act/wayback/archive/20210413112946/https://collectivewisdomproject.org.uk/about/ [screenshot]

I've tested the issue in three set-ups

  1. Chrome (not private mode) = issue seems to affect Chrome
  2. Firefox (private mode) = seems fine, sessions don't expire
  3. Firefox (normal mode) = issue also seems to be affecting Firefox

After the session times out and you're asked to log back in, this happens: https://www.webarchive.org.uk/act/wayback/archive/20210413112946/https://collectivewisdomproject.org.uk/about/ [screenshot]

Testing Firefox (normal mode, not private): sessions expire after 1 minute; server time is GMT [screenshot]

  1. Visit the target record: https://www.webarchive.org.uk/act/targets/29155 [screenshot]

  2. Find instances: https://www.webarchive.org.uk/act/wayback/archive/*/https://www.scottishpower.co.uk/ [screenshot]

  3. Click on an instance (2018 instance): https://www.webarchive.org.uk/act/wayback/archive/20180401020323/https://www.scottishpower.co.uk/ [screenshot]

  4. Click on a link: https://www.webarchive.org.uk/act/wayback/archive/20180401020323mp_/https://www.scottishpower.co.uk/cancer-research-uk/ [screenshot]

  5. Look at another instance (2021 instance), without logging back into ACT: https://www.webarchive.org.uk/act/wayback/archive/20210501104047/https://www.scottishpower.co.uk/ [screenshot]

  6. Visit the same 2021 instance in Chrome (not private mode): https://www.webarchive.org.uk/act/wayback/archive/20210501104047/https://www.scottishpower.co.uk/ [screenshot]

  7. Chrome prompts for login [screenshot]

  8. Go back to Firefox (normal mode), log back into ACT and visit the same 2021 instance: https://www.webarchive.org.uk/act/wayback/archive/20210501104047/https://www.scottishpower.co.uk/ [screenshot]

Testing Firefox (private mode): the session expires at the end of the browser session, with no time limit [screenshot]

  1. Log into ACT [screenshot]

  2. Navigate to the target: https://www.webarchive.org.uk/act/targets/29155 [screenshot]

  3. Visit the same 2021 instance: https://www.webarchive.org.uk/act/wayback/archive/20210501104047/https://www.scottishpower.co.uk/ [screenshot]

  4. Try a different 2021 instance in Firefox (private mode): https://www.webarchive.org.uk/act/wayback/archive/20210927080230/https://www.scottishpower.co.uk/ [screenshot]

  5. Try the same 2021 instance as above, but in Firefox (normal mode): https://www.webarchive.org.uk/act/wayback/archive/20210927080230/https://www.scottishpower.co.uk/ [screenshot]

Possible issues:

  1. Firefox (private mode): the time the cookies were last accessed seems OK, but it is shown in GMT, not BST. Could this one-hour discrepancy cause sessions to end within 1 minute? [screenshot]

  2. Sessions seem not to time out in private/incognito mode, only in normal viewing mode, for both Chrome and Firefox. The page also renders better in Firefox private mode (compared to normal mode), maybe because cookies for resources aren't expiring quickly?

anjackson commented 2 years ago

This is very odd.

From my investigations, it doesn't seem to be anything to do with time. I can keep browsing around for a while, and leaving the session and then continuing also works, unless I go to a particular page - that Scottish Power homepage. At some point, for reasons I don't understand, the PLAY_SESSION cookie that authenticates the user gets dropped.

This seems to be correlated with loading https://www.webarchive.org.uk/act/wayback/archive/20180401020323mp_/https://www.scottishpower.co.uk/ which looks a bit like this:

[screenshot: 2021-10-06-cookie-jsessionid]

It's hard to tell what's going on, because so much happens concurrently in the browser, but it seems like the original website is setting a JSESSIONID and for some very odd reason I don't understand, this is causing the PLAY_SESSION cookie to get invalidated.

The only other thing I could think of was some kind of cookie overload, i.e. does your normal browser session have lots of cookies associated with archived websites, and is this causing some kind of blockage? For example, if you clear all cookies for www.webarchive.org.uk, does it seem to work better? Is that the aspect of Private Mode that is helping?

One of the issues I've seen is that sometimes the cookie gets lost after a page has loaded, and you don't notice anything is wrong until you try to go to a new page. This makes working out what's happening more difficult, and I think that's why it seems to behave like a timeout.

Anyway, before going too far down this route, it'd be good to verify whether you've seen this for other archived websites. If it's hitting other sites, that'll help triangulate what's going on.

anjackson commented 2 years ago

Ah, I have a hypothesis.

There's a limit to how many cookies are allowed per site. It's about 150 or so these days, apparently.

The way pywb handles cookies means we end up with a lot of them - cookies for every archived site are still replayed, because some are needed to get playback right. They are stored under a Path associated with each website, so they don't interfere with each other, but the browser still has to remember them all.

Possibly, browsing around is creating so many cookies that the critical PLAY_SESSION one is falling off the end, so to speak, and being forgotten. Although note that this is me assuming that's what happens - I've not proven that.

Hmm, the problem with this is I don't know why incognito/not would matter. There would need to be 150-180 new cookies minted after PLAY_SESSION is minted for this failure mode to kick in, i.e. it's a bit like a countdown after logging in, and whether there are older cookies or not should not matter, I think?
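
A minimal sketch of the "countdown" reasoning above, under stated assumptions: the browser keeps at most `limit` cookies per site and discards the oldest first. Real browsers may use a different eviction policy, and the class, function, cookie paths and replay cookie names below are all hypothetical. It counts how many new replay cookies must be minted after login before PLAY_SESSION is evicted.

```python
from collections import OrderedDict


class ToyCookieJar:
    """Toy per-site cookie jar: hard cap, oldest-first eviction (an assumption)."""

    def __init__(self, limit=150):
        self.limit = limit
        self.cookies = OrderedDict()  # (name, path) -> value, oldest first

    def set(self, name, path, value):
        key = (name, path)
        # Re-setting a cookie moves it to the newest position.
        self.cookies.pop(key, None)
        self.cookies[key] = value
        # Discard the oldest cookies once the cap is exceeded.
        while len(self.cookies) > self.limit:
            self.cookies.popitem(last=False)

    def has(self, name, path):
        return (name, path) in self.cookies


def cookies_until_logout(preexisting, limit=150):
    """Count replay cookies minted after login before PLAY_SESSION is evicted."""
    jar = ToyCookieJar(limit)
    # Cookies left over from earlier browsing (hypothetical names and paths).
    for i in range(preexisting):
        jar.set(f"old{i}", f"/act/wayback/archive/old{i}/", "x")
    # Logging in mints the authentication cookie (the path is an assumption).
    jar.set("PLAY_SESSION", "/act", "token")
    minted = 0
    while jar.has("PLAY_SESSION", "/act"):
        jar.set(f"replay{minted}", f"/act/wayback/archive/site{minted}/", "x")
        minted += 1
    return minted


if __name__ == "__main__":
    print(cookies_until_logout(preexisting=0))
    print(cookies_until_logout(preexisting=500))
```

In this toy model the count equals the cap no matter how many older cookies exist, which is why the hypothesis alone does not explain the difference between normal and private mode.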

crarugal commented 2 years ago

Thanks for looking into this, Andy. I'm not sure what could be causing it either, but I have come across the issue when browsing other archived instances across different domains. I'll also clear my cookies and see if that helps. In the meantime, I'll monitor it and note down anything that stands out.

It's not too much of an issue, as I can still do my work; I wasn't sure if other ACT users are also being affected.

Webarchive cookies: [screenshot]

crarugal commented 2 years ago

Using Firefox private mode:

Trying to access: https://www.webarchive.org.uk/act/wayback/archive/20211001100547/https://www.unrefugees.org.uk/ [screenshot]

The session times out when visiting this link: https://www.webarchive.org.uk/act/wayback/archive/20211001100547mp_/https://www.unrefugees.org.uk/learn-more/news-and-stories/ [screenshot]

I log back into ACT after being prompted: [screenshot]

Cookies for https://www.unrefugees.org.uk/ are present: [screenshot]

I then try to visit the same instance: https://www.webarchive.org.uk/act/wayback/archive/20211001100547/https://www.unrefugees.org.uk/ [screenshot]

Cookies for https://www.unrefugees.org.uk/ have disappeared: [screenshot]

crarugal commented 2 years ago

Closed Firefox private mode and re-opened it, so everything was flushed: https://www.webarchive.org.uk/act/wayback/archive/20211001100547/https://www.unrefugees.org.uk/ [screenshot]

Visited the link that logged me out before; it works fine now: https://www.webarchive.org.uk/act/wayback/archive/20211001101313/https://www.unrefugees.org.uk/learn-more/news-and-stories/ [screenshot]

anjackson commented 2 years ago

Hey @ikreymer if you get a chance could you take a look at this and see if you think we're on the right track? Unfortunately, it's going to be hard to test as this is specifically about running behind an authenticated service.

anjackson commented 2 years ago

Hi @crarugal, on DEV I've modified QA Wayback so the authentication cookie is returned as if it were a new cookie with every single response. I'm hoping this means the browser will consider it 'fresh' and not discard it. Please try visiting https://dev.webarchive.org.uk/act/wayback/archive/*/https://www.scottishpower.co.uk/ etc. and see if it seems better...
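
The thread does not say how or where this change was made. As a rough illustration only, the idea of re-issuing the authentication cookie on every response could be sketched as a WSGI middleware like the one below. The class name, the helper, the `Path=/act` value and the `HttpOnly` attribute are all assumptions, and the sketch simply echoes the incoming cookie without validating it.

```python
AUTH_COOKIE = "PLAY_SESSION"  # cookie name taken from the thread


def find_cookie(header, name):
    """Return the value of `name` from a raw Cookie header, or None."""
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key == name:
            return value
    return None


class RefreshAuthCookie:
    """Hypothetical WSGI middleware: echo the auth cookie back on every response."""

    def __init__(self, app, cookie_name=AUTH_COOKIE, path="/act"):
        self.app = app
        self.cookie_name = cookie_name
        self.path = path  # assumed cookie path, not confirmed by the thread

    def __call__(self, environ, start_response):
        value = find_cookie(environ.get("HTTP_COOKIE", ""), self.cookie_name)

        def patched_start_response(status, headers, exc_info=None):
            if value is not None:
                # Re-send the cookie so the browser treats it as freshly set.
                fresh = f"{self.cookie_name}={value}; Path={self.path}; HttpOnly"
                headers = list(headers) + [("Set-Cookie", fresh)]
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)
```

Wrapping an existing WSGI application would then look like `application = RefreshAuthCookie(application)`. If the browser discards its oldest cookies first, re-setting the cookie on each response should keep it near the newest end of the jar.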

crarugal commented 2 years ago

Thanks @anjackson, I'll test it out later today and report back.

crarugal commented 2 years ago

@anjackson I've been testing it on different instances and internal links; it looks like the fix works, as I've not encountered any sessions that expire prematurely. Many thanks for looking into it and for the solution.

anjackson commented 2 years ago

This is fixed, pending rollout.