whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/
Other
8.16k stars 2.69k forks source link

Session history entries list is assumed to be unlimited in size #8620

Open petervanderbeken opened 1 year ago

petervanderbeken commented 1 year ago

WebKit, Chromium and Gecko all have a limit on the size of the session history entry list, WebKit uses 100 and Chromium and Gecko use 50, and adding more entries will start dropping entries from the front of the list. I think the spec should at least mention that there might be a limit.

We're still trying to figure out how this affects the various algorithms, but for example step 6 in https://html.spec.whatwg.org/#url-and-history-update-steps simply sets length to index + 1, which is not what browsers currently do when that goes over the limit.

@jakearchibald @smaug----

jakearchibald commented 1 year ago

If I were to spec this, I'd do the reverse of clearing forward history, instead allowing back history to be cleared from a given step. That means history entries of the same step (like a page and it's initial iframe entries) would always be cleared together. Does that make sense?

Also, should we restrict when browsers can do this? As in, should it only be allowed when adding entries, or should we allow browsers to discard history entries whenever (eg when under memory pressure)?

jakearchibald commented 1 year ago

cc @domenic @domfarolino

smaug---- commented 1 year ago

I guess "step" means in this context the group/tree of entries. And yes, that is what at least Gecko does, the whole tree or trees is/are removed when purging session history from the beginning.

Firefox may also explicitly clear entries if one uses "Clear recent history...".

There are some interesting cases to consider, like https://html.spec.whatwg.org/#example-sync-navigation-steps-queue-jumping-basic What if one does: history.back(); for (let i = 0; i < 100; ++i) { location.hash = i; }

rd3k commented 1 year ago

Hello. I'm trying to follow along with the status of the 'Navigation API' (https://github.com/whatwg/html/pull/8502).

It looks like this issue is blocking Mozilla's 'Position' on the API? Please excuse my intrusion/naivety but I wondered if there was any external discussion/on-going work in this area?

domenic commented 1 year ago

This was discussed in the triage meetings linked above, e.g. https://github.com/whatwg/html/issues/8786#issuecomment-1424583835 back in January. Currently we're waiting for @smaug---- to work on his action item there:

Olli will suggest some changes on the https://github.com/whatwg/html/issues/8620 issue, Domenic can help to get them written up into a spec PR if necessary. Anne to check if WebKit has any opinions.

Olli: this is a longer term task. The spec algorithms will need changes and need to test first all the implementations. Panos will take it off the agenda for now, while Olli investigates.

It's not clear what, exactly, are the blockers for Mozilla on taking a position on the navigation API. I tried to get clarity in https://github.com/mozilla/standards-positions/issues/543#issuecomment-1504553555 but heard back "The review process is just a bit slow".

As I expressed in the meeting, as part of the navigation API work we (Chromium) first did #6315, which fixed 40+ open issues, due to requests from Mozilla and others that we first work on the tech debt in this area. We (Chromium) are hesitant to invest effort in fixing another issue, given that fixing 40+ issues was not enough. Evidence so far suggests that the payoff for fixing one more tech-debt issue will not be faster review, but instead requests for yet more tech-debt fixes. That's why we're hopeful the process can become more collaborative, with Olli or other Mozilla engineers collaborating per the above action item, instead of placing the burden entirely on Chromium engineers.

domfarolino commented 1 year ago

From a technical perspective, I'm actually not sure how serious the issue is here, but more investigation would make that clear one way or another. From the the triage meeting in January when this issue was first discussed, we were under the impression that the problem was no obvious behavior fell out from enforcing a limit on the session history. The two examples are:

  1. From the agenda:
    • Specific problem: pushState() x50, reload(), pushState() x50. Since reload is async, what happens? (Gecko ignores the reload.)

    • You don't need the first pushState() x 50 to observe the problem, so boiling it down we have a reload() followed by 50 x pushState()s
  2. @smaug----'s example above:
    • What if one does: history.back(); for (let i = 0; i < 100; ++i) { location.hash = i; }

In both of these cases, history.back() and location.reload() are the "asynchronous" navigations that, when on the session history traversal queue, immediately compute an absolute targetStep (number) based off of the delta (0, or −1). But by the time the asynchronous navigation actually continues past the synchronous navigations, with history limits we'd be targeting a step number that no longer exists.

The reason I'm not sure how bad this is is because even without session history limits it's already possible to target session history entries that no longer exist via removing iframes, so it is unclear how different targeting step numbers that don't exist is from that scenario (which is handled properly to my knowledge). There are definitely some small obvious things that break: first is the call to getting the history object length and index which takes a targetStep that I think would break this assertion, which expects the stale targetStep to exist in the list returned by get all used history steps which I don't think would be the case anymore. Beyond that I wonder if there is an easy way to graft our handling of the targeting-a-missing-history-entry case onto this one.