Cache, state, lifecycle events, and the future of Turbolinks 3

Thibaut commented 9 years ago

While working on the docs I realized that the partial replacement feature currently in master introduced significant issues with page caching and lifecycle events. Below is a summary of the problem and a list of possible solutions.

I apologize for the long description. This is a defining issue for Turbolinks 3 that needs careful consideration.

Problem

Say I have this:

<body>
  <textarea data-rte></textarea>
</body>

$(document).on 'ready page:load', ->
  $('[data-rte]').each ->
    fn = -> # some function
    $(this).wrap('<div/>').on('click', fn)

After the page loads, the resulting markup is this:

<body>
  <div>
    <textarea data-rte></textarea>
  </div>
</body>

Now say I:

type "foo" in the textarea
visit another page
click the back button

Behavior in v2.5:

the <body> element is stored in the cache; its state (event listener and textarea value) is kept in memory
on popstate, the previous <body> is reassigned to the document, and:
- wrapping <div> is present
- fn is still attached
- textarea value is still "foo"
(this actually leads to a memory leak, because when the <body> is evicted from the cache, jQuery stills hold a reference to fn)

Behavior in master:

the <body> element's outerHTML is stored in the cache; its state is cleaned up
on popstate, the previous <body> is parsed and reassigned to the document, and:
- wrapping <div> is present
- fn is not attached to click event
- textarea value is blank (because the value of form elements is stored in memory, not in the DOM, which only stores their initial / hard-coded value)
(no memory leak)

(Side note: the reason why we now serialize the <body> element instead of keeping it in memory is that it isn't guaranteed to be replaced on Turbolinks.visit anymore. If we didn't and you did Turbolinks.visit("#{currentUrl}?q=test", change: ['container']), the container element would be replaced and we'd have no way to bring the old one back unless we somehow managed to keep track of the diff across pages + the old nodes in memory.)

Now, one way to get around this problem might be to bind to page:change (which also fires on history back/forward) instead of page:load, but then here's what happens after popstate:

new fn is attached
wrapping <div> is wrapped in another <div>
textarea value is still lost

The DOM transformation is applied a second time.

So then we might do something like this:

$(document).on 'ready page:load', ->
  $('[data-rte]').each ->
    $(this).wrap('<div/>')

$(document).on 'page:change', ->
  $('[data-rte]').each ->
    fn = ->
    $(this).on('click', fn)

Ignoring the fact that this isn't how people write JavaScript, here comes another problem: say I partial-replace another element on the page (with Turbolinks.visit(url, change: ['id'])):

both page:load and page:change are triggered on the document
the callbacks add a duplicate <div> and fn callback to the existing data-rte element

To alleviate this problem, #537 started passing the affected nodes to these two events, allowing you to do this:

$(document).on 'ready page:load', (event) ->
  event.data.forEach (node) ->
    $('[data-rte]', node).each ->
      $(this).wrap('<div/>')

$(document).on 'page:change', ->
  event.data.forEach (node) ->
    $('[data-rte]', node).each ->
      fn = ->
      $(this).on('click', fn)

... which, in practice, is impossible to pull off (race conditions).

tl;dr: current master will break everyone's back button unless they write insanely complex JavaScript.

Paths forward

(1) Get rid of the cache entirely

This is what we do at Shopify / Turbograft.

Upsides:

greater simplicity
zero breakage
no breaking change

Downsides:

slower back/forward navigation
loses state

Since a new page is loaded on history back/forward, we can keep using page:load the same way we would with Turbolinks 2.5 (no need to split DOM transformations / event listeners in page:load and page:update). For plugins that aren't idempotent, Turbograft also attaches the affected nodes to the page:load event (since it fires on partial replacement), so you'd end up with this:

$(document).on 'ready page:load', (event) ->
  $('foo').each -> $(this).idemPotentPlugin()
  event.data.forEach (node) ->
    $('bar', node).each -> $(this).nonIdemPotentPlugin()

To help with the "loses state" issue, we could keep the values of form elements in memory and try to re-apply them on back/forward (each element would need a unique id).

(2) Cache before executing JS

Like (1) but we cache the untouched <body>'s outerHTML of each page (before any JS is executed on it), like an HTTP cache.

Upsides:

less breakage
faster back/forward navigation than (1)

Downsides:

there might be some issues going back to initial pages if code transforms the DOM before we get a chance to cache it (for XHRs it's easy since we control the response)
the server would need to render the full page all the time. Doing render :view, change: :key, layout: false and updating the URL at the same time wouldn't be possible since we can't make an untouched <body> out of that (this was a requested feature that I was planning to address by making render :view, change: :key update the current URL on GET requests; the use-case being that of a search form which updates a container and the URL at the same time)
loses state

(3) Cache after executing JS (current master)

To make my example work without split callbacks, we'd need to write the code like this:

$(document).on 'ready page:load', ->
  $('[data-rte]').each ->
    return if this.loaded
    this.loaded = true
    fn = ->
    $(this).on('click', fn)
    $(this).wrap('<div/>') if parentNotADiv()

... and make one breaking change in Turbolinks: fire page:load on history back/forward.

Upsides:

keep new feature

Downsides:

breaking change
loses some state
unrealistic — all page:load callbacks would need to be both idempotent and able to re-apply themselves on an already-transformed DOM

(Side note: if we go with (1), (2) or (3), I would drop the transition cache, since its speed benefit is significantly reduced by the fact that we have to re-instantiate all the state on the page before loading the real new page.)

(4) Cache nodes and "reverse-apply" partial replacements

Like I briefly explained above, one solution might be to keep the changed nodes in memory (like Turbolinks 2.5 except it wouldn't always be the <body>). So for example when this happens: Turbolinks.visit(url, change: ['container']), we would keep the previous container elements in memory and put them back when you hit back.

Upsides:

not a big breaking change (same JS as in (1))
less breakage

Downsides:

unproven and potentially complex solution

(5) Drop partial replacement, keep Turbograft separate

If neither (1) or (2) are an option, and nobody is up for exploring (4), I would go with this, since I don't think (3) is acceptable.

Again sorry for the wall of text. Please let me know what you think.

cc @dhh @reed @rafaelfranca @kristianpd @pushrax

reed commented 9 years ago

Thanks for laying it all out for us to discuss, Thibaut.

In my opinion, (1) and (2) aren't good options because of the loss of state. We've always made it a point of emphasis that Turbolinks should mimic native browser behavior in every way possible. Losing the state when navigating through history would be a big contradiction to that.

When the change was made to cache body.outerHTML instead of body, I understood the reasoning behind it, but I wasn't thrilled with the fact that it meant that event bindings would have to be reapplied on cached pages. I had assumed (without thinking about it that much) that I'd be able to just bind my initialization JS to page:change instead of ready page:load and it'd be alright, but Thibaut has illustrated why it's not that simple.

So I would favor any solution that included reverting to the old way of caching pages. If it can be done, (4) would be the ideal option.

(I personally wouldn't have a huge problem with (5), except that it's probably not an option since it's already been announced/promoted and people seem enthusiastic about it. I think one of the reasons we resisted the idea of partial replacements for so long is because in the back of our minds we saw the potential of running into a caching issue like this. Hopefully we can find a way to make it work, but if we can't, dropping partial replacement might be our best move.)

Thibaut commented 9 years ago

Thanks for your thoughts, Nick.

I can give (4) a shot in the coming weeks. If we can pull it off (I only thought of it while writing this issue), it'd give us the best of both worlds.

Regarding (5), we'd still get to keep half the features announced at RailsConf (data-turbolinks-permanent).

dhh commented 9 years ago

Thanks for digging into this so clearly. I think it's worth taking a shot at 4, but for me, the most important feature to keep is data-turbolinks-permanent. Can't live without that any more, but could give without the other partial replacements (although I'd prefer not to!).

On Mon, Jun 1, 2015 at 1:34 PM, Thibaut Courouble notifications@github.com wrote:

Thanks for your thoughts, Nick.

I can give (4) a shot in the coming weeks. If we can pull it off (I only thought of it while writing this issue), it'd give us the best of both worlds.

Regarding (5), we'd still get to keep half the features announced at RailsConf (data-turbolinks-permanent).

— Reply to this email directly or view it on GitHub https://github.com/rails/turbolinks/issues/551#issuecomment-107408881.

kristianpd commented 9 years ago

4 sounds achievable up front but I'm worried it may dramatically increase the complexity of our replacement logic and still have gotchas.

I think the trickery will be around data-turbolinks-permanent and data-turbolinks-temporary behaviours in reverse as well as the difference between change and keep.

Page A partial replace with Page B

Page A

<div id="nav" data-turbolinks-permanent="true">
  ...
</div>
<ul id="resources">
  <li>one</li>
  <li>two</li>
</ul>
<div id="timer-banner" data-turbolinks-temporary="true">
  ...
</div>

Page B

  <ul id="resources">
    <li>three</li>
  </ul>
  <div id="something-else-conditionally-shown" data-turbolinks-temporary="true">
     ... 
  </div>

Scenario 1: Turbolinks.visit(url, change: 'resources')

<div id="nav" data-turbolinks-permanent="true">
  ...
</div>
<ul id="resources">
  <li>three</li>
</ul>
<div id="something-else-conditionally-shown" data-turbolinks-temporary="true">
 ...
</div>
<div id="timer-banner" data-turbolinks-temporary="true">
  ...
</div>

Scenario 2: Turbolinks.visit(url, keep: 'resources')

<div id="nav" data-turbolinks-permanent="true">
  ...
</div>
<ul id="resources">
  <li>three</li>
</ul>
<div id="something-else-conditionally-shown" data-turbolinks-temporary="true">
 ...
</div>

So now we want to get back from B to A and If I'm understanding "reverse-apply" correctly, we hope to reverse run the replacement? Does that meen doing a change where we had a keep and vice versa? If we keep just the set of nodes that changed, how do we know exactly where they end up in the DOM structure unless we use the current DOM as a starting point?

<div id="nav" data-turbolinks-permanent="true">
  ...
</div>
<ul id="resources">
  <li>three</li>
</ul>
<!-- I'd expect this to go away -->
<div id="something-else-conditionally-shown" data-turbolinks-temporary="true">
 ...
</div>
<!-- I'd want this to come back -->
<div id="timer-banner" data-turbolinks-temporary="true">
  ...
</div>

In this case it's all a flat node hierarchy, but you could imagine if this was nested we wouldn't really know where to put the nodes if we have no reference point (a handle with the same id in the current DOM).

I think the biggest challenge here is that since we don't enforce any kind of DOM structuring requirements in the replacement body we don't know enough about the DOM structure to make an easy guess on where a reverse-applied node belongs. We may be able to achieve this if we started tracking parents and siblings but that's scary.

Am I missing something big?

wkrsz commented 9 years ago

@kristianpd In scenario 2) on transition from A->B, shouldn't #resources items stay "one" and "two", due to keep parameter?

wkrsz commented 9 years ago

@kristianpd As I understand popstate, the user can return to any state in history, not just pop last one from the stack. This means the user might be switching from some page C which has completely different body to A, after navigating A->B->C. So it's not just matter of reverting one transformation.

wkrsz commented 9 years ago

Let's consider an example with partial replacement, if we implemented @Thibaut's proposal 4 (as I understand it):

Page A:

    <body>
    <nav>...</nav>
    <main id=“container”>I’m page A</main>
    <footer>…</footer>
    </body>

Page B:

    <body>
    <nav>...</nav>
    <main id=“container”>I’m page B</main>
    <footer>…</footer>
    </body>

Page C:

    <body>
    <p>Thank you</p>
    </body>

User performs following actions:

Turbolinks.visit(“/page-b”, change: “container”)
Turbolinks.visit(“/page-c”)
popstate to state 1.

On transition from state 1. to 2. we cached whole body HTML and safely stored DOM node for #container. Now, going back from page C to A Turbolinks has no choice but to re-construct body from serialized HTML. Then, optionally Turbolinks could replace newly constructed #container with one stored in cache.

When debugging problems with JS plugins I’d rather not have to reason about some elements on a page being re-constructed from serialized HTML and some re-attached.

Proposal

How about we cache DOM nodes, but only for simple full-page transitions, with no change, keep params or data-turbolinks-permanent elements? In nutshell back to Turbolinks 2.5 caching in simple cases.

In case of partial replacement we cache neither DOM nodes nor serialized HTML (@Thibaut's proposal 1.) On popstate URL will be fetched anew.

This way "plain" web applications get speed up with Turbolinks without breaking "back" button behavior. And burden of solving problems falls on developers who opt-in to using partial replacement.

And maybe if you're building something that more like an SPA, then you don't ever care about back button using cache and keeping input state that much?

Thibaut commented 9 years ago

@kristianpd:

in scenario 1, #something-else-conditionally-shown wouldn't be inserted since it doesn't match the change key and isn't already in the DOM. To reverse that operation, we would re-insert the replaced nodes (#resources and #timer-banner).
reversing a keep operation would be much harder. To get around that, I'd just kill the option, since I don't think it's that useful.
the real challenge I think will be when we need to re-insert elements that have been removed from the page. E.g., a data-turbolinks-permanent that wasn't on the next page, or if the JS running on the page changes the DOM so much that we can't easily reverse a change. The first issue could be solved by inserting hidden divs on the page, so we know where to put the nodes back. The second issue I don't see a solution other than saying "don't do that".

@WojtekKruszewski thanks for digging into this!

As I understand popstate, the user can return to any state in history, not just pop last one from the stack

Good catch. This isn't a blocker, but again adds more complexity (we'd need to assign an incrementing ID to each state, so that when you pop A from C we know we need to reverse both C and B).

Thibaut commented 9 years ago

In case of partial replacement we cache neither DOM nodes nor serialized HTML (@Thibaut's proposal 1.) On popstate URL will be fetched anew.

@dhh @reed what do you think?

This would be a way to keep the feature in place. The downside is that most people will probably make use of it, since it's so easy to do from the controller, and end up with slightly inconsistent back button behavior.

Thibaut commented 9 years ago

Unless there are any objections I'll implement what @WojtekKruszewski suggested above (keep partial replacement, but remove pages that have been partial-replaced from the cache so we can go back to caching document.body and un-break the back button for normal navigation) in the coming weeks.

amnesia7 commented 9 years ago

@Thibaut I've seen that this has now been closed with a pull request that doesn't cache the body if there's a partial replacement but just wanted to query something about this. I'm not sure if I've missed something somewhere but is there a reason why you would not cache the entire body html even if it was just a partial replace and reload the full body html from cache when going back and forward? Would this not mean that you were always caching the full body no matter how much of it got updated (page or partial) and going backwards and forwards through multiple pages would just grab the cache for that url and re-insert it all so as not to need to cherry-pick bits based on whether it had done partial or full pages changes between the page that the user was on and the one they are now trying to go back to? Wouldn't any js that should only be called once be run on page:fetch rather than page:load so that it would only insert the "extra" html once that would then get cached and so if the user clicked the back button it wouldn't re-insert that html again and any js functions would be bound to document which wouldn't change so wouldn't be lost between page changes if need be?

Thibaut commented 9 years ago

@amnesia7 You're describing how the cache worked before the fix for this issue. The problem is that as soon as we cache HTML (as a string), we lose all DOM state (not just event listeners but any data stored on the DOM elements themselves or which is only referenced through them).

Most JS code is not written with "separating DOM transformations from DOM state" in mind. Writing code in such a way is hard and most of the time bad practice. We can't expect people to do that.

amnesia7 commented 9 years ago

Ah, good point. Glad you're on the ball.

turbolinks / turbolinks-classic