iron-meteor / iron-router

A client and server side router designed specifically for Meteor.
MIT License
1.98k stars 413 forks source link

Action can be recalled before the data function, leading to template helper errors #576

Closed tmeasday closed 10 years ago

tmeasday commented 10 years ago

I think there have been hard to reproduce bugs that have run across this problem, but I am seeing it on a project and I think I understand.

The basic issue is that sometimes you'll get a template rendered with data context undefined/window, even though you use loading, and waitOn.

The reason is that you have written a data function that doesn't check this.ready() (which is legit if it just does something like Collection.findOne(this.params._id);).

Because data and action run in separate computations, and in this case data doesn't depend on this.ready(), action invalidates first when we become ready -- rerunning, bypassing the loading hook, and rendering the template with the old data value, which is undefined.

I'll make a reproduction to make it totally clear.

I think the solution is that we should always wait before running data. Like I said in #569.

tmeasday commented 10 years ago

Hmmm. Having trouble reproducing in a simple app (although even in my real app it doesn't happen every time). Fairly sure my analysis is correct -- adding a simple if (this.ready()) to my data function solved it neatly..

cmather commented 10 years ago

Two quick questions before bed:

1) is it a good idea to keep stale data from the previous data context around?

2) how would the dataNotFound hook work? Can the data function still return null?

On Apr 2, 2014, at 10:12 PM, Tom Coleman notifications@github.com wrote:

Hmmm. Having trouble reproducing in a simple app (although even in my real app it doesn't happen every time). Fairly sure my analysis is correct -- adding a simple if (this.ready()) to my data function solved it neatly..

— Reply to this email directly or view it on GitHub.

tmeasday commented 10 years ago

1) I wouldn't have thought so, no.

2) Yes of course. But we should wait until the subscriptions are ready before concluding that null data implies the item doesn't exist.

Basically I think waitOn and data are first-class entities and I can't see a legit reason to want them not to interact with each other.

cmather commented 10 years ago

What would we initially set the data to while we're waiting? In order to clear the previous data context?

On Apr 2, 2014, at 10:16 PM, Tom Coleman notifications@github.com wrote:

1) I wouldn't have thought so, no.

2) Yes of course. But we should wait until the subscriptions are ready before concluding that null data implies the item doesn't exist.

Basically I think waitOn and data are first-class entities and I can't see a legit reason to want them not to interact with each other.

— Reply to this email directly or view it on GitHub.

tmeasday commented 10 years ago

null or undefined I suppose.

tmeasday commented 10 years ago

I guess there's a question of what happens if you use dataNotFound but not loading. It's a bad idea, but I guess what'll happen is you'll see "not found" in place of a "loading" template.

isAlmogK commented 10 years ago

@tmeasday @cmather I can send you bitbucket access to my current project which has this issue if it helps.

I'm also doing

Collection.findOne(this.params._id);) 

and

data: function() {
        var hasMore = this.links().fetch().length === this.limit();
        var nextPath = this.route.path({_id: this.params._id, linksLimit: this.limit() + this.increment});

        return {
            title: Topics.findOne(this.params._id).title,
            shortDescription: Topics.findOne(this.params._id).shortDescription,
            _id: Topics.findOne(this.params._id)._id,
            links: this.links(),
            nextPath: hasMore ? nextPath : null
        };
    }
tmeasday commented 10 years ago

@almogdesign -- this is a slightly different issue -- in your case, you[1] are running code in the data function that expects the router to be ready. Although this is a reasonable assumption, it is explicitly not the case that it should be. It's easily solved by putting if (! this.ready()) return; at the top of the data function.

The issue that I have been unable to replicate is a bit more subtle than that -- the template rendering with data == undefined, even though the data function returns something.

On the first issue, I think the behaviour should be changed (and I think @cmather agrees), but even if not, I would like a reproduction and fix for the second issue as it's a much more difficult bug to deal with (even though the solution is the same, it's far from obvious that it should be).

[1] I realise that this being code from DM is actually my code ;)

tmeasday commented 10 years ago

@cmather - found an even more subtle version of this in DM's book app.

Here we use the loading hook but no the dataNotFound hook. What this means is that the action computation depends on this.ready() but not this.data().

What this means is that some times, when this.ready() changes, and things start invalidating, for whatever reason[1], the action computation invalidates before the data Box. So when I log everything to the console, I see

  1. setData getting called with the real data context of the template
  2. Template helpers getting called on the template
  3. Subsequently the data() function getting re-called on the layout.

Here's a screenshot incase you don't believe me: http://cl.ly/image/1h0E1G2a1C1h

I'm not really sure what the best approach for debugging exactly why things are invalidating in this order is. I'd love to hear your thoughts.

[1] And this is the tricky bit, I've never been able to reliably reproduce these issues, they seem very temperamental and sometimes happen when you refresh a page, other times not.

tmeasday commented 10 years ago

Ok, I retract the comment about not depending on this.data() in the action computation being the cause. I am still seeing the error maybe one time out of 20 even with it in there.

/me is going crazy here.

isAlmogK commented 10 years ago

@tmeasday - "sometimes happen when you refresh a page" I think this is the issue that I'm getting on my end when I do a page refresh.

tmeasday commented 10 years ago

@cmather - any thoughts? I've been trying to brainstorm ways to get to the bottom of these computation invalidation order issues in some principled way. I haven't come up with much yet.

As an aside I'm beginning to feel that issues like this are maybe a sign that separating the data and action computations is a mistake. Because in general we can't control which one re-runs first when dependencies change. I could be missing the real problem though.

tmeasday commented 10 years ago

It could also be a mistaking introducing another computation via UI.emboxValue (around data). This could also be contributing.

cmather commented 10 years ago

I tried to quickly read and catch up here but it looks like the issue may have evolved. I'm not sure yet whether this looks like a bug or is weird/unexpected behavior of core, or something else. My mental model for what is happening given the current implementation. It's possible what should happen is different. There could also be a bug.

Are you also still seeing helpers from previous routes running again?

cmather commented 10 years ago

Also, I just re-read your original post Tom. I should have added to my mental model above that computation dependencies should all the added in order, and flushed in order. I think it all happens in the same cycle (e.g. no afterFlush or setTimeouts) so it should run from top to bottom.

tmeasday commented 10 years ago

Hey @cmather - I realise the stream of consciousness makes the issue here hard to understand. I can't fully explain the problem here (especially the fact that the issues we are seeing are so intermittent), but my best guess is the underlying problem is the use of UI.emboxValue. Here's an illustrative example:

  var data = UI.emboxValue(function() {
    return Session.get('data');
  });

  Deps.autorun(function() {
    if (Session.get('ready'))
      Session.set('data', true);

    console.log('ready and data are:', Session.get('ready'), data());
  });

If you run this code and trigger ready to become true, you'll see:

ready and data are: false false 
ready and data are: true false 
ready and data are: true true

The important point being the middle moment where ready is true but data is false -- which is probably not what you'd expect! This corresponds to an (easily reproducible) case where the action for a controller runs with this.ready() as true and this.data() as null (even if the data function always returns something). I haven't actually been able to produce an example where the null data goes through to the template however (I guess there's an extra flush before the template actually renders).

But it does actually happen somehow (as evidenced by http://cl.ly/image/1h0E1G2a1C1h) -- see how setData is called with an object, but at a later point the template helpers run with null as data.

It makes sense that this can happen given the example above. I guess getting to the bottom of it will require figuring out the extra flushes above. Will report back if I figure out more.

chrisbutler commented 10 years ago

this is exactly what i'm dealing with, and i was able to fix it by checking data() the action function (https://github.com/EventedMind/iron-router/issues/569#issuecomment-40907116)... i'd imagine that this is a better place for continuing that discussion

tmeasday commented 10 years ago

@cmather - I figured it out. Reproduction here: https://github.com/tmeasday/iron-router-576

Here's the explanation. The issue is the combination between a login hook and the built in loading hook. If you have something like:

Router.onBeforeAction(loggedIn);
Router.onBeforeAction('loading');

Then what you end up with is a data computation that looks something like

if (this.ready()) {
  // do something
}

And a ready computation that looks something like

if (! Meteor.user()) {
  // do something
} else if (this.ready()) {
  // do something else
}

So initially, the data computation depends on a subscription's readiness, where as the action computation only depends on the user.


Now, when the page loads the following two things happen:

A. At some point there's an added msg for the user data coming over the wire. B. At some later point the ready msg comes over for the sub.

A will always happen before B, but the key point is that A + B can sometimes happen in the same event tick, and sometimes happen in different event ticks (depending on a lot of stuff I guess).

Consider the second case.

  1. In the first tick, Meteor.user() changes, triggering the action computation to invalidate.
  2. At the end of the tick, a flush happens, the comp re-runs, and sets up a dependency on this.ready() (notably after the data dependency).
  3. In the subsequent tick, this.ready() changes, triggering the data computation to invalidate, and then the action computation. So things run in the right order.

Now, consider the first case.

  1. In the same tick, both Meteor.user() and this.ready() change, triggering both computations to invalidate.
  2. At the end of the tick, a flush happens, both computations re-run. However the action computation re-runs first, because A happened before B.

If the action function runs first, then due the embox issue I outlined above, data doesn't work properly.


In the example I posted, you can see two workarounds that highlight the explanation.

  1. Swap the order of the hooks, and it works -- because now the action comp first depends on this.ready(). (And the data comp depends on it first).
  2. Add a timeout in the publication to ensure A and B happen different ticks.

Let me know if it makes sense, and if you have time to discuss..

cmather commented 10 years ago

Hey Tom, nice work. This explanation makes sense. Would definitely like to work through it once devshop is over. Maybe tomorrow.

On Apr 24, 2014, at 5:20 AM, Tom Coleman notifications@github.com wrote:

@cmather - I figured it out. Reproduction here: https://github.com/tmeasday/iron-router-576

Here's the explanation. The issue is the combination between a login hook and the built in loading hook. If you have something like:

Router.onBeforeAction(loggedIn); Router.onBeforeAction('loading'); Then what you end up with is a data computation that looks something like

if (this.ready()) { // do something } And a ready computation that looks something like

if (! Meteor.user()) { // do something } else if (this.ready()) { // do something else } So initially, the data computation depends on a subscription's readiness, where as the action computation only depends on the user.

Now, when the page loads the following two things happen:

A. At some point there's an added msg for the user data coming over the wire. B. At some later point the ready msg comes over for the sub.

A will always happen before B, but the key point is that A + B can sometimes happen in the same event tick, and sometimes happen in different event ticks (depending on a lot of stuff I guess).

Consider the second case.

In the first tick, Meteor.user() changes, triggering the action computation to invalidate.

At the end of the tick, a flush happens, the comp re-runs, and sets up a dependency on this.ready() (notably after the data dependency).

In the subsequent tick, this.ready() changes, triggering the data computation to invalidate, and then the action computation. So things run in the right order.

Now, consider the first case.

In the same tick, both Meteor.user() and this.ready() change, triggering both computations to invalidate.

At the end of the tick, a flush happens, both computations re-run. However the action computation re-runs first, because A happened before B.

If the action function runs first, then due the embox issue I outlined above, data doesn't work properly.

In the example I posted, you can see two workarounds that highlight the explanation.

Swap the order of the hooks, and it works -- because now the action comp first depends on this.ready(). (And the data comp depends on it first).

Add a timeout in the publication to ensure A and B happen different ticks.

Let me know if it makes sense, and if you have time to discuss..

— Reply to this email directly or view it on GitHub.

tmeasday commented 10 years ago

The above explained how DDP message timing caused the action computation to "complete" before the data computation, and thus why the issue was unreliable, and intermittent.

To help come up with a solution, and for those interested, here's an explanation of why the computations completing in a different order actually causes the problem we've seen.


There are 4 computations of particular interest:

  1. The data computation, that always starts off depending on this.ready().
  2. The action computation, which is very much user-defined, but in this case, first depends on an document, then this.ready().
  3. Blaze-layout's emboxed data computation, which depends on an internal dep managed by the data computation.
  4. The rendering computation setup by the layout, which depends on the emboxed data and the current main template.

The important case is when 2. "completes" -- by this I mean gets past all its guards, and sets the main template to whatever page we should be seeing.

Consider case I -> when 1 completes before 2. Then:

a. 1 completes, sets the data, and invalidates 3. b. 2 completes, sets the templates, and invalidates 4. c. 3 re-runs, re-invalidating 4. d. 4 runs, gets the new template + data, and all is well.

Consider case II -> when 2 completes before 1. Then:

a. 2 completes, sets the template and invalidates 4. b. 1 completes, sets the data, which invalidates 3. c. 4 re-runs, it gets the new template, and calls layout.getData() which still returns the old (null) data. Now the template renders with the wrong data, and things blow up. d. 3. re-runs, invalidating 4. e. 4 re-runs, gets the right template and the right data, and renders correctly.


Point IIc. above is the example of the "weirdness" of UI.emboxValue that I outlined above -- if you call it when it's invalidated, it returns it's old value, even though it knows it's wrong. This is probably something that should be fixed in core, unless I'm misunderstanding something.

Perhaps that's the simple fix to this problem. There is a deeper issue here that I'm concerned about though -- we are very much relying on computations re-running in the correct order. Can we guarantee that they'll run in a deterministic order? Because these issues are kind of horrible and I'd like to feel confident that it can't happen again.

For instance -- we realise that we can guarantee that the action computation won't complete before the data computation. The action computation will invalidate the render computation. Can we guarantee that the data computation will complete before the render computation re-runs in such cases? Because if not, changing embox isn't going to help.

Food for thought...

isAlmogK commented 10 years ago

@tmeasday great job and explanation

cmather commented 10 years ago

@tmeasday, @almogdesign, Agreed - great explanation and getting to the bottom of this. The gears are turning in my head about making this deterministic. It might be some combination of removing (or creating) our own version of emboxed value, reverting to one computation, changing the nature of this.ready (optimizing it so it doesn't depend on number of subscriptions), etc. This is on my short list.

tmeasday commented 10 years ago

@cmather - can you check the above commit. I think it expresses the problem case pretty succinctly, but I can't get it to run due to an issue with exporting symbols from blaze layout.

We could also just do a simpler test case on BL directly that calls setTemplate, then setData, then Deps.flush() -- in fact I'll work on that today if I get a chance -- but I think we should get this one running and failing on old dev first to ensure we are fixing the full problem.

cmather commented 10 years ago

For people with an unusual interest in computations and flush cycles :-), conversation ensued here: https://github.com/tmeasday/iron-router-576/commit/7b2b1e8605c02bba09cde20b07ed2f6226566166#commitcomment-6178942.

tmeasday commented 10 years ago

@cmather - I've just run into an issue that I think is related. I'm not sure if it's worth creating a repro as it's probably pretty timing dependent, so here's the basic situation:

  1. I have a waitOn that looks like (only subscribe to the base sub if you are logged in).

    waitOn: function() {
     return Meteor.userId() ? Meteor.subscribe('base') : [];
    }
  2. Elsewhere in the code, I have a console.log like so:

    console.log(Router.current().route.name, Meteor.userId(), Router.current().ready(), Schools.find().count())
  3. When you login, you'll see the following in the console:

    dashboardShow LJtWFG8pZ9NfBX8SH true 0 
    dashboardShow LJtWFG8pZ9NfBX8SH false 0 
    dashboardShow LJtWFG8pZ9NfBX8SH true 1 

ie. There's a brief period where Meteor.userId() is tripped, the sub isn't ready, but Router.current().ready() is true.


In short, controller.ready() should always be correct, not wait for a flush cycle to be so. Not sure if this is a problem with the waitList implementation?

cmather commented 10 years ago

Hey @tmeasday, the waitlist now uses our new Deps.cache method so the value will always be up to date, not requiring a flush cycle. However, the ready state is initialized to true. This is so that if your waitlist is empty we consider it to be ready. So if your console log runs before you've added anything to the waitlist, the ready state will be true. Is that's what's happening?

tmeasday commented 10 years ago

Yes, that's what's happening -- strictly speaking it's not a problem with the waitList. But you can see that it's counter intuitive that Meteor.user() is set but that the wait list is ready without yet subscribing.

It's not a problem with the waitList per se but more so with the way we evaluate the waitOn functions in the "wait computation" then push them to the waitList. Thus, again, as with the data, there is a period where the state of this.ready() is incorrect as we wait for the wait computation to run again.

I think the correct solution is that the controller not have a waitOn computation either (much like I propose for data), and instead to pass the list of waitOn functions directly into the waitList.

I think there's a philosophical issue here about whether reactive functions need to be always correct, or just eventually correct (modolo a flush or two). My initial reaction is strongly toward the first, as the bugs that are caused are just plain horrible.

cmather commented 10 years ago

@tmeasday, I need to see some code to better understand. When you get a chance could you propose a change, maybe in a PR or a branch and I'll take a look? I'm not connecting how taking the wait on call out of a computation will fix the above problem. But my brain might not be operating correctly :).

tmeasday commented 10 years ago

Ok, I'll make something simple to explain the concept.

tmeasday commented 10 years ago

OK @cmather - check this: https://github.com/tmeasday/reactivity-always-correct

I think it explains itself more or less.

th0r commented 10 years ago

I confirm this annoying bug: in my template's created hook this.data sometimes equals null, but it just can't be null. It happens every time I go away from the page and come back, using the link, not the browser's Back button. Also, first page load always goes fine. And sadly, I can't find any workaround... @cmather @tmeasday, can you advice me something please?

cmather commented 10 years ago

@tmeasday, Thank you that was very helpful. I see what you mean now, but I'll confirm my understanding below and also make the change in IR this morning.

Thanks for hunting this down Tom.

tmeasday commented 10 years ago

Yep. I think the lesson here is that building up a bunch of computations and passing data between them isn't a great pattern, as it leads to hard to analyse "correctness" of various things. So I'd advocate removing computations from IR where possible and instead letting the computation get setup by the elements that care about reactive data. So:

  1. Remove waitOn computation, instead pass generated waitOn function into the waitlist. The waitlist can optionally wrap it in Deps.cache if it wants, but the idea is that the waitOn function gets recalled every time we need to check if we are ready().
  2. Remove data computation, likewise pass the generated data function to the layout directly. Again, the layout can wrap it in Deps.cache if it so desires.
  3. [Potentially] - remove action computation, and pass a generated "template generating function" to the layout directly. We could let the render computation that's setup by yield.render call this function, and thus setup reactive dependencies.

I actually think this is all more "correct" and will lead to less surprise. 3. would take a major refactor of IR though so maybe can go cool it's heels in the icebox :)

cmather commented 10 years ago

I'm inclined to agree in general. It turns out the implementation for above exposed some other issues and is a little tricky. Doing a little rework on Deps.CachedValue, Deps.CachedDependency and the waitlist.

isAlmogK commented 10 years ago

Been following the discussions / issues, I take it we're looking at breaking changes with this. Is that correct?

cmather commented 10 years ago

Breaking changes with what?

On May 9, 2014, at 7:23 AM, Almog Koren notifications@github.com wrote:

Been following the discussions / issues, I take it we're looking at breaking changes with this. Is that correct?

— Reply to this email directly or view it on GitHub.

isAlmogK commented 10 years ago

How the waiton well be handled now?

cmather commented 10 years ago

I don't see the API changing in the short term (eg having a waitOn function and returning subscriptions). But the autorun implementations will probably change. Will it break stuff? Maybe depending on how you're using it. But I'd expect it to be more correct. For most people it probably won't change anything. Also, I think we should make whatever changes required to make the router robust, correct, etc before v1. After that we should take greater care with API deprecations.

On May 9, 2014, at 8:33 AM, Almog Koren notifications@github.com wrote:

How the waiton well be handled now?

— Reply to this email directly or view it on GitHub.

cmather commented 10 years ago

@tmeasday, this should all be fixed on devel. Let me know if you still see issues. I backported the refactor data behavior to devel so that the data function isn't called until necessary. It's not called in the controller run method anymore.

I did not backport the new waitlist. I think I'll just hold off on that until the refactor release.

proehlen commented 9 years ago

I'm still seeing this but only regularly in Internet Explorer for some reason. On OS X (Chrome, Safari and Firefox) plus mobile I cant see this happening.