Promise never settles when one of multiple queries fails

simonihmig commented 5 years ago

I came across an issue when sending multiple queries that end on the same task queue, when one of those failed all the Promises returned from those different source.query() calls never settle (i.e. neither resolve nor reject).

I already debugged this for many hours, and tried to find a minimal reproduction, without success unfortunately. This demo tries to mimic what we are doing in our app: https://codesandbox.io/s/orbit-pending-promise-056c2. But unfortunately as I said it does not show the same effect we experience in the real app. Not sure what piece of the puzzle is missing here...

So in that demo that final catch block (in loadData()) does get executed (caused by the 400 response from the missing book request), while in our app the promise never settles.

One important piece of the puzzle must be the request strategies' catch hook, which is essentially identical to https://github.com/cerebris/peeps-ember-orbit/blob/master/app/data-strategies/store-remote-query-pessimistic.js.

From debugging through the various (async) orbit calls, I have a strong feeling that it must be some kind of interference between the tasks invoked through the actual query calls, and those coming from requestQueue.skip() (as they also call the queue's process()).

More specifically this is what I see in our app:

The request strategy's catch hook is called before this one: https://github.com/orbitjs/orbit/blob/master/packages/%40orbit/core/src/task-queue.ts#L361-L365.
the .skip() causes _resolution to be set to null here: https://github.com/orbitjs/orbit/blob/master/packages/%40orbit/core/src/task-queue.ts#L339
the following process() call sets a new Promise instance here: https://github.com/orbitjs/orbit/blob/master/packages/%40orbit/core/src/task-queue.ts#L292
after that has happened, the aforementioned catch in _settleEach gets invoked, but as _resolution is a new reference, the equal check does not match: https://github.com/orbitjs/orbit/blob/master/packages/%40orbit/core/src/task-queue.ts#L362
so _fail() is not called, and as such the previous Promise is not rejected: https://github.com/orbitjs/orbit/blob/master/packages/%40orbit/core/src/task-queue.ts#L328

In the demo, this is the other way around! So that last equals check is true, the promise is rejected, and after that the request strategy's catch hook is invoked, calling skip().

I have no idea what is causing this different timing between app and demo. 🤔

Some ways I found to work around this, though all options seem less than optimal:

Don't call .skip() at all
Replace .skip() with .clear()
Call .skip() in the next event loop (to delay it so that the timing as described above is "fixed"):

      catch(this: RequestStrategy, e: Error) {
        setTimeout(() => {
          this.source.requestQueue.skip();
          this.target.requestQueue.skip();
        }, 0);

        console.error(e);
        throw e;
      }

dgeb commented 5 years ago

@simonihmig thanks so much for the detailed error report. This helps me clearly see the hole through which promises are falling in _cancel.

I am thinking that the right answer is that any explicitly canceled tasks should ALWAYS have their promises rejected, either through a standard error or a custom error passed into skip, clear, or shift (retry should not reject, since the task may yet succeed).

dgeb commented 5 years ago

Furthermore, if you want the original request to fail with the error that you caught in your RequestStrategy, you could still catch it and then pass that error object to skip (this.source.requestQueue.skip(e)). Otherwise, skip() with no argument would reject the promise with a generic TaskSkipped error, for example.

I think this is the right answer but still need to put it to the test and make sure I'm not overlooking anything.

simonihmig commented 5 years ago

Thanks @dgeb for looking into this!

A general question related to this: I saw this pattern to catch and explicitly call requestQueue.skip() in a RequestStrategy in your peeps-ember-orbit demo and somewhere else IIRC, but AFAIK it's not documented/recommended it the actual Orbit docs. So I wonder is this indeed best practice, or what would be the downside of not doing this?

dgeb commented 5 years ago

@simonihmig I think that dealing with conflicts and errors in general is perhaps the trickiest part of working with Orbit, because there is no one-size-fits-all solution. Once an error occurs, you need to decide what to do that's appropriate for your app. The queues are paused and all the tools are available to you to skip or retry the problematic task, or just clear the slate. This is much like dealing with a merge conflict in git - there's only so far the tool can go before you as the developer need to make a decision.

So I wonder is this indeed best practice, or what would be the downside of not doing this?

I think peeps-ember-orbit is a bit overly simplistic here and that best practices are still being worked out in the community. The main point to focus on is that something needs to be done to restart the queue processing and you need to decide what's best given the error you receive and the state of your app. I definitely would like to write more about this and include it in the guides.

But of course I really need to make sure that Orbit is working as reliably as possible in this tricky area, so I'll investigate further and try to make the improvements I mentioned above soon.

simonihmig commented 5 years ago

Thanks for your explanation, that makes sense!

Furthermore, if you want the original request to fail with the error that you caught in your RequestStrategy, you could still catch it and then pass that error object to skip (this.source.requestQueue.skip(e)).

TS complains about expecting no arguments, and indeed I don't see where any args are used here: https://github.com/orbitjs/orbit/blob/master/packages/%40orbit/core/src/task-queue.ts#L215-L224?

dgeb commented 5 years ago

@simonihmig sorry for the confusion - I am proposing the addition of a custom argument to skip to make this approach possible, but it is not yet. See:

I am thinking that the right answer is that any explicitly canceled tasks should ALWAYS have their promises rejected, either through a standard error or a custom error passed into skip, clear, or shift (retry should not reject, since the task may yet succeed).

I'm really busy today and tomorrow but will try to implement it soon.

simonihmig commented 5 years ago

Ah ok, sorry, should have read that better! 😝

I'm really busy today and tomorrow but will try to implement it soon.

You are awesome! 🙏

dgeb commented 5 years ago

@simonihmig This should finally been resolved in v0.16.0-beta.9. Apologies for taking a while here - it's a sensitive area of the codebase and required thorough testing. 😅

Please give the new beta a try and let me know how it goes.

lolmaus commented 5 years ago

Hey @dgeb, thank you for looking into this! 🙇

Unfortunately, we no longer have access to the repository where we had this issue.

@simonihmig may still have a local branch with a failing test. But he'll only be able to respond in two weeks or so.

orbitjs / orbit

Promise never settles when one of multiple queries fails #653