Differentiating Programmer Errors from Operational Errors

chrisdickinson commented 8 years ago

This primarily concerns the error symposium participants, but may also concern the post mortem WG.

The Problem

Promise users expect all Promise-returning API usage to return a Promise object, success or fail.
This means that invalid API use is returned as a rejection, alongside other errors.
This necessitates the use of try / catch in async/await code.
This also necessitates error type checking in promise-based code.
- Were an "abort on unhandled rejection" flag to land, this would reduce such a flag's efficacy: the flag would operate by attempting to abort the process immediately, at top of stack, when no user-installed handlers are present, and users who type-check errors would usually have installed handlers.
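A minimal sketch of the two consequences listed above. `readConfig` and `loadConfig` are hypothetical names invented for illustration (they are not Node APIs); the point is only that invalid use and a missing file arrive through the same rejection channel, so the caller has to inspect the error to tell them apart.

```javascript
// Hypothetical promise-returning API (not part of Node): it rejects both for
// invalid use (programmer error) and for a missing file (operational error).
const files = new Map([['app.json', '{"port": 8080}']]);

function readConfig(name) {
  if (typeof name !== 'string') {
    return Promise.reject(new TypeError('name must be a string'));
  }
  if (!files.has(name)) {
    const err = new Error(`no such file: ${name}`);
    err.code = 'ENOENT';
    return Promise.reject(err);
  }
  return Promise.resolve(files.get(name));
}

// Both kinds of failure land in the same catch block, so the caller must
// type-check the error to recover only from the expected one.
async function loadConfig(name) {
  try {
    return JSON.parse(await readConfig(name));
  } catch (err) {
    if (err.code === 'ENOENT') return {}; // expected: fall back to defaults
    throw err;                            // unexpected: propagate
  }
}
```

Note that the `catch` block counts as a user-installed handler even for the `TypeError` it rethrows, which is the interaction with an abort flag described in the last bullet.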

Proposed Solutions

Recovery Object

Add an optional object parameter to all promise-returning APIs. This will be known as the recovery object. It does not change the behavior of Promises/A+; it is purely a Node API-level pattern. It allows users to intercept errors before settling the returned promise, and change the outcome of the lower-level API call:

// recovery object
await fs.readFilePromise('some/file', {
  ENOENT() {
    return null
  }
})

// normal use:
fs.readFilePromise('some/file')
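
One way such a recovery object could be implemented in userland is sketched below. `readFileWithRecovery` is a hypothetical wrapper, not a Node API, and it assumes handlers are keyed by the error's `code`, as in the `ENOENT` example above.

```javascript
const fs = require('fs');

// Hypothetical wrapper (not a Node API). If the recovery object has its own
// handler named after the error's `code`, the returned promise settles with
// that handler's result; otherwise it rejects as usual.
function readFileWithRecovery(path, recovery = {}) {
  return fs.promises.readFile(path).catch((err) => {
    const hasHandler =
      Object.prototype.hasOwnProperty.call(recovery, err.code) &&
      typeof recovery[err.code] === 'function';
    if (hasHandler) {
      return recovery[err.code](err); // the caller chose to recover
    }
    throw err; // no matching handler: the rejection is unchanged
  });
}
```

Called without a recovery object, the wrapper behaves like the plain promise-returning call.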

Pros:

Promise API users get the API they expect. Error symposium users get the API they expect.
Simple to implement.

Cons:

It's a new pattern.
It doesn't solve problems in the ecosystem.

--abort-on-sync-rejection

Add a flag to abort the process on synchronous new Promise rejection. This does not extend to synchronous throw within handlers. Potentially patch Promise.reject so that it returns a pending promise that rejects on next tick.

For example, the following usage would abort under this flag:

new Promise(() => { throw new Error() })
new Promise((_, reject) => { reject(new Error()) })

The following usage would not abort under this flag:

new Promise((_, reject) => { setTimeout(() => reject(new Error())) })
attainPromise().then(() => { throw new Error() })
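
The distinction the flag draws can be approximated in userland. The sketch below uses a hypothetical helper, `rejectsSynchronously`, that reports whether an executor rejected or threw before it returned; it only illustrates which cases would qualify and is not how the flag itself would be implemented.

```javascript
// Hypothetical helper (not a Node API): runs a Promise executor and reports
// whether the promise was rejected, or the executor threw, before the
// executor returned.
function rejectsSynchronously(executor) {
  let insideExecutor = true;
  let rejectedSync = false;
  const promise = new Promise((resolve, reject) => {
    let settled = false;
    const markRejected = (reason) => {
      if (!settled && insideExecutor) rejectedSync = true;
      settled = true;
      reject(reason);
    };
    try {
      executor((value) => { settled = true; resolve(value); }, markRejected);
    } catch (err) {
      markRejected(err);
    }
  });
  insideExecutor = false;   // the executor has returned by this point
  promise.catch(() => {});  // keep the demo free of unhandled rejections
  return rejectedSync;
}
```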

The net effect is that the process would immediately abort on invalid Node API use under this flag. Without the flag, the process keeps running "as expected" for the majority of promise users.

Pros:

Solves problem in ecosystem for native promise users.
Does not introduce a new pattern.

Cons:

Unclear on how common synchronous rejection is in Promise constructors. Believed to be uncommon, could be proven wrong?
Flagged and unflagged behavior diverge.

This Discussion

Please pose problems with or benefits of the listed solutions, or propose alternate avenues of inquiry which will be added to the issue text. Clarifying comments, like "how are you using this?" are in-scope. Comments that suggest that the desire to separate operational errors from programmer errors is invalid will be moderated in the interest of keeping this discussion productive. Thank you for participating!

benjamingr commented 8 years ago

@groundwater we added unhandledRejection events to Node over a year ago in https://github.com/nodejs/node/pull/758 , based on https://gist.github.com/benjamingr/0237932cee84712951a2 . This used to be a problem.

chrisdickinson commented 8 years ago

@groundwater Sorry, bad phrasing on my part — I meant that to read as "Your use case proves that some users do find differentiating these errors necessary."

DonutEspresso commented 8 years ago

@petkaantonov

The problem is not that the producer made the wrong decision, the problem is that the producer made a decision at all. I can easily produce example where safe-json-parse is wrong

I think there's been a misunderstanding. I absolutely want to decide what that error means. The point of two channels is not that the producer needs to decide what channel to communicate the error through (and indirectly whether or not an error is fatal, as you are suggesting), but that the channel used is always consistent.

Any method that you can call into should be able to determine if it can proceed with what it's trying to do. Either through localized try/catch or result type checking (as in the applyTransforms example). If it determines it cannot proceed, then an error object must be returned. It is then up to the consumer to decide whether this is fatal.

The throw channel is never used to communicate these errors, precisely so that typos and null refs automatically trigger an uncaughtException at either the domain or the process. For better or worse, the end result is a strategy that pretends, for the most part, that the language does not support exceptions.
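
A minimal sketch of the strategy described above, in the spirit of the safe-json-parse module mentioned earlier (this is an illustration, not that module's code): the one call known to throw gets a localized try/catch, the failure is returned through the error-first callback, and anything thrown inside the callback is left uncaught.

```javascript
// Expected failures travel through the error-first callback. The callback is
// invoked outside the try block, so a typo or null ref inside it is NOT
// swallowed and still surfaces as an uncaught exception.
function safeJsonParse(text, callback) {
  let value;
  try {
    value = JSON.parse(text); // localized try/catch around the throwing call
  } catch (err) {
    return callback(err);
  }
  return callback(null, value);
}
```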

petkaantonov commented 8 years ago

The throw channel is never used to communicate these errors

The throw channel is already liberally used in the ecosystem and node core for errors that consumers can decide are actually expected errors. My silly port example is based on the real node API where a .listen will throw for invalid port, for example.

DonutEspresso commented 8 years ago

Yes, and the strategy thus far is to defensively wrap some of these APIs based on whether or not they should be fatal to your application (localized try/catch). This is neither ideal, nor ergonomic, but it's the trade off that's made to ensure we can safely differentiate errors.

EDIT: I should add that this is a nuanced view; it is not unreasonable to assume that node core itself could be in a bad state when it throws. There is no way to make this determination programmatically, but that's a thread for another day!

petkaantonov commented 8 years ago

As you say, and what is my whole point, you cannot reliably determine the error's category by checking which channel it comes from. From that inevitably follows that you need to check all the channels for all categories. It is more convenient to simply just have one channel if you're going to have to check all the channels anyway.

DonutEspresso commented 8 years ago

It's a trade off conversation. I can more easily and safely localize try/catch when operating with APIs that are known to throw, at the expense of verbosity and ergonomics, vs attempting to make that determination at run time.

petkaantonov commented 8 years ago

I can deal with "it's a trade off". That is tremendous progress. Thanks.

markandrus commented 8 years ago

@chrisdickinson as an author of some Promise-based APIs, the proposed --abort-on-sync-rejection flag concerns me. I've written code that synchronously rejects, and I expect many others have, too. There are valid reasons to do so.

Given this, I do not think the time at which an error is raised (synchronously or next tick) is a good mechanism for distinguishing programmer errors from operational errors. This convention is a mismatch for Promises, and I am not sure how widely it is followed in callback-based code outside of Core.

I tend to agree with @balupton's suggestion of introducing a type for programmer errors. We could introduce this type to both callback- and Promise-based APIs in Core, without imposing any restrictions on synchronous rejections in Promise code. We could also build debugging features around such a type.

benjamingr commented 8 years ago

@markandrus the approach suggested in https://github.com/groundwater/nodejs-symposiums/pull/6 by @groundwater and @chrisdickinson is to check for a .code property for operational errors but it does not make the distinction based on the type alone.

This sounds like a reasonable path to explore - would you mind writing a concrete proposal based on it in a pull request?

@chrisdickinson can you document this in the proposed solutions in the post title?
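
For reference, a minimal sketch of what a `.code`-based check might look like. `isOperationalError` is a hypothetical name, and treating any string `code` as the marker is an assumption made for illustration; the linked PR may define the convention differently.

```javascript
// Assumption for illustration: an error counts as operational when it
// carries a string `code` (as system errors such as ENOENT do).
function isOperationalError(err) {
  return err instanceof Error && typeof err.code === 'string';
}

// Example consumer: recover from operational errors, rethrow everything else.
function recoverOrRethrow(err, fallback) {
  if (isOperationalError(err)) return fallback;
  throw err;
}
```

Any userland error that sets `.code` for its own purposes would also be classified as operational by a check like this.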

DonutEspresso commented 8 years ago

@petkaantonov Sure, software is all about trade offs. Ultimately it's up to consumers to determine what's right for their use case. I just hope that I've clearly articulated the trade offs that we make at Netflix, and to ensure that the concerns that drove us to make these trade offs are captured as part of the discussion. We're certainly not the only ones.

@chrisdickinson I've been thinking about the flags that have been proposed so far, including --abort-on-sync-rejection. Since they trigger different behavior depending on the runtime, I think this could potentially break some assumptions about the write once run everywhere approach. I haven't yet fully thought through an end to end scenario, but it certainly seems like it could cause some "surprise!" moments. Is that ok, given the opt-in nature of the flags? Is using a flag synonymous with "you're off the reservation, you're on your own"? That seems to be the spirit of what we're proposing.

markandrus commented 8 years ago

@benjamingr thanks, I will review the PR you linked. My only concern with using .code as described is any overlap with userland error subclasses (I can think of one library I maintain that uses this property). Otherwise, if I can find time to help I will look into creating the proposal you mentioned. Do you mean opening it against groundwater/nodejs-symposiums?

benjamingr commented 8 years ago

@markandrus that actually sounds like a good thing - or do they use .code for programmer errors too?

You can open it on this repo - I'd rather we keep all the stuff under the nodejs org since it enforces organization wide things (like the code of conduct) but I don't have strong feelings about it.

rvagg commented 8 years ago

fwiw I've removed my original comment from here, it's clear I've not done a good job at expressing what I was trying to say and have only caused more argument in the process

benjamingr commented 8 years ago

@rvagg argument in the process is fine and to be expected - it's definitely much better than not having the argument and not addressing the problems.

You have made it abundantly clear that you have a problem with promises - that's fantastic. I mean it. We could use someone like you here. If you will not voice your concerns and we will not debate them we won't be able to make progress. Not having someone with opinions like yours participate would significantly under-represent callback users meaning that:

a) We won't be able to make nearly as much progress since things would more likely get stuck when trying to get CTC approval.
b) Features V8 develops that use promises either internally or directly might break your coding flow, tools you use and programs. We need to know what those things are before devising a strategy for solving these issues before they make it to a release.

Especially since you're a seasoned node developer.

So I ask you to reply to my comments from https://github.com/nodejs/promises/issues/10#issuecomment-184608899 . I can't make you participate but I sure would appreciate it.

rvagg commented 8 years ago

@benjamingr:

You have made it abundantly clear that you have a problem with promises

I'm trying very hard to not make this about my personal opinion on promises which is where I've obviously failed in this thread. My opinions are not a secret, they are just not particularly relevant. To get it out of the way: I choose not to use Promises for the majority of the JavaScript I write and have strong preferences about language and pattern simplicity in general. I don't like the Promises approach to handling errors (generally, not just about operational vs programmer errors), I find their state, error and continuation management far too complex to reason about in non-trivial applications, I find that they make a codebase harder to read and understand and not easier. I'm sure you've heard this all before and find it tiresome. It's all about personal preference and I have no problem with the growing popularity of Promises as a pattern that people find helpful for whatever reason and make no judgements of people that choose to adopt them as their primary asynchronous programming tool. I've been trying to dive deeper with Promises (grokking and using) in an attempt to understand the perspective of those who embrace them so completely. I continue to struggle to see the light but arguing about preferences on this matter is just as absurd as arguing about someone's preferred programming language. So let's put that aside because there's nothing much to be gained here.

Jumping in to this thread was simply an attempt to broaden the discussion and suggest that there is a larger group of stakeholders, beyond the narrow error-symposium and postmortem groups, that have a deep interest in this particular topic. And that attempts to build bridges in order to gain acceptance of having a Promises API in core would likely be assisted by recognising this and pandering to this perspective in some way if possible. Being purist about Promises and asking that everyone accept the same view on how errors should be managed will just get us bogged down further (or alternatively lead to rage quitting as has already been happening). You can't dismiss these concerns with "if you don't like the approach to error handling then don't use Promises in your code" because that's not how Node.js development works. We don't just have to deal with errors in our own code, we build applications on top of the largest package ecosystem in the world. Take it or leave it, it's just a suggestion.

Regarding my own opinion on this as a CTC member, I'm mainly interested on the question of whether or not a Promises API should be exposed by core at all, roughly what's being discussed in #21. I'm almost entirely focused on how we keep core lean and enable a healthy ecosystem of diverse needs and opinions to exist on top of it. I know I'm not alone in my obsession with "small-core" too, both inside our GitHub bubble and beyond. So it's the reasoning for why we would even go down this path that's relevant to me. Does this enhance or damage what we are trying to achieve by keeping core small? This is still an open question. Debugability, error differentiation, postmortem, AsyncWrap, etc. are all secondary issues for me and my interest in them is mostly about seeing the various stakeholders be involved and represented. I don't have nearly enough bandwidth to engage in all of the discussions that happen and am unlikely to be particularly helpful either, as I've already demonstrated.

I don't want to divert discussion away from the topic at hand here so I'm not going to engage in further discussion of my stated position. I'm going to try and engage properly on that elsewhere. Although the sheer volume of discussion at the moment is making it difficult to get anything productive done so maybe it'll have to settle down a bit before I can dive in (I've heard similar sentiments from others who are feeling stretched, so patience would be appreciated across the board).

benjamingr commented 8 years ago

@rvagg you still haven't addressed any of the comments from my previous post at https://github.com/nodejs/promises/issues/10#issuecomment-184608899 - just pointing that out - that's fine.

I don't like the Promises approach to handling errors (generally, not just about operational vs programmer errors), I find their state, error and continuation management far too complex to reason about in non-trivial applications, I find that they make a codebase harder to read and understand and not easier. I'm sure you've heard this all before and find it tiresome.

This is the first time I've had a callback user actually talk back to me like that. I mean that in a good way. People typically just say they don't like promises because they don't understand them or won't discuss it further - I would love to discuss it with you since you're a seasoned node developer.

I have not heard it all before and I do not find it tiresome. If you're interested I would definitely like to make a discussion out of this. Discuss what you hate about their state management, error handlers and what you find hard to reason about. That discussion could further be referenced when such claims come again and it would possibly make people who share your opinion to talk about it.

I continue to struggle to see the light but arguing about preferences on this matter is just as absurd as arguing about someone's preferred programming language.

Except this is the place where platform choices are made. If no one will argue about it we will never really gain the perspective of the other side.

Jumping in to this thread was simply an attempt to broaden the discussion and suggest that there is a larger group of stakeholders,

We appreciate it. That's why I keep asking for more.

So it's the reasoning for why we would even go down this path that's relevant to me. Does this enhance or damage what we are trying to achieve by keeping core small?

Agreed. I'm inconclusive about this myself. There are major gains and losses and I'd appreciate your participation in that discussion too.

erights commented 8 years ago

Sorry about that. I misunderstood how to use the Github labels ui. I did not intend to change the labels.

DonutEspresso commented 8 years ago

People have different needs when it comes to handling errors, and by association, their approach to failing fast. What's acceptable to some is not to others, but in theory that choice can be made independent of the async abstraction you choose. I think that's where the problem lies.

If we agree that there is a difference between programmer and operational errors, and that really the only difference is an approach in how you handle them (as outlined in our discussions above), then the three things I absolutely want are:

a clear way to distinguish between these two
the ability to have programmer errors (of the null ref and typo type) bring down the process
the ability to bring down the process at top of stack for post mortem (some may see this as a bonus, but it is absolutely critical for many deployments)

However, due to a more aggressive approach to error handling when using promises, it can be hazardous to make a run time determination of programmer vs operational, which leads directly to an inability to bring down the process via a fail fast strategy.

As far as I can tell, there hasn't been a great story, or even consensus within the promise community, for distinguishing between the two. The most common feedback I've heard is "there isn't a difference, so don't worry about it." Unfortunately, I don't think that really helps address the concerns from those on the other side of the fence who already do this today using the strategies discussed in this thread.

If we all agree on at least the premise, which, unless I'm reading the thread wrong, it appears that we do, then I think we've made a big step forward. Next steps would be to figure out how to tackle the three concerns above, which some of the other threads are already investigating.

spion commented 8 years ago

@DonutEspresso I would say that it is node that has a problem distinguishing programmer from operational errors and hand-waves it by saying "well, if it's thrown it's a programmer error". Here by node I mean the behaviour of the core libraries, but also the resulting shaky community consensus.

There are many good reasons that try-catch should be avoided in node:

It provides an easy post-mortem story which would be much harder (perhaps impossible?) otherwise because of the lack of filtered catch.
It makes it easier to deal with exceptions, because otherwise you really have to think hard and write proper try-finally blocks.
Finally, callbacks lack any exception propagating mechanism, which makes thrown errors truly impossible to deal with. In fact they propagate errors the wrong way! These factors together led to the current best practice.

But "programmer errors" isn't a very good one, and the error handling document IMO doesn't do a good job of justifying that claim. The resulting debate about what constitutes a valid reason to throw an error stems, I think, from the conflation of programmer errors and thrown errors.

I wrote a simple comparison of the callback VS promise story, taking the second example from https://www.joyent.com/developers/node/design/errors as a gist here: https://gist.github.com/spion/60e852c4f0fff929b894 . It outlines some of the problems with the claims in the error handling document, and how promises approach things in a very different way.

winterland1989 commented 8 years ago

This is the first time I've had a callback user actually talk back to me like that. I mean that in a good way. People typically just say they don't like promises because they don't understand them or won't discuss it further - I would love to discuss it with you since you're a seasoned node developer.

As I proposed in https://github.com/nodejs/promises/issues/21#issuecomment-185521089 , Promises are not the only way to describe the dependency graph of asynchronous operations. Promises became widely accepted because of their monadic interface, but that does not hide their internal complexity.

People who don't understand Promise internals often misuse them in a variety of ways. For example, some implementations discourage using new to create instances because it's expensive, and the second callback parameter to then is not recommended because it won't save you if the first callback fails. All these gotchas are leaky abstractions, IMO. Promises are too complex to become a basic async primitive; they solve the callback problem in an opinionated way.

groundwater commented 8 years ago

I think @DonutEspresso has echoed my concerns and priorities quite well. Promises push expected and unexpected errors through the same channel, which for many people is undesired behavior. A sufficient example of "unexpected error" is that a function being called has a ReferenceError due to a mistyped variable name. I doubt anyone defensively codes around that.

Unless the Promise API is going to change, the solution for those wishing to avoid catching unexpected errors is to never catch, and abort on unhandled rejections. I think @chrisdickinson proposed a great solution using the recovery object approach. It lets people decide whether to use try/catch control flow, or to use multiple or special return values to indicate errors.

This is the only solution so far that accommodates all sides. I and likely many of us are not interested in having a battle over what is the right way to use (or not use) Promises. It is easy to accommodate both sides. I would prefer to talk about problems you might encounter with a "hybrid" solution, like what are the pathological cases of mixing libraries that use both techniques.
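
As a rough sketch of the "never catch, and abort on unhandled rejections" approach, using the existing `unhandledRejection` process event: `abortOnUnhandledRejection` is a hypothetical helper name, and the `abort` parameter is injectable only so the behavior can be tried without killing the process.

```javascript
// Sketch: abort the process on any unhandled rejection. By default this calls
// the real process.abort(); pass a function to observe the behavior instead.
function abortOnUnhandledRejection(abort = () => process.abort()) {
  const listener = (reason) => {
    console.error('Unhandled rejection:', reason);
    abort(reason);
  };
  process.on('unhandledRejection', listener);
  // Return an uninstall function so the listener can be removed again.
  return () => process.removeListener('unhandledRejection', listener);
}
```

The event fires after the rejecting stack has already unwound, so an abort triggered this way does not preserve the top-of-stack state that the post-mortem use case asks for.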

kriskowal commented 8 years ago

@groundwater I suspect that the real solution to the problem of distinguishing programmer errors has very little to do with promises and rejections, as it can be solved instead by aborting, not on unhandled rejection, but on the reference error or type error itself. For these cases, even sync try/catch can interfere with post-mortem analysis.

petkaantonov commented 8 years ago

@groundwater this thread is full of examples where node is using the throw channel for operational errors and errback for programmer errors. The problem with this approach is that the channel is chosen by the error producer and not the consumer who is the only one who can actually know. There are exceptions to this like VM thrown referenceerrors, but these (#NotAllErrors) don't change the problems with producer decision.

Errors are objects that can carry arbitrary context and data, they allow consumer to decide after the error is thrown. This is currently not feasible for post mortem users (who I understand are keeping promises blacklisted and would keep them blacklisted until possible) but it will be when the try catch is augmented with ability to define predicate functions/expressions as guards for the catch clause. Predicate can be defined not to accept e.g. ReferenceError and the behavior would be in that case same as if you didn't have a try catch statement in the first place.

You are not accommodating promises by changing their API, that's unacceptable.
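
The predicate-guard idea can be approximated today with a small helper. `catchIf` and `isNotFound` are hypothetical names used for illustration; this is a userland sketch, not the language feature described above.

```javascript
// Hypothetical helper: builds a rejection handler that runs `handler` only
// when `predicate(err)` is true and rethrows every other error unchanged.
function catchIf(predicate, handler) {
  return (err) => {
    if (predicate(err)) return handler(err);
    throw err;
  };
}

// Example predicate: accept only "file not found" errors.
const isNotFound = (err) => err != null && err.code === 'ENOENT';

// Usage: somePromise.catch(catchIf(isNotFound, () => null))
```

Unlike a language-level guard, this helper still catches the error and rethrows it, so it does not behave as if no handler had been installed.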

benjamingr commented 8 years ago

@groundwater namely - the following are all potential operational errors currently thrown by node:

Out of memory.
Parsing invalid JSON.
Attempting to bind to a closed UDP socket.
Listening to an "invalid" port.
If no error event handler is synchronously attached:
- HttpAgent addRequest if createSocket fails.
- HttpAgent removeSocket if createSocket fails.
- new ClientRequest if createConnection fails.
- Pushing to a Readable stream or unshifting from it if the chunk is invalid
- Writing after a stream is closed in _stream_writable
- Reading an invalid/non-string buffer chunk (validChunk)
- A _tlsError in tls.
- dgram in lookup
- FSWatcher in a bunch of places
- I'm going to stop now, but there's a very long list at https://github.com/nodejs/node/pull/5251/files
QueryString.decode in an edge case (it try/catchs twice first - that file is weird)
There are tens more of these - just look for throw in /lib.

Moreover, there are a lot of operational errors that manifest as synchronous errors:

Example 1:

I read a JSON object (or a csv) - everything is fine and it doesn't throw
I pass the result to ReadStream, it requires a string or an object but my JSON object contained a boolean.

Example 2:

I read a value from the database and expect it to be a positive number - I got null back
I pass the result to ReadStream as options.start and it throws synchronously

Example 3:

I read all the DNS values to look up from a CSV the user gave me - all are strings except one where the external JSON REST API I'm relying on that I wrote for getting the host names returned a number by mistake
I perform a dns.lookup and it throws synchronously that hostname must be string or falsey.

Example 4:

I get a url a user entered in a form on my website and sent me through a POST request.
The URL the user entered is empty, so the client doesn't send it in the JSON payload - I get an empty field and it gets deserialized to undefined.
I parse it with Url.prototype.parse and it throws a TypeError synchronously.

I came up with all these examples on the spot here - if you'd like I'll gladly come up with 10 more. All of these are what I believe (do correct me if I'm wrong) to be operational errors (the last is debatable since I think that servers should sanitize for that). All are recoverable.

What @petkaantonov is trying to say is the distinction is not clear cut at the producer side (code raising the error). The consumer (the user consuming the error) always knows what they want or don't want to recover from. The reason exceptions are a part of languages in the first place is exactly in order to move the decision process to the consumers of the errors who understand their responsibilities better than the producers.
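
To make the producer/consumer point concrete, here is a small sketch in which the same producer, `JSON.parse`, throws the same `SyntaxError`, and two hypothetical consumers (names invented for illustration) reach opposite conclusions about it.

```javascript
// Consumer 1: the text comes from a user, so invalid JSON is an expected,
// recoverable condition.
function parseUserInput(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    if (err instanceof SyntaxError) return { ok: false, reason: 'invalid JSON' };
    throw err;
  }
}

// Consumer 2: the text is a constant shipped with the program, so invalid
// JSON is a bug and the SyntaxError is deliberately left uncaught.
function parseBundledDefaults(text) {
  return JSON.parse(text);
}
```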

benjamingr commented 8 years ago

I have moderated the off-topic posts away. Please feel free to open a separate issue if you have concerns about how promises would look like in core - this thread is for debating operational errors vs. programmer errors and how/if we can make this differentiation as well as address concerns by post-mortem wg people related to them.

@winterland1989 I'd appreciate if you stop linking to your library, we've all seen it already - if you want to suggest its inclusion in the language core please do so in ESDiscuss, if you'd like to suggest its inclusion in Node core please open an issue at https://github.com/nodejs/node. In all honesty the language TC has already mostly reviewed and rejected the idea - and Node would not even consider it until it has significant userland adoption and a very good reasoning on why it cannot be done as well in userland.

You're very welcome to participate in this discussion on-topic. Namely - me @groundwater @kriskowal and @petkaantonov are having an interesting discussion on whether or not synchronicity can be reliably used as an indication of programmer errors in Node and we'd love more perspectives.

taylorhakes commented 8 years ago

@DonutEspresso expressed exactly my concerns. The distinction between programmer errors and operational errors is incredibly important. My program should not continue if I hit a programmer error. In my code I take the approach of never throwing or rejecting a Promise. That makes operational errors much easier to handle and I leave programmer errors to .catch. Yes there are APIs that unfortunately throw as @benjamingr listed. I tend to try/catch a single line in those circumstances.

Is anybody suggesting that more places should throw errors? That would make all errors go through the same mechanism. But you would be forced to write synchronous code with a bunch of try/catch statements

try {
  doSomething();
} catch (e) {
  handleError1(e);
} 
try {
  doAnotherThing();
} catch (e) {
  handleError2(e);
}

Promises have pushed us farther in that direction. We have one mechanism to handle async errors, but I would say that it's a bad mechanism. try/catch/.catch currently can't support distinguishing between different Error types. It just catches all errors and forces the developer to be smart enough to distinguish, which gets harder the farther away from the error. I have seen many developers make the mistake of not checking the error type in .catch. You can say this is an education thing, but it happens a lot even with experienced developers.

Hopefully we will eventually get type guards on .catch or catch (e) {, but how does that fix the current problems? Those types of language changes could be years away. I am not comfortable suggesting more use of try/catch when I feel it will only lead to more confusion and bugs for developers.

nodejs / promises

Differentiating Programmer Errors from Operational Errors #10
