Error conditions, retry, and Error.cause

jasnell commented 2 years ago

Within Workers we have been having a discussion about how to communicate to users via Errors that the conditions leading to an error are temporary and that the user should retry their operation. The how and when to retry is not important here.

For example, a fetch() promise can fail for many reasons. The network path could temporarily be down, the URL could be blocked, the header could be malformated, etc. We want to be able to clearly indicate that the user can/should retry their operation without requiring that the user resort to parsing the error message.

We have several possible paths forward, all of which have the same fundamental problem. We'd like to get consensus on which approach folks would find the most agreeable.

Option 1: New error types

const err = new Error('an error occurred');
Object.defineProperty(err, 'name', { value: 'RetriableError' });

Option 2: Non-standard own properties on `Error`

const err = new Error('an error occurred');
err.retriable = true;

Option 3: Using `cause`

const err = new Error('an error occured', { cause: { retriable: true } })

Option 4: Using `AggregateError`

// The first object is always an error but the additional things communicate
// the additional structured information we want.
const err = new AggregateError([
  new Error('an error occurred'),
  { retriable: true }
])

Option 5: ??

Other ideas?

Current Thinking

My current thinking here is to prefer Option 3, using the cause property.

Specifically, pulling out to a logical level: The purpose of the cause is to communicate the reason for this error. That reason might be that another Error was thrown, or it might be that some other condition occurred. For instance, the network was down, or there was an internal error, etc. So let's differentiate between Error and Condition.

If I have a transient condition and want to communicate that the user should retry their operation, then I could logically do something like:

cont condition = {
  // The condition is temporary....
  transient: true,
  // The operation is retriable...
  retriable: true,
};
const err = new Error('oops that failed', { cause: condition });

The challenge with this, of course, is interoperability. If workers chooses to use cause in this way but other fetch() implementations choose to use cause in other ways then we can run into interop issues. To be clear, ALL of the options suffer from this exact problem.

Proposal

The proposal I would like to make is to define a new ErrorCondition interface specifically for use with cause

Essentially (treat this as a discussion example to express intent... the actual proposal can be refined):

dictionary ErrorConditionInit {
  boolean transient = false;
  boolean retriable = false;
  DOMString name = "";
};

interface ErrorCondition {
  constructor(optional DOMString message = "", optional ConditionInit init = {});
  readonly attribute boolean transient;
  readonly attribute boolean retriable;
  readonly attribute DOMString name;
  readonly attribute DOMString message;
}

Note that this interface intentionally mimics DOMException with the inclusion of a name and message accessors.

Example use (assuming the proposal to add cause to DOMException goes through):

const err = new DOMException('The operation failed', {
  name: 'NETWORK_ERR',
  cause: new ErrorCondition('The network path is down', {
    transient: true,
    retriable: true,
  })
});

console.log(err.cause.transient);  // true
console.log(err.cause.retriable); // true

To be clear, I don't really have strong opinions on exactly how we solve this use case. My only requirement is that we have a mechanism for reliably communicating transient/retriable conditions that is interoperable across runtimes.

Some questions

How are retriable errors like this handled elsewhere on the web?

jasnell commented 2 years ago

cc @domenic @annevk ... We'd love your input on this topic as well.

jasnell commented 2 years ago

Also... @kentonv @mcollina

benjamingr commented 2 years ago

An error subclass seems like the natural/standard way to communicate a finer grained error, then that can have a retriable property or whatnot.

I always thought of .cause as the error that caused that error to help investigate errors deeply (e.g. if your service that depends on an API failed, it might be useful to know that API returned a 503 status when debugging even if it's an error related to your business model)

jasnell commented 2 years ago

as the error that caused that error

But what if it's not an Error that caused the error? Remember, JavaScript allows you to throw anything, and some Errors are not caused by other errors as much as they are caused by conditions that are not met (the network being down is not an error on it's own, it's just a temporary state).

ljharb commented 2 years ago

Note that this (examples of a non-error object) includes any Promise rejections, like reject() in a Promise executor, Promise.reject() anywhere, or any throw object.typo variant that might appear.

domenic commented 2 years ago

Yeah, my understanding was cause was mainly aimed at stack stitching of multiple Error instances. It's not a generic way to smuggle in more data into a single field; creating actual fields on a subclass or expando seems better for that.

ljharb commented 2 years ago

That's not the motivation as I understand it; that's what AggregateError is more aimed at.

jasnell commented 2 years ago

Ok, so the option 5 here is something like:

class RetriableError extends Error {
  constructor(message) {
    super(message);
    Object.defineProperty(this, 'name', { value: 'RetriableError' });
  }
}

jasnell commented 2 years ago

@domenic ... if we go the route of a new Error subclass, I'd like to make sure we can do it in a way that ensures we won't clash down the road with any web platform stuff. To that end, do you think there'd be any interest among browser implementors for either a new error type or standardized property on DOMException to indicate retriability?

domenic commented 2 years ago

I'm not sure. My initial impression is that it seems rather difficult because it's ambitious. If I were to go through the hundreds of thrown exceptions on the web platform, I'd be surprised if they neatly fell into "retriable" and "non-retriable" buckets. E.g. for fetch() we purposefully avoid giving much programmatic access to information about network errors, for security reasons. (But maybe a single retriability bit would not be a security leak?)

The places that seem most promising are I/O specs like fetch(), IndexedDB, and fs.spec.whatwg.org. Maybe file issues on those repos and see if this is a problem web developers have, and if they'd benefit from such a bit? And then security discussions could be had there too.

jasnell commented 2 years ago

I'll get an issue opened in the webidl repo to start the conversation.

legendecas commented 2 years ago

To my understanding, I'd find this feature would be better served with option 1 or option 2, as the additional information is describing the error instance itself, rather than their "correlated" error instances. For option 2, "Non-standard own properties" may be standardized. If this "retriable" bit doesn't impose security concerns, I'd believe it could be a benefit to other web APIs too.

Yeah, my understanding was cause was mainly aimed at stack stitching of multiple Error instances.

That's not the motivation as I understand it; that's what AggregateError is more aimed at.

IMO cause was intended for augmenting errors with correlated contextual information. It is not like AggregateError aggregating arbitrary errors, which can be totally unrelated.

That being said, I think option 1 (or subclasses) and option 2 are valuable to be explored.

mcollina commented 2 years ago

I think using cause would cause compatibility problems with whoever is expecting an error to be there. I would

In case of an added property, I recommend using a global symbol name as a property, as everybody has been stitching properties on errors for years. I think you'd want to avoid naming collisions. Something like err[Symbol.for('errors.meta)] = { retriable: true }.

This would be a generalization beyond fetch.

benjamingr commented 2 years ago

But what if it's not an Error that caused the error?

Then I would not expect .cause to be set but instead additional/different properties to be set on the error subclass. This is similar to the DOMException subclass having a .code (or Node errors, similarly). The way to add properties to errors is very common (e.g. in the context of HTTP requests, axios - its Errors expose a reference to the response).

Remember, JavaScript allows you to throw anything, and some Errors are not caused by other errors as much as they are caused by conditions that are not met (the network being down is not an error on it's own, it's just a temporary state).

Sure and while I don't personally like non Errors thrown because they make debugging harder - I would absolutely expect the .cause of an error that resulted from another error to contain that other error even if that error itself isn't an Error (e.g. if it's a string).

I would also be interested in hearing more about your use case or where the platform can decide for the application whether something is retriable or not.

I guess the closest we have is EAGAIN as a code in some cases.

jasnell commented 2 years ago

I would also be interested in hearing more about your use case or where the platform can decide for the application whether something is retriable or not..... I guess the closest we have is EAGAIN as a code in some cases.

EAGAIN is a good example. A number of 4xx http status codes are also examples.

Ok, from the discussion here, it's looking like the preferred choices are either (a) create a new Error type or (b) define a new property on the Error. Out of those two choices, do folks have a preference?

ljharb commented 2 years ago

@benjamingr

I would absolutely expect the .cause of an error that resulted from another error to contain that other error even if that error itself isn't an Error (e.g. if it's a string).

while that’s certainly an option, the design of the feature, as well as the idioms of the language itself, was that a thrown value may be set as a cause unchanged, just like try/catch and Promises don’t wrap a non-error value in an error.

ljharb commented 2 years ago

I’d prefer a new error type; its semantically clearer to me versus slapping random extras onto another error.

kentonv commented 2 years ago

IMO a new error type doesn't work here, because the retriable property is potentially orthogonal to the error type, and JS is single-inheritance. The problem here is a bit more obvious if we consider what happens when we have two different properties we want to express. For example, let's say we both want to express "makes sense to retry" and "client error vs. server error" as properties. The only way to cover all cases via types is to have four different types: RetriableClientError, RetriableServerError, PermanentClientError, PermanentServerError. This is obviously not a great direction. If we had multiple inheritance, then we could have error types that inherit e.g. from both RetriableError and and ServerError to indicate both properties through types, but we don't have that.

So in my opinion it has to be a property, not a type.

ljharb commented 2 years ago

It could still be one type that has multiple properties, or a single property that's an array of enum values, or something.

kentonv commented 2 years ago

@ljharb Hmm, so you mean like... we introduce some sort of WcgPlatformError (better name TBD) which has to be used for all errors that want to be retriable? I guess we could do that, it feels like a perversion of the purpose of typed exceptions, but I think typed exceptions don't have a great track record to begin with so maybe that's fine.

ljharb commented 2 years ago

Yep, exactly that.

kentonv commented 2 years ago

I guess the big question then is will we run into any cases where we want to throw an error marked retriable, but existing specs say that we need to throw a specific type of error (like DOMException)? If so, what do we do? Or can we definitively rule that out as a potential problem?

ljharb commented 2 years ago

I suspect if this type inherits from DOMException, and doesn't add anything that conflicts, that it'd be fine since WebIDL is more about interfaces than nominal types.

annevk commented 2 years ago

That does seem like a problem. What if DOMException gets extended or the specification gets updated to throw a subclass of a different type?

benjamingr commented 2 years ago

That does seem like a problem. What if DOMException gets extended or the specification gets updated to throw a subclass of a different type?

I'm not sure I understand - if the DOM specification changes to throw a different error then implementors would presumably subclass that.

annevk commented 2 years ago

Right, but you can end up with clashes. Also risks reducing the ability for code to be reusable.

ljharb commented 2 years ago

@annevk true. The only real ergonomic solution to avoid clashes is for the web to commit to carving out ways it won’t ever conflict, so other runtimes have space to innovate freely.

jasnell commented 2 years ago

As I pointed out in the original post, every option here runs the risk of clashes. What I plan to do on Monday is open an issue in whatwg/fetch to propose either a new property on DOMException to indicate retriability, or introduce a new error type for that purpose and see if we can get consensus among browser developers. Failing that, I'd like to bring the question to tc39. In the meantime, for our immediate use case we'll likely just have to do something fairly ad hoc accepting the risk of possible conflicts later.

The rather silly thing about not using cause intentionally for a purpose like this is I could have just as easily done it like throw { retriable: true } and throwing that into a cause would be a perfectly acceptable thing to do.

For instance...

try {
  throw { retriable: true };
} catch (cause) {
  // Falls perfectly within the expected use of cause described in this thread
  throw new Error('boom', { cause });
}

Which is only superficially different from:

try {
  throw new RetriableError();
} catch (cause) {
  throw new Error('boom', { cause });
}

As for a couple of the specific points, @kentonv says:

I guess the big question then is will we run into any cases where we want to throw an error marked retriable, but existing specs say that we need to throw a specific type of error (like DOMException)? If so, what do we do? Or can we definitively rule that out as a potential problem?

Again, that's what cause is for correct?

try {
  throw new MyRetriableError();
} catch (err) {
  // Wrap in a DOMException to meet the API expectations
  throw new DOMException('Network failed', { cause: err, name: 'NETWORK_ERR' });
}

So in my opinion it has to be a property, not a type.

Can be, doesn't have to be. A property also has potential issues if not implemented consistently (insert reference to the current debate around adding cause to DOMException here). The property approach likely has fewer tradeoffs, however, and is potentially the better option. With that in mind, what kind of value are we thinking here? Just a simply retriable: true|false? Or something more structured? And is this a new property on DOMException? Any intrinsic Error?

domenic commented 2 years ago

to commit to carving out ways it won’t ever conflict,

This seems very doable. If you use a prefix like $$wintercg, I am personally willing to guarantee the web won't create any classes or properties prefixed by such sigils.

ljharb commented 2 years ago

I did say “ergonomic” :-)

wintercg / fetch