poelstra / ts-stream

Type-safe object streams with seamless support for backpressure, ending, and error handling
MIT License
65 stars 13 forks source link

If abort() called before reader attached, all writes fail; if abort() called while promise argument of a write() call has not resolved, that write will fail when it resolves #50

Open MattiasMartens opened 4 years ago

MattiasMartens commented 4 years ago

Wanted to flag this “gotcha” that I've encountered a few times.

Case 1 - calling abort() before calling something that attaches a reader to the stream:

(async () => {
    const s = new Stream();
    s.aborted().catch(() => ({})); // Swallow abort exception
    s.write(1); // This won't be pumped until toArray() is called below
    s.abort(); // The write above is now doomed!
    s.end();
    console.log(await s.toArray()); // Promise of s.write(1) will now reject as toArray() attempts to pull from aborted source
})();

Case 2 - calling abort() while the promise-like argument of write() has yet to resolve:

(async () => {
    const s = new Stream();
    s.aborted().catch(() => ({})); // Swallow abort exception
        const p = s.toArray(); // Attach reader before abort(), to avert case 1
    s.write(delay(2).then(() => 1)); // Wait 2 ms before resolving 1
    s.abort(); // No more writes past this point; but the promise generated by `delay(2).then(() => 1)` will try to write in the future, and fail
    s.end();
    console.log(await p);
})();

The first, to me, seems like a bug. write() was clearly called before abort().

The second one is less clear... According to the specification for abort(), this write should fail because it can no longer be processed downstream.

But it creates a problem for the source, which must not only detect abort() and stop writing, but also handle any unresolved promises it already passed downstream (presumably by catching the "aborted" errors from the corresponding write calls).

I don't really have a fix in mind for case 2. I considered simply ignoring promise resolutions in this case, but that violates the guarantee that writes must either succeed and pass the value to the reader or fail. I also considered modifying abort() to wait for resolution of promises sent to write(), but that would violate the expectation that abort() is supposed to be synchronous.

Maybe the best thing to do is to document the gotcha. I would also suggest that instead of post-abort writes throwing the abort error, they should throw a "WriteAfterAbortError" that wraps that error. This would make it easier to distinguish from the case where the source's aborted() rejection was not handled.

poelstra commented 4 years ago

Thanks for this analysis. I agree this may come as a bit surprising behavior, especially given that apparently it's not documented well enough.

The first, to me, seems like a bug. write() was clearly called before abort().

I wouldn't really consider it a bug in the behavior, but indeed more of a documentation bug. The idea is that every write indicates whether it has been processed or not. In this line of thinking it's correct to throw in this case.

Note that ts-stream is built around being able to trace a write from start to finish in a pipeline. This is in contrast to other stream implementations, which will typically buffer a number of items (perhaps at various stages of the pipeline, too).

But it creates a problem for the source, which must not only detect abort() and stop writing, but also handle any unresolved promises it already passed downstream (presumably by catching the "aborted" errors from the corresponding write calls).

IIRC, we ran into similar behavior around this ourselves years ago too, and yes, that's what you'll need to do then.

Maybe the best thing to do is to document the gotcha.

Agreed.

I would also suggest that instead of post-abort writes throwing the abort error, they should throw a "WriteAfterAbortError" that wraps that error. This would make it easier to distinguish from the case where the source's aborted() rejection was not handled.

That's a very good idea, and more in line with what happens when you write after an end.

MattiasMartens commented 4 years ago

"Promise wrapping" turns to be a nonstandard idea in JavaScript; are there any particular libraries you'd recommend for this purpose?

poelstra commented 4 years ago

I don't understand the question. As far as I can tell, the main 'fix' of this issue would be to improve the existing documentation.

For handling the write errors, the idea is to simply use any normal try/catch block, and just handle an abort error as you normally would.

MattiasMartens commented 4 years ago

This is regarding:

I would also suggest that instead of post-abort writes throwing the abort error, they should throw a "WriteAfterAbortError" that wraps that error.

This is technically a breaking change because it changes what error gets thrown, so I thought it might be one of the things you had in mind when you said that you'd prefer for me to merge any further breaking changes before bumping the ts-stream project version.

("WriteAfterAbortError" obviously doesn't need to wrap the original error that caused the abort, I just think it would be more expressive to do so.)

poelstra commented 4 years ago

Aaah, sorry, with "promise wrapping" I thought you wanted to wrap the resulting promise from a write call or something, because that's not what I meant.

I don't want to introduce any other dependencies (only for development/testing).

In this case, what I meant (and I thought you too) was that we'd create something like:

export class WriteAfterAbortError extends BaseError {
    public readonly abortError: Error;
    constructor(abortError: Error) {
        super("WriteAfterAbortError", "stream already aborted");
        this.abortError = abortError;
    }
}

And just use that to throw back at any writes still coming in, just like right now we throw back the abort error itself.

And yes, this is exactly the kind of stuff I had in mind about further breaking changes, I just forgot about this one. It appears you have a better view of all the 'pending stuff' than I have ;)

MattiasMartens commented 4 years ago

In this case, what I meant (and I thought you too) was that we'd create something like: ... And just use that to throw back at any writes still coming in, just like right now we throw back the abort error itself.

Yes, exactly. The only thing I would want in addition to this is the thing that came up in the discussion of #24, where the error also prints the stack trace of the wrapped error.

Quoting you from there:

Wrapping is possible, too. But to make that really helpful (i.e. if someone prints error.stack), you'll need to ensure that the stack of the wrapped error then includes the stack of the original error etc, which gets messy (especially given that ts-stream should work in node and all browsers). See BaseError to get an idea of the mess you'll get into.

I do think it's worth the pain to give the developer as much information as possible in the console.

You say you don't want to add any new runtime dependencies but would you be likely to support it if I rolled my own? I would base it very closely on one of the open-source projects that do the same thing, to get the advantage of their experience with cross-platform compatibility.

poelstra commented 4 years ago

I do think it's worth the pain to give the developer as much information as possible in the console.

Hehehe, "bring it on!" :muscle:

You say you don't want to add any new runtime dependencies but would you be likely to support it if I rolled my own? I would base it very closely on one of the open-source projects that do the same thing, to get the advantage of their experience with cross-platform compatibility.

Yes, but under a few conditions:

This stuff is not the 'core business' of ts-stream, and I don't want to blow up the whole library (including future maintenance burden) because of it. In that case, I'd rather stick to the simple proposal, above.

Can you point me to / show me the stuff you have in mind?

MattiasMartens commented 4 years ago

Came up with this: https://github.com/borisdiakur/rerror Which is itself a lighter version of this: https://github.com/joyent/node-verror

Haven't looked at the source or the open issues, but I know that both support Node and browser environments. Both are MIT license. And in either case we'll only need a small subset of the supported functionality, I believe.

This stuff is not the 'core business' of ts-stream, and I don't want to blow up the whole library (including future maintenance burden) because of it. In that case, I'd rather stick to the simple proposal, above.

Yes, fair. Best thing to do I think is for me to create a PR proposal addressing this feature, and leave it to your judgment if the solution adds too much complexity.

poelstra commented 4 years ago

I looked at both, they seem close (especially VError), but indeed do more than we need. I also had another look at the existing BaseError, and I think it actually provides the basis we need: a reliably inheritance chain, and a 'guaranteed' stack property.

So armed with that, I suppose this is actually the only thing we really need:

export class WrappedError extends BaseError {
    public readonly cause: Error;

    constructor(name: string, message: string, cause: Error) {
        super(name, message);
        this.cause = cause;
        this.stack =
            this.stack + `\ncaused by: ` + (cause.stack ?? `    ${cause}`);
    }
}

export class UnhandledAborterError extends WrappedError {
    constructor(cause: Error) {
        super("UnhandledAborterError", "aborter callback failed", cause);
    }
}

Usage:

const cause = new TypeError("woopsie");
const wrapped = new UnhandledAborterError(cause);
console.log(wrapped.stack);

Output:

UnhandledAborterError: aborter callback failed
    at repl:1:5
    at Script.runInThisContext (vm.js:116:20)
    at REPLServer.defaultEval (repl.js:405:29)
    at bound (domain.js:419:14)
    at REPLServer.runBound [as eval] (domain.js:432:12)
    at REPLServer.onLine (repl.js:716:10)
    at REPLServer.emit (events.js:228:7)
    at REPLServer.EventEmitter.emit (domain.js:475:20)
    at REPLServer.Interface._onLine (readline.js:315:10)
    at REPLServer.Interface._line (readline.js:692:8)
caused by: TypeError: woopsie
    at repl:1:11
    at Script.runInThisContext (vm.js:116:20)
    at REPLServer.defaultEval (repl.js:405:29)
    at bound (domain.js:419:14)
    at REPLServer.runBound [as eval] (domain.js:432:12)
    at REPLServer.onLine (repl.js:716:10)
    at REPLServer.emit (events.js:228:7)
    at REPLServer.EventEmitter.emit (domain.js:475:20)
    at REPLServer.Interface._onLine (readline.js:315:10)
    at REPLServer.Interface._line (readline.js:692:8)

But I just whipped this up really quick, so perhaps this is a bit too simple still.

MattiasMartens commented 4 years ago

I haven't had a chance to take a close look at this yet, but I do want to flag one thing: when the environment supports it, I think we should defer evaluating the stack trace as long as possible (so implement a custom getter for WrappedError, as those libraries do).

I profiled these two statements:

("stack" in new Error("TEST"));

and

new Error("TEST").stack;

The first one is 30x faster because it does not trigger an evaluation of the stack. (As you know, in many cases the stack doesn't have to be evaluated at all.)

poelstra commented 4 years ago

Yeah, I know about that performance difference. As I mentioned, I made something similar before.

But it's exactly that property getter stuff for Error.stack that's making things tricky. For example, there's no way to 'forward' it to the original getter. There are probably other ways (such as creating another Error just to capture its stack later, instead of our own one), but all of that is probably going to be quite tricky to test across all browsers, including with different downlevel compilation options.

Also, I think the performance concern isn't really important here. We're not building a generic promise library, and we're also not using them for every single error that's passing through. I suppose that we'll only be using them for WriteAfterAbortError and UnhandledAbortError. Both of these will be (very) rare occurrences.

MattiasMartens commented 4 years ago

Yeah, I know about that performance difference.

Right, of course you would, having worked on an error library! Apologies Martin.

Makes sense. It's important to think about in cases where errors are thrown and caught in a tight loop, but this is not going to be one of those cases.