Closed benjamingr closed 9 years ago
@benjamingr Do you have an alternate strawman to propose? It doesn't have to be fleshed out, but it would help to have something concrete to discuss.
Not anything concrete, but for a start - what about an RxJS like onNext
onCompleted
onError
interface where the return value is a subscription that can be disposed?
The main reason I asked for @headinthebox to help is because I believe he can come up with something better based on his horizontal experience with observables that spans languages, type-systems and ecosystems.
(Of course, this does not mean generators couldn't be passed in)
IMO it should not be a goal to be able to pass in generator objects. The awkward priming they require, not to mention the "infinite" loop, is a sign of a pretty major impedance mismatch.
And as explained in #14, the onNext/onCompleted/onError are really just a dual of an iterator's next()
; confusing things by matching the related-but-separate generator interface makes people think observers have something to do with generators, when they do not. Picking less punny names would make this clearer.
You can always write an adapter, if you really want to program that way. I think a function that primes and translates next/return/throw to onNext/onCompleted/onError is no worse than a function that primes.
The awkward priming they require, not to mention the "infinite" loop, is a sign of a pretty major impedance mismatch.
But that's just standard fare for any data sink that you write using a generator function. Are you suggesting that developers shouldn't write data sinks with generator functions?
confusing things by matching the related-but-separate generator interface makes people think observers have something to do with generators, when they do not
I must say that I disagree here. If you introduced onNext
, onCompleted
, and onError
, they would have the exact same semantics as next
, return
, and throw
do for data sinks implemented with generator functions. (I suppose that you could quibble over the return value of the methods, though.)
Again, should programmers not use generator functions as data sinks?
@domenic
not to mention the "infinite" loop, is a sign of a pretty major impedance mismatch.
I want to point out that I see nothing wrong with code that has while(true){
in generators, it's pretty common in languages that sport generators like C#, Python and PHP. It's practically the same as a daemon thread yielding (which I believe is why it's called yield in the first place).
I'm not against the idea of generators as sinks (or even as transforms).
@zenparsing
I must say that I disagree here. If you introduced onNext, onCompleted, and onError, they would have the exact same semantics as next, return, and throw do for data sinks implemented with generator functions.
To be clear - This is a really strong argument and a generator observer is a really clever idea (perhaps too clever).
There are a few issues with it we've uncovered in this repo:
return
on it and not for a generator itself as a consumer. Since our object represents the observer nothing exists in the iteration protocol. return
with no value. When you return
from a generator you can return a value back and return
can take a value as its parameter.yield
and what's being returned).The fact a subscription ends with finally
blocks running means I in fact need to write the example shown as:
commandKeys(inputElement).subscribe({
next(value) { console.log("Recieved key command: " + value) },
throw(error) { console.log("Recieved an error: " + error) },
return() { console.log("Stream complete") },
});
With "generator syntax" as something like:
commandKeys(inputElement).subscribe(() => {
function* consumer(){
try{
while(true){
try {
console.log("Received a value: " + (yield));
} catch(err){
console.log("Received an error: + err);
}
}
} finally {
console.log("Stream complete");
}
};
let observer = consumer();
observer.next(); // "prime";
return observer;
})());
This isn't a very ergonomic API and I can see some footguns here. We're not using the return value in the generator and we're not passing it any arguments - both things might help simplify the API.
As a somewhat "odd take" on duality, observers can be functions that operate with the generator API and do the priming. That way they could take a dispose
argument (just like new Observable's subscriber returns a disposing function). The function would look like this:
myObservable().subscribe(function*(dispose){
// because of "priming", we can actually extract dispose here to the outer scope if we want to
// we no longer have to do any priming in user code any more, we have nifty cancelation we can
// compose with refcounting and subscribe can return a promise. This is also safe.
});
This is somewhat problematic for things that are not generators:
myObservable().subscribe(dispose => unprime({
next(value) { console.log("Got value!", value); },
throw(error) { console.log("Recieved an error: " + error) },
return() { console.log("Stream complete") },
}));
Which would mean that that we're against at some odds with the generator interface (because of the unprime). Alternatives include overloading subscribe (footgun imo) or using a non generator type. Between the two I tend to favour the non-generator interface type. I don't have a solid solution yet - but I think we should be able to reconcile. I'd even prefer having two functions (like we do now with forEach
only that one is for passing a generator and the other is for passing handlers without priming).
(I suppose that you could quibble over the return value of the methods, though.)
Or stop claiming duality, that'd also work, just want to put that out there. You clearly have your own thing here. As soon as we started discussing duality things got really hand wavy very fast. We can just say that our "extended" generators are not dual to the iterator protocol and that it might change if the iteration protocol is supplemented in the future.
I think dualization is an important and teaching exercise but I don't understand why it should be claimed or a goal in the final API.
If we add the notion of explicit resource disposal to generators they are capable of being used for Iteration, Observation, and cooperative computation. The main reason we are having this conversation is that I made a mistake in ES 2015.
First a little history. Near the end of the ES2015 spec process I proposed adding the return method to generator. The goal was to provide a mechanism for unmanaged resource disposal. A return method seemed like a good solution, because it was the only missing bi-directional semantic on the generator (throw already existed).
The proposal was accepted and today when you early exit from a for...loop generator.return() is invoked with no input value and the return value is ignored.
for(let x of cas) {
if (x == 3) {
break;
}
}
...translates to...
vat iter = xs[Symbol.iterator]();
var iterResult;
while(!(iterResult = iter.next()).done) {
if (x == 3) {
iter.return();
break;
}
}
This allows the producer to clean up unmanaged resources like file handles in the event that a consumer decides they don't want to wait for iteration to complete.
This strongly suggest that we should see the same behavior if a consumer decides they don't want to wait for observation to complete. In other words...
var sub = observable.subscribe({
next(v) {
console.log(v);
}
throw(e) {
console.error(e);
}
return(v) {
console.log("done", v);
}
});
sub.unsubscribe(); // prints "done"
Unfortunately this behavior would mean that consumers can't differentiate between normal stream termination and a short-circuit. The ability to differentiate between these cases is valuable and is common in user land implementations of Observable. Why does it seem like we should do something else?
Simple: We incorrectly overloaded the generator's return method to handle both explicit resource disposal and coroutine termination. This was a mistake because each of these semantics imply very different behavior. So how did this happen?
When I proposed the method return(), like throw(), it had no return value. However later the signature of return() and throw() was expanded. Dave Herman proposed that return() and throw() be modified to produce an IterationResult for a very good reason: It turned out that neither return nor throw guaranteed unmanaged resource disposal. Why not? Because yield can appear in a finally blocks.
function *() {
try {
var fs = open("file");
yield fs.read();
}
finally {
yield 1;
fs.close();
}
}
var generator = f();
var iterResult = generator.next();
console.log(iterResult.return(5).done);
// prints "false"
Unfortunately I didn't notice at the time that this semantic drift made return() unsuitable for unmanaged resource disposal. When you call dispose, objects don't get to say "No thanks." They are expected to clean up their unmanaged resources. However when calling return on a generator, it turns out to be completely reasonable for the generator to say "no thanks." It is merely a request that can be rejected by the callee.
Trying to use return for both explicit resource disposal and cooperative computation was a mistake. While this mistake can likely not be rectified for Iteration, it can and should be rectified for Observation.
I propose we add a dispose method to generators. This method can be invoked in the case of early termination, as it should have been in the case of breaking from a for...of loop. If we have a dispose method we will have no more use for the Subscription object. We could invoke the dispose method on the Generator itself.
I propose the following API:
var decoratedGenerator = observable[Symbol.observer]({
return(v) {
console.log("stream completed");
}
dispose() {
console.log("early termination");
}
});
decoratedGenerator.dispose(); // prints "early termination"
I mean, only Firefox implements return
so far, and I don't think it's even complete. If you think it's a mistake, might as well advocate for changing it.
I would really love to. Unfortunately I think it's too big an issue for errata, but I'm willing to try. Frankly I'm pretty broken up about it. I had all the context I needed to determine this was a mistake and it fell off my radar. Mea culpa.
This is precisely what I was talking about in the other thread. There's no way to for a consumer to signal an iterable to clean up a scarce resource without signaling either success or failure.
@jhusain thanks for the context. I didn't even realise yield
in finally
could cause this problem. I opened an issue https://github.com/tc39/ecmascript-asyncawait/issues/51 so this case can be considered in async/await I'm rooting for your attempt to fix generators.
Still, I'm not sure observers should use the "generator interface"* directly. There are still issues with it - in fact you've made an argument against using generators:
yield
after return is called on it? The whole "finally means onCompleted" semantic as you pointed out is problematic, and I'd argue it's confusing too.(*) - can we please find a better name for this "generator interface"? It's not something generators have, it's something the return result of generators have.
For what it's worth, I talked to @bwoebi from PHP internals, they've made the choice to throw
on yield being called in finally when a generator is force closed - http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_vm_def.h#7367 - I wonder if this can be amended in ES.
I also talked to @petkaantonov - in bluebird coroutines a yield
in a finally blocks will also trigger a runtime error if the coroutine has been cancelled.
I think we should all step back and read this mega-thread https://github.com/dotnet/roslyn/issues/261 apparently somewhere between the top and 3/4 of the way down they start getting into async cancelation semantics. I bet there is a lot to learn there, once I make the time to read it.
@domenic Thanks for the link! Just started scanning it, but there's some interesting stuff in there around cancellation tokens and async for each.
@jhusain doesn't that "mistake" only pertain to the specific use case of observables?
e.g. in the case when implementing async functions on top of generators for example, it would be perfectly fine. There .return
-the-method (1) would mean the promise was cancelled. The cleanup inside finally can be allowed to await
(yield) other operations. These can be awaited, but that wont affect the end result, which is a cancelled promise.
For observables, its bad. There .return
-the-method would be a mechanism for the observable to tell the observer there are no more values. Inside the generator this causes the finally block to be called. yield
within that block means "wait for the next value". Indeed that makes no sense, as it was already told that are no more values. But that seems like an error in that particular use case, not a design flaw?
(1) I try to make a distinction between .return
-the-method and .return
-the-keyword when discussing observables. Seems to me like they have vastly different meanings?
.return
method is a mechanism for the observable to tell the observer there are no more valuesreturn
keyword within a generator based observer is a mechanism for it to tell the observable that it doesn't care about any more values (unsubscribe). Which unfortunately means you can't tell a generator-based observer to stop caring from the outside, as it must decide to return
by itself (and call the appropriate cleanup code before doing that, if cleanup is necessary)The return method means the same thing as the return key word: the stream of data has completed with a final result. The producer has no more values to provide. However the consumer may decide to wait for more value and not shut down, which is the consumers prerogative. If the producer is uni-directional, this will be a bad idea and should be considered user error. Should this occur there is a pretty clear remedy: don't do that. However if you absolutely need deterministic dispose semantics, you can always fall back on writing a generator by hand and implementing the dispose method. Later on we can always add language semantics for dispose so they can be handled in generator functions. The simplest option would be to add a dispose block { } alongside finally { } blocks.
@jhusain - what about a function that returns a generator-interface-iterable (+dispose) as the base type? Wouldn't that make more sense for the observer? Then you can pass a dispose
in which would take a function and get invoked whenever deterministic clean up is required. It also takes care of priming and such.
Somewhat unrelated but I am playing with this at the moment and agree that priming generators is a little gross. I'm not sure if this is a good idea to include in the proposal but it would be nice if generators were automatically primed using something like:
var GeneratorIterator = (function*(){}()).constructor;
var primeGenerator = function(generator) {
if (generator instanceof GeneratorIterator) {
generator.next();
}
return generator;
};
@DylanPiercey Thanks for the suggestion! I think, though, that would be a little too hacky for the standard library. For instance, what if we actually wanted to pass in a generator that was already primed? Would we then re-prime it?
I agree that priming seems weird at first. But that's just how generator-functions-as-sinks work in JS. And once you get used to using a library function for priming (or some other strategy) you get used to it and it's not a big deal. I disagree with others that have said it's a footgun.
@zenparsing I do agree it's a bit hacky. Truthfully I thought that when invoking a generator function that it would run until the first yield or return and was a little shocked that invoking a generator does nothing until next is called. (Most of my work with generators is on the producer end),
It probably is best that priming is left to an external function (or done manually) but it sure caught me off guard.
I disagree with others that have said it's a footgun.
So you don't see how developers will forget to prime?
What happens when you pass an unprimed generator in?
Even adding a .started
property on a returned iteration from a generator would be a better solution to this.
What happens when you pass an unprimed generator in?
Nothing, because it's a function that creates a generator, right?... unless that function happens to have next
, throw
, return
and dispose
monkey-patched on it, I guess.
I meant an invoked generator (the sequence) that has not been "primed"
On Fri, Jul 3, 2015 at 1:40 AM, Ben Lesh notifications@github.com wrote:
What happens when you pass an unprimed generator in?
Nothing, because it's a function that creates a generator, right?... unless that function happens to have next, throw, return and dispose monkey-patched on it, I guess.
— Reply to this email directly or view it on GitHub https://github.com/zenparsing/es-observable/issues/33#issuecomment-118186149 .
Subscribe should optionally accept a generator as its first argument.
Sent from my iPad
On Jul 2, 2015, at 3:40 PM, Ben Lesh notifications@github.com wrote:
What happens when you pass an unprimed generator in?
Nothing, because it's a function that creates a generator, right?... unless that function happens to have next, throw, return and dispose monkey-patched on it, I guess.
— Reply to this email directly or view it on GitHub.
@jhusain I think this is the best option in terms of usability. (Although i'm not sure what the difference would be from automatically priming).
We could automatically prime the generator, keeping track of the fact that it's not already primed by attaching a symbol when we prime.
Actually scratch that. We couldn't discern whether a generator was produced by JavaScript or created manually. That is unless we change generator functions to attach a Symbol to generators they create and ES 2016, which is actually possible.
That would totally make sense to me. Attaching a symbol or keeping it in a
WeakMap
and prime it if it's not already primed. Accepting a generator
would also go a long way in terms of usability.
This would only be problematic in case someone "manually primed" a generator and passed it in - in which case we'd prime it again.
On Fri, Jul 3, 2015 at 1:45 AM, Jafar Husain notifications@github.com wrote:
We could automatically prime the generator, keeping track of the fact that it's not already primed by attaching a symbol when we prime.
— Reply to this email directly or view it on GitHub https://github.com/zenparsing/es-observable/issues/33#issuecomment-118186929 .
@jhusain what about a generator started
property? That way we'd know if an iteration started, users can hook their own iterators to work with it and we can prime.
this also inverses nicely to observables where it could tell you if an observable has been subscribed to.
@zenparsing
@DylanPiercey Thanks for the suggestion! I think, though, that would be a little too hacky for the standard library. For instance, what if we actually wanted to pass in a generator that was already primed? Would we then re-prime it?
The standard library can just check if it has the right prototype. As for already primed, the standard library can store primed generators in a WeakMap
and not prime them again.
That said, I like the direction in @jhusain 's comment above better.
@jhusain we can differentiate generator functions from generators from everything else:
GeneratorFunction = (function * () {}).constructor
Generator = (function * () {}()).constructor
myGen = function * () { yield 1 };
myGen instanceof GeneratorFunction; // true
myGen() instanceOf Generator; // true
However i'm not sure how to test if it has been primed.
Closing in favour of #35 to keep discussion there if no one objects. Feel free to reopen.
Continuing the discussion from https://github.com/zenparsing/es-observable/issues/14 I think that while the idea of using the generator interface as a dual to iterator is novel, there is really no justification to do so.
We see that using it as the basis for the observer does not really maintain duality particularly well (as seen in #14) nor it provides the semantics required for useful observers (as seen in #32 as well as several other issues). Now that I understand the proposal a lot better:
Wouldn't it be better if we just dropped using the ES2015 generator interface for the observer?
On the same note - wouldn't it be better to use more descriptive names for observers (like the names used in Rx, I'm quite fond of those) and not stay bound to the interface allowing us to add cancellation semantics as we please easily?
I think duality is important, but leveraging the experience with the API of people like @jhusain and @blesh who actually used observables for a great extent is just as important if not more. I'm certain we can prove duality with a better named API too.
I'm well aware that a lot of people put a lot of work into the proposal, work by people who probably have more experience and shinier PhDs than me, you have discussed and uncovered a lot of interesting issues and I don't like delaying the proposal as much as anyone else - I definitely think though that this is worth discussing.