Implementation options for ES2015 Promise support

fatcerberus commented 8 years ago

There are some interesting challenges for an Ecmascript engine to support Promises.

First and foremost, settlement (fulfill/reject) must be be asynchronous: When a Promise is resolved, the handler is not called immediately but instead gets queued into what ES6 refers to as a Job Queue. To support this there must be an event loop of some kind. Duktape currently has no such thing.
Promises in a promise chain are quite tightly linked such that the resolution of an earlier promise directly begins the process of resolving the next. A typical JavaScript implementation will make heavy use of closures to support this, a luxury C code doesn't have. Some kind of solution using internal properties is needed instead, which is awkward and if not done carefully could expand footprint unnecessarily.
In contrast with exceptions thrown in synchronous code, what to do with uncaught promise rejections is not as obvious. Handlers need not be attached to a promise chain immediately and in fact could be added at any time before or after the promise is settled. Therefore it's not enough for the engine just to say, "Nobody will handle this rejection, ergo uncaught" and crash. Complicating matters further is that, if it turns out that no rejection handler is ever attached, all information about the error--or even that there was ever an error in the first place--will be lost. Indeed, because promise settlement is performed asynchronously and (nominally) in a separate execution environment, there is no way to handle an error thrown by a promise except in a rejection handler. Some potential solutions to this have been devised in third-party Promise implementations, e.g. Promise.prototype.done(), but these have their own issues and are sometimes controversial. For example bluebird heavily discourages use of .done(): http://bluebirdjs.com/docs/api/done.html

This issue is for discussion of the above challenges and the best options for solving them in a small-footprint engine such as Duktape.

fatcerberus commented 8 years ago

Having put some thought into it, the async requirement for Promises makes sense, particularly in a language with closures. I initially thought it might just be to prevent excessive recursion, since a promise chain is really just a callback pyramid in disguise and it would be easy to run out of stack space. But that's only a part of it I think. If a promise is handled inline--especially if it's handled before a function containing it returns--then any one of the closures in a .then() clause could unexpectedly modify the containing function's state, leading to very strange behavior.

I think this is why Ecmascript mandates a completely empty callstack before settling a promise too - doing this ensures that no program state can be disturbed in the settlement process except for only what's in use by the promise itself.

svaarala commented 8 years ago

There are multiple reasons to "pump" events via an event queue. Call stack is one, and predictability of when the callbacks happen (and won't happen) is another. For example, it's annoying if a callback happens immediately and you haven't set up some local variables which the callback sees.

I think you need to read that requirement in the context of what the specification is trying to prevent: application visible consequences of promise callbacks being called earlier than expected. The important thing is that all the calls originate from a state where none of the application code is running.

But if there is a Duktape/C or even an Ecmascript function implementing an event queue pump outside of application code, there's no way for standard ES6 code to know that loop exists. It will be visible in stack traces but those are not standard behavior. The application code won't see any callbacks happening earlier than expected.

So implementing promises like that would be OK as far as practical compliance is concerned I think. And it's quite normal to do so, when you implement an application container providing a scheduler / eventloop / promises.

But I'm not sure how this requirement matters much? Promises must be queued up for execution somehow, and Duktape shouldn't provide "The Eventloop" but rather provide a mechanism to process futures (incrementally or all pending ones at a time; I'd prefer incrementally) from a user-provided eventloop?

svaarala commented 8 years ago

So assuming that the implementation is some kind of API call like:

duk_process_pending_event(ctx);

the event callbacks (promises here) would in fact be executed "from the top level".

Or at least it would be possible to do so; it would be up to the user to circumvent this and call it from inside a pure Ecmascript eventloop (via a native binding) for example - and I think that should be fine because it's quite normal to write even top level eventloops as scripts. Such an eventloop could then alternate between sleeping, polling whatever custom state there is, and processing pending events (promises).

Practical code should probably wrap that event pump call into a safe call at least. Or the errors should be eaten and discarded silently (which seems bad to me).

svaarala commented 8 years ago

... As an amusing coincidence, I'm just debugging a callback sequence (implemented without promises) that runs into callbacks in some rare cases being executed earlier than expected... Highly annoying :grin:

svaarala commented 8 years ago

Some high level notes on event loop approaches:

Externalize the event loop mechanism somehow so that Duktape won't implement an event queue or dispatcher but let application code do it. The interface(s) needed are to queue pending callbacks etc.
Implement an event queue in Duktape and provide a C and/or Ecmascript API for application eventloop to interact with it. Application code should be able to queue up their own events to be dispatched similarly to how Promises are dispatched. This is quite useful because an event queue is required for implementing web-like bindings. So this is my favorite so far.
Implement an actual eventloop in Duktape. I don't think this is feasible because the portability concerns would be way too big.

Regarding the event queue, the proper scope for an event queue is most likely a global object? So, when one requests events to be dispatched:

The request could be directed at the Duktape heap, with Duktape dispatching pending events for some global object, one at a time. This may be rather awkward to implement.
The request could be directed at a specific global object environment (identified using the argument ctx). Duktape would dispatch event(s) from that global environment alone, and it would be up to the application code to know what global environments need event handling.

One thing that came to mind is what should happen if user code does something that queues up events (perhaps accidentally) but doesn't implement calls to cause the events to be handled?

Should promises fail to work unless user code has somehow indicated that it has event queue handling support? For example an API call like duk_enable_events() which would set up the event queue, thus signalling that it's OK to rely on events being handled?
If not, should there be some sanity mechanism to let the user know events are queueing up and eating memory?

The scenario I'm thinking about is user code including a bunch of modules (perhaps not knowing what's in them), the modules creating a bunch of Promises which are not really of interest, with promise events accidentally queued up and never dispatched. It'd be nice if it would be difficult to make this mistake, e.g. by promise creation failing unless event support has been signaled.

fatcerberus commented 8 years ago

It should be noted that ES6 does mandate an event queue - the Job Queue, and states explicitly that Jobs (of which Promise settlement is one type) must not run until the Ecmascript call stack is completely empty. While a non-empty callstack is not directly detectable by standards-based code, it can be visible indirectly through side effects, so the distinction is more than just academic.

As I've mentioned before, this kind of setup is not feasible for all applications - in fact my very own minisphere can't use such a solution because Ecmascript code provides the eventloop:

while (system.run()) { ... }

So I would never propose a solution where Duktape provided its own event loop. That's still best handled by the application.

svaarala commented 8 years ago

So, if I wrote an eventloop outside of the application (with the event function registered e.g. into the global stash), how would that be visible through side effects?

fatcerberus commented 8 years ago

You'd have to clarify what you mean by "outside the application".

svaarala commented 8 years ago

C code would call an Ecmascript function registered into the global stash, and that would call a Promise callback. But the callback would not have access to the function (except via Duktape.act() or other non-standard mechanism).

fatcerberus commented 8 years ago

Ah, okay, so like system.run() in minisphere. The simplest case I'm thinking of is where a Promise callback modifies a global variable. If the callstack is not empty you've just changed a running application's state in a way that will be surprising to the application and which it has no general way to defend against.

svaarala commented 8 years ago

So how would the application detect the wrapping call in this case - in concrete terms?

fatcerberus commented 8 years ago

It can't - it can only see the side effects (I did say "indirectly" :). I guess my point is that, since Ecmascript says "empty callstack first", compliant code has no reason to expect such side effects to be possible in the first place.

svaarala commented 8 years ago

But I still don't understand what you mean by an "indirect" side effect here? What would such a thing be in concrete terms? What code could fail?

fatcerberus commented 8 years ago

On the other hand, if there is such a call as system.run() in the API, then application developers should expect that calling it may cause a Promise handler to run.

svaarala commented 8 years ago

So what I'm trying to get at is that specifications in general specify external behavior. Anything consistent with that external behavior is generally acceptable (security features like memory wiping etc being the exception -- they matter, but can't be verified from inside the application). If it's literally impossible to write standard code to tell the difference, it's pretty meaningless to dictate some implementation non-compliant.

But let's take another example: suppose Duktape safe calls were changed so that they involve a call stack entry visible to stack traces and Duktape.act().

Would this mean that it would be then impossible to wrap an event loop inside a safe call because it called promise callbacks from within that wrapper, so the callstack was not empty?

fatcerberus commented 8 years ago

Concrete case off the top of my head: Function creates Promise -> returns -> program does various unrelated things -> calls into C which decides to resolve Promises -> Promise modifies global/closure variable -> Misbehavior ensues.

svaarala commented 8 years ago

But that's a totally different case than what I'm talking about? I'll try to clarify:

Consider first a case where promises are resolved from the top level. For example, the application calls Duktape C API function:

duk_resolve_promises(ctx);

I would then do the following:

Push a C function (duk_push_c_function()) to the value stack, providing a way to call duk_resolve_promises() from Ecmascript code. This value is just on the value stack, so in no way visible to any user code.
Eval an Ecmascript function which resolves promises by calling that binding. This function is also on the value stack, so not accessible to any user code.
Call the Ecmascript function, giving the Duktape/C function pushed on step 1 as argument.

Now, the script code would be something like:

(function (resolvePromises) {
    resolvePromises();
})

So what would go wrong here?

fatcerberus commented 8 years ago

As long as no application calls are in progress when all that happens, then I would almost certainly deem that acceptable, even if it borders on being a Rube Goldberg machine ;-)

I didn't mean to imply a strictly literal reading of the spec if that's what you thought, I'm a little too much of a pragmatist for that :)

svaarala commented 8 years ago

Note that I agree completely it's generally unsafe to resolve promises from wherever and if you do that things will fail in various ways.

I'm just saying that the requirement for the call stack makes no practical sense, if all code leading up to the promise callback call is outside the application code, i.e. hidden from view from any standard mechanism. It would even be possible to flag "infrastructure functions" as such, and hide them even from non-standard mechanisms like Duktape.act() and stacktraces.

svaarala commented 8 years ago

As long as no application calls are in progress when all that happens, then I would almost certainly deem that acceptable, even if it borders on being a Rube Goldberg machine ;-)

How so? It's actually quite common for an engine to leverage the language it implements to implement some mechanisms. For example, a lot of V8 is implemented in Javascript AFAIK.

fatcerberus commented 8 years ago

By "application" I meant the program which created the Promise object. If resolution happens in JS at a level above that, that's fine too.

fatcerberus commented 8 years ago

It seems we were actually on the same page all along and just didn't know it :)

svaarala commented 8 years ago

I meant what you mean by "Rube Goldberg" machine? :)

svaarala commented 8 years ago

As a practical example (and one motivation why I'm trying to explain my view re: the call stack being empty): it would be hypothetically an OK first step to take some ES6 compatible Promise library, written in Ecmascript, and use that to implement promises.

I'm not saying this should be done (it probably wouldn't be a good solution longer term) but in that case the call site for promise resolution would necessarily be in Ecmascript code, i.e. with a non-empty callstack.

I don't see why that would be a contrived setup - but a rather natural one.

fatcerberus commented 8 years ago

Oh, by that I meant the contrived example itself exactly as described. If the Ecmascript function did something more substantial than just immediately calling the C function you passed it and returning I wouldn't call it such. No offense was meant by it. :)

svaarala commented 8 years ago

No offense taken, I was just genuinely confused why you'd consider the setup contrived, because it appears (in essentially that form but with more actual meat) if an external Promises library were to be integrated into a native event loop.

On the topic of infrastructure functions: one future work item is to consider adding an "infrastructure function" flag (or some other means of detecting such functions) so that they could be ignored when assigning error .fileName/.lineNumber "blame".

That feature would also be useful in sandboxing: in sandboxing it would be highly preferable if any infrastructure functions in the callstack were actually invisible, even via non-standard means like stacktraces. Some users see this is a hygiene thing for sandboxes.

So if something like this gets implemented, it would then allow user code to implement a script eventloop which would be invisible to application code as far as anything observable is concerned. But it would still violate a strict requirement of the call stack being truly empty - but I would still leave that decision up to the application.

fatcerberus commented 8 years ago

But yeah, I understand what you're saying: The infrastructure itself may be written in JS in which case an empty callstack is impossible. I have no problem with that.

svaarala commented 8 years ago

Coming back to concrete issues - were you thinking that a hypothetical C API would need to throw if the callstack wasn't empty?

I think it'd be good if the API could throw if there was "user code" in the call stack. But there's not enough information to make that decision now.

svaarala commented 8 years ago

By C API I mean a Duktape C API call that would be used to process the pending events in the job queue.

fatcerberus commented 8 years ago

That throwing behavior would make it impossible for me to use Duktape Promises in minisphere, so I'd prefer if it were optional if anything.

svaarala commented 8 years ago

I agree - ideally it would throw when there's actual application code on the call stack (which means all bets are off re: Promise behavior) but until that distinction is possible it would just be allowed in any state. There'd need to be a warning in the API document and a good explanation in a Promises How-To.

svaarala commented 8 years ago

Here's an example API that might be workable from C code:

/* Enable job queue for a global environment.  Before this is called, any attempt
 * to register a job would fail, including Promise calls.  By calling this, the
 * application indicates it will be processing jobs.
 */
void duk_enable_jobs(duk_context *ctx);

/* Register a job.  Treated essentially like duk_call_method() from argument point
 * of view: value stack contains a function, a this binding, and arguments.
 * Internally the function, the this binding, and the arguments would need to be
 * packed into a job entry, perhaps simply a dense array with 2 + nargs elements.
 */
void duk_register_job(duk_context *ctx, duk_idx_t nargs);

/* Process one entry from the job queue of a global environment.  If the queue
 * is empty, return false; otherwise return true.
 */
duk_bool_t duk_process_job(duk_context *ctx);

svaarala commented 8 years ago

Another design for duk_process_job() would make it an automatically protected call and it would then return:

A success/error indication
A potential error on the value stack
A boolean indicating whether more jobs are queued

For example:

duk_int_t duk_process_job(duk_context *ctx, duk_bool_t *out_more_jobs);

Or maybe the return value could include the "no more jobs", e.g.:

/* Returns:
 *   DUK_EXEC_SUCCESS: job executed successfully
 *   DUK_EXEC_ERROR: job failed
 *   DUK_EXEC_NO_JOBS: no more jobs
 */
duk_int_t duk_process_job(duk_context *ctx);

Allowing callback errors to be handled by the application is quite important - because Duktape has no standard console or logging, this needs to be done by the application and the API should encourage to do so rather than ignore errors.

svaarala commented 8 years ago

Regarding implementation it'd probably be good to add generic support for a job queue into the C API first, and then implement Promises on top of that. As discussed above, it'd be really nice if the mechanism was generic and also available for user code.

fatcerberus commented 8 years ago

Regarding implementation it'd probably be good to add generic support for a job queue into the C API first, and then implement Promises on top of that.

:+1: to that. ES6 describes ScriptJobs which are essentially just "Queue this code to run" so exposing it as a generic primitive makes sense.

svaarala commented 8 years ago

Right, user jobs would be just another type. I'm not sure if it makes sense to try to differentiate the jobs or manage different job queues, at least as a first approximation.

The specification allows that as far as I can tell: it says jobs don't need to execute in order but jobs from a certain queue (like PromiseJobs) must execute in order. The simplest implementation is just a single queue, executed in order, with no tracking of job types.

I'd need to re-read the ES6 sections but maybe the queue could initially just be a simple array with each entry an array containing a function, a this binding, and arguments:

[
    [ cbFunc, someThisBinding, 1, 2, 3 ],
    [ anotherCbFunc, null, 4, 5, 6 ],
    ...
]

The simplest implementation will push new jobs on the queue, and then "shift" the queue (remove first element) for execution. Shifting is (relatively) expensive for large arrays but it will just be a memmove() in practice (the job queue can be kept a dense array at all times) so it won't really matter unless you have hundreds or thousands of jobs queued. (It can then be improved later by maintaining a "next" index and only really unshifting the queue occasionally.)

Dense arrays are nice in that they are duk_tval arrays with much smaller overhead than representing the jobs as objects (which would be more natural of course).

fatcerberus commented 8 years ago

It can then be improved later by maintaining a "next" index and only really unshifting the queue occasionally.

Another option would be a circular queue with a "next" index. The only cost would be when the array needs to be resized, queuing and dequeuing would be O(1). Since jobs are executed sequentially anyway, I imagine a circular queue would work quite well.

svaarala commented 8 years ago

That would work well too, and can also be resized in-place. The minimal unshift approach doesn't need any index state or such, so you just operate on the array. Anything fancier requires one or more control properties. These can added to the queue array object.

svaarala commented 8 years ago

Or we can get crafty and put control properties at the beginning of the array object (e.g. at index 0).

svaarala commented 8 years ago

Actually if one wanted to really minimize the number of objects allocated, a single array would do it: one would append the jobs of the form: nargs, function, this, arg1, ..., argN to a single flat array.

When executing a job the "nargs" fields allows you to remove the job or to increment a circular buffer index (but walking backwards from the end doesn't work).

But this may be taking it too far at least as an initial implementation: it'd be best to use something trivial at first to get some experience if it works and actually fits a purpose.

svaarala commented 8 years ago

Hmm, somewhat related, dense arrays are relatively memory efficient to represent tagged value slices. The cost is a duk_harray header but after that there's no additional overhead because a dense array part is just a sequence of plain duk_tvals.

Some internals where you want to store a slice of tagged values would benefit from an internal type which would have a very minimal heap header (same as a string) and just contain a sequence of duk_tvals. In essence if would be a lightweight, property free array.

In this case that datatype would be useful, and it might also be useful for dealing with array slices in user code. For example, instead of packing an array slice into an actual array instance temporarily (hypothetical duk_pack() API) you could pack into a lightweight tagged value slice and then unpack from there.

This is something I've toyed around a bit earlier on because it would play nice with adding more slice / array operations into the API. I can prototype that separately, and if it goes into master, Promises can then also use it as some point. But let's not assume this would be done first :-)

fatcerberus commented 8 years ago

I found this article that explains pretty well why Promises are never settled synchronously: http://blog.izs.me/post/59142742143/designing-apis-for-asynchrony

Quite an interesting read. "Releasing Zalgo" indeed. :)

svaarala commented 8 years ago

I don't really see that as a black-and-white issue either way. It's about trade-offs between performance, scheduler flexibility, and error proneness.

It's quite possible to design an API where the callbacks are sometimes inline or not (it might be based on e.g. how long the scheduler has been running without yielding, for example). Such an API will quite obviously perform better than the alternative of queueing every single callback through the top level.

But unless calling code has been very carefully written and tested, the pattern creates a lot of semi-cold code which can be quite trivially broken. That's quite fine if the author of the calling code accepts this fact and deals with it accordingly. But since many users don't really want to take the time and think things through carefully, it's just an error prone design - especially for scripting where the calling code is often "glue code" which is written more hastily than e.g. typical C code and one generally prefers to avoid cold paths even when the trade-off is lower performance.

That's at least the case for the default behavior. I wouldn't actually mind an API where you could signal that you're OK with the callbacks happening inline, and you'd then be responsible for ensuring you don't make any incorrect assumptions.

All in all the ES6 Promise behavior is the most practical default behavior IMO.

fatcerberus commented 8 years ago

Regarding mixing inline with async, the article above links to the following article which talks about that too: https://blog.ometer.com/2011/07/24/callbacks-synchronous-and-asynchronous/

The gist is that, if you have scenarios where sometimes you can resolve immediately and sometimes not, it's better to rethink your API design, for example, by adding a polling mechanism, rather than mixing behaviors (which can introduce some element of non-determinism) in the name of performance.

But that's getting a bit off-topic, I think. Indeed, I'd agree that ES6 Promise behavior is the best default.

svaarala commented 8 years ago

Right, non-determinism is a "cost" you need to pay if you want to optimize performance and give the scheduler the power to decide how many inline calls are made in each situation (it is useful e.g. for semi real time applications which are also performance sensitivite). Like I said, it's a trade-off, and a vast majority of users seem to prefer deterministic behavior over the potential performance improvement of the alternative (non-deterministic) behavior.

I like that trade-off myself too. But I don't see it as a right/wrong question, in some situations a different trade-off may make sense.

fatcerberus commented 8 years ago

Back on topic: Having quickly revisited my Promises branch, I agree that it'd be better to implement that generic job queue support first, before moving forward with a Promise implementation.

svaarala commented 7 years ago

Once the 2.1.0 release is complete I'll try to write a first draft of a job queue implementation which could then be used as a basis for starting a Promise implementation. It will most likely need to be rewritten at a later point so I'll try to keep it as straightforward as possible for now (e.g. use existing objects/arrays for tracking etc).

svaarala commented 7 years ago

Regarding the C API, not sure if it was covered earlier but: the duk_debugger_cooperate() is typically called from an event loop with the call stack empty, so it'd actually be using the same pattern as a "process one job" API call. So maybe the debugger cooperate call could be integrated into the same conceptual API somehow.

The practical difference is that there would be no concrete Debugger job entries, but the handling could be something like this; when calling "process one job":

If debug client is attached, poll for debugger messages without blocking. Execute at most one command.
If a command was executed, return true without checking the actual job queue. The next call will check for more debugger commands (or the job queue).
If no command was executed, process the actual job queue, returning true/false to indicate whether we're complete or not. This allows an application to process all pending jobs in one loop if it wishes to do so (or execute them one at a time).

So, debugger commands would behave like a separate Debugger job queue which had priority over all other queues (which is allowed in ES2015; queue selection is up to the engine).

With this approach only a single "process job" API call would be needed in the application event loop.

svaarala commented 7 years ago

The duk_debugger_cooperate() call is actually missing a true/false return value (indication of whether there are potentially more commands). It doesn't matter if one calls the function constantly, like in the frame loop of a game.

However, on low power devices it'd be useful to detect when there are no more pending commands. This allows one to do something like:

Cooperate with debugger.
If a command was executed, sleep 1ms and check again. Or just call cooperate again until it indicates there are no more messages.
If no command was executed (queue is empty), sleep 500ms and check again. This allows the application to sleep most of the time.

In the current API this isn't possible because the command queue status is not exposed (return value is void). So moving to a unified job queue API could also fix this shortcoming.

It's a separate question whether an application should be able to control what conceptual job queues are executed (e.g. debugger only, promises only, etc).

fatcerberus commented 7 years ago

This seems workable. One thing that worries me though is the possibility that under heavy debugger activity the debugger may starve the real job queues and therefore make it more difficult to debug e.g. promises.

svaarala / duktape

Implementation options for ES2015 Promise support #1035