Closed: aarzilli closed this 3 years ago
I implemented something very similar to this (using a per-thread boolean variable, not a Context) in an unpublished branch in which I developed a gdb-like source-level debugger for Starlark; the interpreter would load the variable on back edges and function calls, as that's all that's necessary to ensure eventual interruption. Upon finding the variable to contain true, the interpreter would enter the debugger. I suspect a single mechanism could be made to serve both needs, if the interpreter calls a hook function whose default behavior is to return an 'interrupted' EvalError, and the debugger overrides this hook.
Using an ordinary boolean variable avoids the need for volatile or atomic loads in the main interpreter loop, at the cost of making it the responsibility of the client to interrupt each thread independently, not just once on a shared (Context) variable. A debugger, for one, needs the ability to interrupt a specific thread, ideally without interrupting and resuming all the others.
But perhaps that's the wrong choice and the interpreter should observe a shared variable, using atomic operations. If we were to take that approach, a 'done' channel might be more appropriate than a Context. The worst thing about Contexts is that they are just a bag of stuff: they have no particular scope and you can stuff anything into them. (This is also the best thing about Contexts.) Specific clients can already tunnel a Context (or more than one if they so choose) through the interpreter using the Thread 'locals' mechanism. I would rather not make Contexts a core part of the interpreter semantics.
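For illustration, here is a minimal sketch of that flag-plus-hook arrangement. All of the names are hypothetical (none of this is starlark-go API), it assumes the sync/atomic and errors packages, and it uses an atomic load rather than the plain boolean described above:

    // Hypothetical sketch, not starlark-go API: a per-thread interrupt flag
    // plus an overridable hook.
    type interruptibleThread struct {
        interrupted uint32       // set by Interrupt, polled by the interpreter
        OnInterrupt func() error // default behavior: return an "interrupted" error
    }

    func (t *interruptibleThread) Interrupt() {
        atomic.StoreUint32(&t.interrupted, 1)
    }

    // The interpreter would call this on back edges and function calls,
    // which is enough to guarantee eventual interruption.
    func (t *interruptibleThread) checkInterrupt() error {
        if atomic.LoadUint32(&t.interrupted) == 0 {
            return nil
        }
        if t.OnInterrupt != nil {
            return t.OnInterrupt() // e.g. a debugger pauses here instead of erroring
        }
        return errors.New("Starlark thread interrupted")
    }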
If one goroutine's execution is inside a long-running Value.Call, how would another goroutine set a non-atomic boolean in the Thread that the first goroutine is using to run that Value.Call? Or maybe you meant something else by your lower-cased "thread".
Something needs to cause a happens-before from the party wishing to interrupt to the party that's deep in the bowels of starlark.
If the only answer is "wait until the interpreter calls a function you defined in Go (in that goroutine)", then the boolean doesn't seem to be adding anything extra; that Go function can just select { case <-ctx.Done(): return ctx.Err(); default: } right there.
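Spelled out as a small reusable helper (the name checkCtx and the idea of calling it between chunks of work are only a sketch; it needs the context package), the check a long-running Go callback can make is just:

    // Hypothetical helper: a Go-implemented built-in doing long-running work
    // inside a Value.Call can call this periodically; the closed Done channel
    // is what provides the happens-before edge from the cancelling party.
    func checkCtx(ctx context.Context) error {
        select {
        case <-ctx.Done():
            return ctx.Err()
        default:
            return nil
        }
    }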
Something needs to cause a happens-before
Yes, that's the obvious right way to do it. Re-reading my comment I suspect I was assuming that continued Starlark execution would eventually observe the flag change because of some other synchronization (e.g. malloc) and that "eventually" would be soon enough. But an atomic read is probably so cheap there's no reason to cut corners.
I'm in favor of this actually being a context.Context. It might be a vague concept, but it also ended up in the standard library, and it can be passed through when one needs to make a network call or a database query. Any other solution will require writing an adapter.
There's an active PR to add cancellation: https://github.com/google/starlark-go/blob/e94f8a2396100c55971a981ce0237cee284dae39/starlark/eval.go#L74
Because the interpreter loop needs to poll the cancellation state, the PR intentionally exposes a Thread.Cancel method that sets an atomic boolean rather than using a done channel; measurements showed that to be significantly (5x) faster. Clients can of course adapt one to the other like so:
    ctx, cancel := context.WithCancel(origCtx)
    go func() {
        <-ctx.Done()
        thread.Cancel() // interrupt the Starlark thread once ctx is done
    }()
    // ...execute Starlark code...
    cancel() // unblocks the goroutine above once execution has finished
If you need to pass a context through the interpreter to your application, use thread.SetLocal (https://github.com/google/starlark-go/blob/master/starlark/eval.go#L59). See the REPL for an example. The interpreter has no need to know about it.
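As a concrete sketch of that tunneling (the built-in name fetch, the key "context", and the host-provided ctx and src values are all made up for illustration):

    // Host side: stash the context in the Thread's locals.
    thread := &starlark.Thread{Name: "worker"}
    thread.SetLocal("context", ctx)

    predeclared := starlark.StringDict{
        "fetch": starlark.NewBuiltin("fetch", func(
            th *starlark.Thread, b *starlark.Builtin,
            args starlark.Tuple, kwargs []starlark.Tuple,
        ) (starlark.Value, error) {
            // Built-in side: pull the context back out of the locals.
            ctx, _ := th.Local("context").(context.Context)
            if ctx != nil && ctx.Err() != nil {
                return nil, ctx.Err() // stop early if the caller was cancelled
            }
            // ...use ctx for the network call, database query, etc...
            return starlark.None, nil
        }),
    }
    if _, err := starlark.ExecFile(thread, "script.star", src, predeclared); err != nil {
        // handle the error (which may be the cancellation error above)
    }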
the PR intentionally exposes a Thread.Cancel method that sets an atomic boolean rather than using a done channel; measurements showed that to be significantly (5x) faster
    // Sketch: assumes a step counter and a Context field on Thread, and that
    // "loop" labels the interpreter's main loop.
    if thread.Steps%5 == 0 {
        select {
        case <-thread.Context.Done():
            err = fmt.Errorf("Starlark computation cancelled")
            break loop
        default:
        }
    }
Now it's the same performance.
Is it though? It adds a highly unpredictable branch to the core loop, and potentially increases the latency of cancellation.
I don't know, but I bet you can find a constant large enough. There is no guarantee about the latency of cancellation, and I don't think you have defined what a step is.
Cancellation is now supported, so I'm going to close this request.
~@alandonovan, apologies for commenting on a closed PR. Is there a way to know when a thread has been cancelled? I'm trying to figure out how to do some cleanup from my built-in functions.~ nvm, I'm going to do it from outside the interpreter.
@alandonovan, apologies for commenting on a closed PR.
No worries!
Is there a way to know when a thread has been cancelled?
No, but I would accept a PR to add a Thread.Canceled method with this implementation.
No, but I would accept a PR to add a Thread.Canceled method with this implementation.
ahh, that doesn't look too bad. I can give that a go. Thank you Alan!
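For reference, such a Thread.Canceled method could be as small as the following; the field name and its representation are assumptions about the internals, not the actual starlark-go implementation:

    // Sketch only: assumes Thread keeps its cancellation state in an
    // atomically updated uint32 named cancelled.
    func (thread *Thread) Canceled() bool {
        return atomic.LoadUint32(&thread.cancelled) != 0
    }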
The REPL program uses a "context" local variable, suggesting that long-running operations should check this context and terminate early when it expires. This solution will probably take care of 99% of use cases, but it won't stop the interpreter if the long-running operation is implemented in Starlark itself.
I propose adding a Context field to starlark.Thread and having the interpreter loop check it every N-th instruction, returning an appropriate error when starlark.Thread.Context expires.