google / starlark-go

Starlark in Go: the Starlark configuration language, implemented in Go
BSD 3-Clause "New" or "Revised" License
2.26k stars 204 forks source link

Is the Starlark thread thread-safe for concurrent `Exec*()`? #540

Closed StarHime closed 2 months ago

StarHime commented 2 months ago

I'm thinking about an extension for Starlark that requires asynchronous function calls, and I'm wondering if the Starlark thread is thread-safe, and if multiple concurrent Exec*() calls are possible.

Here's the prototype I'm considering:

func CallFuncAsync(thread *starlark.Thread, b *starlark.Builtin, args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
    // ...

    token := Wrapper()...
    go func() {
        fn := args[0].(*starlark.Function)
        fn.CallInternal(thread, starlark.Tuple{...}, nil)
        close(token.Chan)
    }()
    return token, nil
}

This function would use a goroutine to call the Starlark function in the Go runtime with the same thread, and return a token for waiting.

Is this a safe approach? If not, are there alternative ways to achieve asynchronous function calls in Starlark?

adonovan commented 2 months ago

No. If you need multiple threads, create multiple Threads.

StarHime commented 2 months ago

@adonovan Thanks for your answer!

If I don't create a new Thread, it may panic for data race or something. But new Thread doesn't copy the call stack, which makes the CallInternal fails to find the func to call.

Here's the Go code snippet:

func asyncCallFunc(thread *starlark.Thread, b *starlark.Builtin, args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
    if l := len(args); l < 1 {
        return nil, fmt.Errorf("%s: takes at least one argument (%d given)", b.Name(), l)
    }

    // first argument must be a callable
    fnc, ok := args[0].(starlark.Callable)
    if !ok {
        return nil, fmt.Errorf("%s: first argument must be callable", b.Name())
    }

    // call the function asynchronously
    nt := &starlark.Thread{Name: "async", Load: thread.Load, Print: thread.Print, OnMaxSteps: thread.OnMaxSteps}
    go func(thread *starlark.Thread) {
        // call the function
        val, err := fnc.CallInternal(thread, args[1:], kwargs)
        fmt.Println("[done]", val, err) // HACK for debugging
    }(nt)
    return starlark.None, nil
}

Starlark code to test:

def work(y):
    print('work', y)
    if y == 40:
        fail("oops")
run(work, 20)
run(work, 30)
run(work, 40)
run(work, 50)

And the errors in Go:

panic: runtime error: slice bounds out of range [:-1]

goroutine 9 [running]:
go.starlark.net/starlark.(*Function).CallInternal(0x1400015ec40?, 0x14000128e70?, {0x1400011c970?, 0x1?, 0x1?}, {0x0?, 0x0?, 0x0?})
    /Users/vej/go/pkg/mod/go.starlark.net@v0.0.0-20240123142251-f86470692795/starlark/interp.go:29 +0x4014

So is there any way to clone the Stack or avoid this panic? Or async call is actually can't be done without internal changes on Starlark itself?

adonovan commented 2 months ago

If I don't create a new Thread, it may panic for data race or something.

(Or it may not panic but silently corrupt your data.)

But new Thread doesn't copy the call stack, which makes the CallInternal fails to find the func to call.

Here's the Go code snippet:

    // call the function asynchronously
    nt := &starlark.Thread{Name: "async", Load: thread.Load, Print: thread.Print, OnMaxSteps: thread.OnMaxSteps}
    go func(thread *starlark.Thread) {
        // call the function
        val, err := fnc.CallInternal(thread, args[1:], kwargs)
        fmt.Println("[done]", val, err)
    }(nt)

The surface problem causing the crash is that you are calling CallInternal. You should never call it directly; use starlark.Call.

But there is a deeper problem, which is that you are passing unfrozen values from one thread to another. Either of the threads could mutate the value, causing another data race. You could call Freeze on all the args and kwargs passed to asyncCallFunc, but this side effect could be very subtle and would likely be a surprise for the Starlark calling code.

In short, you are trying to do something that is not supported.

Starlark programs can be highly parallel, but the concurrency needs to be coordinated at a higher level by the host application, not by the Starlark code itself.

For example, in Bazel, each file is initialized (executed from top to bottom). If it's a .bzl file the result is a module containing a bunch of functions, which are then frozen, making them safe for concurrent execution. If it's a BUILD file, the effect is to construct part of the build dependency graph (an application data structure), which may contain some of those functions declared in the bzl file. Initialization of BUILD and bzl files, and application call-backs to bzl functions, can all be highly parallel, but there is no user-visible concurrency in any given Starlark file.