google / starlark-go

Starlark in Go: the Starlark configuration language, implemented in Go

Thread-Safety for Built-in Types Such as Dict, List, Set #503

Closed by WOSDOA 10 months ago

WOSDOA commented 10 months ago

I have been using Starlark-Go for a while and have been impressed with its performance and capabilities. Recently, I've run into a situation in my project where multiple threads work with the same Starlark value. These cases mostly involve built-in types such as dict, list, and set.

In the documentation and previous issues, I couldn't find any explicit statement or demonstration of the thread-safety of these built-in types.

Understanding the concurrent access behavior is crucial for my project. Can you clarify if these types are thread-safe when read/write operations are performed on them simultaneously in both Go and Starlark?

Is it safe, for instance, to write to a Starlark dict in one goroutine while another goroutine reads from it? And what about simultaneous reads and writes to the same Starlark value from different Starlark threads?

Also, if thread safety is not inherently provided, could you point me to best practices for keeping my multi-threaded operations correct and free of data races?

To restate: for a single Starlark value, is it thread-safe to read from it and write to it in Go or Starlark at the same time?

For example, if I have a dict in Starlark and I want to read from it in Go, is it safe to do so if another Starlark thread is writing to it at the same time?

Similarly, if I have a list in Starlark and I want to write to it in Go, is it safe to do so if another Starlark thread is reading from it at the same time?

I would appreciate it if you could clarify this for me. Thank you!

adonovan commented 10 months ago

Mutable ("unfrozen") Starlark data structures are absolutely not thread-safe, and if your application is executing two Starlark threads that have access to the same mutable data, then "you're holding it wrong." There are no mutexes and we do not plan to add them. (An application could define a mutex type, but I really wouldn't recommend this.)

The solution is to use the "freeze" feature, which lets you convert a set of data structures produced by a single thread into an immutable state, after which they can be safely read by any number of threads, but cannot be mutated by any thread (an attempted mutation results in a dynamic error).

Consider Starlark modules A and B that depend on module C. First C is loaded and initialized, then all its module data is frozen. What A and B see when they import C is its frozen module data, which they can use however they like without fear of races, even when A and B run in parallel. Of course, they cannot mutate C's data. Some of the values exported by C may be functions (or even function closures): A and B may call those functions, and those functions may do computation involving mutation of local variables, but they cannot mutate C, or even the "free variables" of the closure, since C has become frozen.

Bazel uses this extensively: for example, a low-level module might register with Bazel a function that tells it how to construct command lines for (say) a Kotlin compiler; this function is then frozen. Later, thousands of modules that need Kotlin builds will cause Bazel to invoke this function repeatedly, with very high levels of parallelism. There are no locks, yet there are no races.

See https://github.com/google/starlark-go/blob/master/doc/impl.md#freezing. You might also find https://www.youtube.com/watch?v=9P_YKVhncWI helpful (especially at 15m and 22m in).