luvit / luv

Bare libuv bindings for lua
Apache License 2.0
832 stars 187 forks source link

Support passing tables and closures to new threads #708

Open cshuaimin opened 4 months ago

cshuaimin commented 4 months ago

Hi luvit team, first thank you for this great project! Luv makes it possible for calling libuv functions in Neovim!

In this PR I changed thread_arg_* functions to support table and function types to new_thread(), queue_work() and async_send() functions.

Althrough in some feature requesting issues you said that not supporting tables is intentional (I can't find the exact reply anymore), I believe it's a very important feature to have. It will unlock possibilities like multi-threaded Neovim plugins, off-loading some CPU intensive work to a thread pool. I sincerely invite you to re-think this feature.

I mainly translated Neovim's deepcopy() function to C. It uses a cache table to properly support reused table fields.

The function type supports upvalues. Supporting functions is needed because it allows Lua OOP values to be passed to new threads.

Testing: I added test and passed valgrind, and I've tested this branch in Neovim. However not tested in Lua versions other than LuaJIT.

SinisterRectus commented 4 months ago

I can't find the exact reply anymore

Some context:

Adding table and function serializers to luv is intriguing, but they do add complexity that has been considered undesirable.

I'm not proficient enough in C to review this code. I will opine that these serializers should not be added based on the prior discussions, but I an open to being convinced otherwise.

My opinion is that, because coroutines exist, luv threads should be used sparingly in cases where states do not need to be shared across them.

Overhead and complexity aside, I'm also concerned about this adding pass-by-copy behavior for users who may not expect it.

Could you maybe show an example of using this in neovim? I'm curious to see what code would benefit from threading where copying states is important.

cshuaimin commented 4 months ago

I totally understand your concerns. I'm not very sure whether this is the best solution too. However in order to use the full power of CPU cores, in my opinion this complexity is unavoidable.

I have a nvim plugin that searches a file with tree-sitter. It can only search one file. To support searching multiple files under a directory, the plugin needs to parse hundreds of files with tree-sitter, which may block nvim UI for several minutes. If I can pass tables and functions (actually the vim.treesitter.LanguageTree object) to Luv threads, then I can move parsing off main thread. Lua coroutine does not help here, because it's pure CPU work.

Aside from my own plugin, nvim's tree-sitter highlighting also runs on main thread. Opening or eding a large file with lots of tree-sitter injections was quite slow. Popular nvim distros disable tree-sitter highlights in large files. Nvim devs has already made this faster. With this PR it may be possible to make nvim UI even more responsive.

cshuaimin commented 4 months ago

Here is the relevant code in my plugin if you are curious. It's still experimental code.

truemedian commented 4 months ago

in my opinion this complexity is unavoidable.

This complexity is certainly avoidable in Luv. The solution you've implemented here is nothing more than a serialization and de-serialization of both tables and functions types.

Nothing is stopping you from doing this serialization outside of luv, potentially even serializing the respective table to a string entirely within Lua.

In my opinion this change is unnecessary and adds even more confusion around how luv threads work, because tables and functions are passed through the thread boundary by copy instead of by reference, which is the opposite of how the types behave normally.

From a user-facing perspective, this invisible copy of the types introduces the wrong behavior (eg. changing a table inside the thread does not change the table outside the thread; functions with upvalues behave strangely, because all upvalues are copies not references to their original values).

The most important problem I'd like to bring up is that this introduces incorrect and invalid behavior when serializing any function associated with a C library; most libraries are not thread-safe, and without a significant amount of added effort userdata will not get the correct metatable because the library was never required in the new Lua state and the metatables will not be present in the registry.

cshuaimin commented 4 months ago

because tables and functions are passed through the thread boundary by copy instead of by reference, which is the opposite of how the types behave normally

The copy behavior isn’t that novel, the web worker API postMessage() has exactly the same deep copy semantics. We can tell users clearly the behavior in the documentation.

most libraries are not thread-safe

I had planned to fix this in a follow up PR if you are OK with this one. The idea is that to mak userdata thread safe, Luv users should 1) make sure the functions use locks properly when accessing the userdata, 2) use some reference counting mechanism to close the resource once (shared ownership). For 2) we can call a __luv_thread_copy() like method on userdata when deep copying. In the method users increase the reference counter. For example in my user case, to transfer a tree-sitter userdata I will call ts_tree_copy() in this method.

serializing the respective table to a string entirely within Lua

In Lua it’s impossible to handle userdata type, and too complex for plug-in authors. We had the API to create threads for years, but there was’t a single multi-threaded plugin for nvim.

the metatables will not be present in the registry

At least Neovim registers metatables in every Lua state, via acquire_vm_cb().

Bilal2453 commented 3 months ago

I am not in favor of adding this into Luv, I also believe it is an avoidable complexity that is better off being handled in Lua. But also admittedly that won't be as straight forward, and as mentioned handling userdata on the Lua side is going to be tricky to say the least.

But also threading in Luv (and Lua generally) is really lacking, and just barely works, personally I think due to the design of Lua it is really unfavorable to do any extensive threading besides crushing some numbers/strings, so any work on that side is appreciated.

How about adding support for this but outside of new_thread(), queue_work(), async_send(). For example adding new new_thread_copy, queue_work_copy, async_send_copy. This will hopefully make it very clear what it is doing with the data to not confuse the user, and at the same time it isolates the unwanted complexity into its own experimental space. Will also ensure that we don't accidentally break already working code.