oven-sh / bun

Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
https://bun.sh

Make IPC fast #8641

Open Jarred-Sumner opened 8 months ago

Jarred-Sumner commented 8 months ago

We're probably not going to work on this very soon, but I wanted to write some thoughts out while they're in my head

IPC is currently implemented using unix domain sockets, mostly relying on structuredClone() to serialize and later deserialize each message.

This works great in many scenarios, but it's certainly not optimized. Every message is cloned into a temporary buffer, and that temporary buffer is immediately written to the socket (copied again, into kernel space). Reading the message clones it yet again. That's too many copies.

I think it'd make sense to specialize on a couple kinds of messages (a possible wire header is sketched after the list):

  1. 1 ArrayBuffer: proc.send(new ArrayBuffer(42))
  2. 1 ArrayBufferView: proc.send(Buffer.alloc(42)), supporting the various typed array types + Buffer
  3. 1 latin1 string: proc.send(JSON.stringify("abc"))
  4. 1 UTF-16 string: proc.send(JSON.stringify("abc❤️"))
  5. A huge variant of any of 1-4
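
Roughly, a specialized wire header could look something like the following; the tag names and layout here are illustrative assumptions, not the actual format:

```c
/* Hypothetical wire header for specialized IPC messages.
 * Tag values and layout are illustrative, not Bun's actual format. */
#include <stdint.h>

enum ipc_tag {
    IPC_ARRAY_BUFFER      = 1, /* proc.send(new ArrayBuffer(n))      */
    IPC_ARRAY_BUFFER_VIEW = 2, /* typed arrays + Buffer              */
    IPC_STRING_LATIN1     = 3, /* 1-byte characters                  */
    IPC_STRING_UTF16      = 4, /* 2-byte characters                  */
    IPC_HUGE              = 5, /* any of the above, sent out-of-band */
};

struct ipc_header {
    uint32_t tag;       /* enum ipc_tag */
    uint32_t view_kind; /* for IPC_ARRAY_BUFFER_VIEW: which TypedArray */
    uint64_t byte_len;  /* payload length in bytes */
    /* payload bytes follow inline, except for IPC_HUGE, where the
     * payload travels out-of-band (see the memfd discussion below) */
};
```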

There are a number of things we can do to make IPC fast.

Linux: memfd_create

memfd supports sealing, which lets us create read-only in-memory file descriptors. We can have a reader end and a writer end this way. eventfd can be used to signal readiness across processes. memfd is a good fit here because we can skip the read() and write() system calls and the copies in and out of kernel space.
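
A minimal sketch of those two primitives, with made-up names and error handling elided; the eventfd is assumed to come from eventfd(0, EFD_CLOEXEC):

```c
/* Build a sealed in-memory file holding one message, and signal the
 * other process through an eventfd. Sketch only: assumed names,
 * no error handling. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/mman.h>
#include <unistd.h>

int make_sealed_memfd(const void *data, size_t len) {
    int fd = memfd_create("bun-ipc-msg", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    ftruncate(fd, (off_t)len);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    memcpy(p, data, len);
    munmap(p, len);
    /* After these seals, no process can resize or write this memory
     * again, so a reader can map it without a defensive copy. */
    fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE);
    return fd;
}

void signal_ready(int efd) {
    uint64_t one = 1;
    write(efd, &one, sizeof one); /* wakes the reader's poll/epoll */
}
```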

What we can do here, not huge edition (both sides sketched below):

  1. Let the memfd grow up to a maximum of ~2 MB
  2. mmap() the memfd and copy strings and other data directly into it
  3. Write to the eventfd to signal the other process to read it
  4. In the other process, keep a MAP_SHARED mapping of the memfd
  5. Clone the string, ArrayBuffer, etc. and call it a day
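
Something like this, with assumed names: both processes have already mmap()ed the same memfd MAP_SHARED, framing is a bare length prefix, and the ring-buffer/ack logic a real implementation needs is left out:

```c
/* Sketch of the not-huge path. `shared` is the MAP_SHARED mapping of
 * the long-lived memfd in each process; `efd` is the shared eventfd. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* writer: copy the payload straight into the shared mapping, signal */
void send_small(uint8_t *shared, int efd, const void *msg, uint64_t len) {
    memcpy(shared, &len, sizeof len);      /* length prefix */
    memcpy(shared + sizeof len, msg, len); /* the one write-side copy */
    uint64_t one = 1;
    write(efd, &one, sizeof one);          /* wake the reader */
}

/* reader: wait on the eventfd, then clone the payload out of the
 * mapping; this clone is what would become the WTF::String or
 * JSC::ArrayBuffer handed to JavaScript */
void *recv_small(uint8_t *shared, int efd, uint64_t *len_out) {
    uint64_t signals;
    read(efd, &signals, sizeof signals);   /* blocks until signaled */
    memcpy(len_out, shared, sizeof *len_out);
    void *copy = malloc(*len_out);
    memcpy(copy, shared + sizeof(uint64_t), *len_out);
    return copy;
}
```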

Before: cloning to the StructuredSerialize format, then cloning into the unix domain socket, then reading from the unix domain socket, then decoding that data, and cloning one last time.

After: One clone to the memfd, and one clone in the other process to the WTF::String or JSC::ArrayBuffer.

What we can do here, huge edition (the fd passing is sketched below):

  1. Given a large ArrayBuffer, string, etc., create a fresh memfd
  2. Seal it
  3. Send the memfd via sendmsg
  4. Write to the eventfd to signal the other process to read it
  5. [other process] Receive the memfd via recvmsg
  6. mmap() the memfd into a MAP_PRIVATE copy
  7. Use WTF::ExternalStringImpl, or a JSC::ArrayBuffer method, to unmap and close the file descriptor once the string or ArrayBuffer is finalized
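
The fd passing piece could look roughly like this (assumed names; this is the standard SCM_RIGHTS dance on any unix domain socket):

```c
/* Sketch of the huge path: ship the sealed memfd itself across the
 * existing unix domain socket, then map it copy-on-write on receive. */
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>

void send_memfd(int sock, int memfd) {
    char byte = 'F'; /* SCM_RIGHTS must ride along with >=1 data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof ctrl,
    };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS; /* kernel dups the fd into the peer */
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &memfd, sizeof(int));
    sendmsg(sock, &msg, 0);
}

int recv_memfd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof ctrl,
    };
    recvmsg(sock, &msg, 0);
    int fd;
    memcpy(&fd, CMSG_DATA(CMSG_FIRSTHDR(&msg)), sizeof(int));
    return fd;
}

/* receiver: zero-copy view of the sealed payload; the finalizer of the
 * resulting string/ArrayBuffer later does munmap() + close(fd) */
void *map_huge(int fd, size_t len) {
    return mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
}
```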

Before: cloning to the StructuredSerialize format, then cloning into the unix domain socket, then reading from the unix domain socket, then decoding that data, and cloning one last time.

After: One clone to the memfd

The tradeoff here is that the content has to be large enough to justify the cost of a unique memory mapping per message, as well as the cost of keeping a file descriptor open for potentially a long time. That's why you probably only want the one-clone approach for very large messages.

Darwin: mach_vm_copy and Mach ports

TODO: expand on this

Araxeus commented 3 months ago

Would be nice if you could make true IPC available, so that we can share data with Bun processes that aren't children. See also #11683

Also, kinda unrelated to this issue, but node-ipc doesn't even work on Bun - see #12712

EDIT: I found an alternative to node-ipc called zeromq, but sadly it currently doesn't work on Bun either, see ~#12711~ #12746

TheBongStack commented 1 month ago

Is this feature becoming stable any time soon? I know Bun offers mmap, but this dedicated IPC module feels like the better fit.

anthonykrivonos commented 3 weeks ago

This would honestly be huge for realtime processing applications, @Jarred-Sumner.

We spent a few days trying to track down a memory leak in our code; in the end, it was because we were sending thousands of buffers to a child process (our fault for not looking at the underlying IPC implementation or finding this issue first).

As a temporary solution, we're using semaphores from the async-mutex package to avoid overloading the domain socket.