janet-lang / janet

A dynamic language and bytecode vm
https://janet-lang.org
MIT License

sendfile functionality in the ev/ or net/ module #852

Open bakpakin opened 2 years ago

bakpakin commented 2 years ago

A common operation on sockets is to send a file over them, and the sendfile system call does that without making extra copies into userspace, at least on Linux. Most operating systems have such a system call, and any that didn't could easily fall back to reading the file into memory and sending it with write.

Linux also has a few newer system calls that do this more generally like splice.
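As a rough illustration of the idea above, here is a minimal C sketch of "use sendfile where available, otherwise fall back to read and write". The function names `send_file_path` and `copy_with_read_write` are made up for this example and are not part of Janet; the sketch assumes a blocking output descriptor and only takes the sendfile path on Linux.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/sendfile.h>
#endif

/* Fallback path: copy `count` bytes through a userspace buffer. */
static int copy_with_read_write(int out_fd, int in_fd, off_t count) {
    char buf[65536];
    while (count > 0) {
        size_t want = count < (off_t)sizeof buf ? (size_t)count : sizeof buf;
        ssize_t n = read(in_fd, buf, want);
        if (n < 0 && errno == EINTR) continue;
        if (n < 0) return -1;
        if (n == 0) break; /* file is shorter than fstat() reported */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0 && errno == EINTR) continue;
            if (w < 0) return -1;
            off += w;
        }
        count -= n;
    }
    return 0;
}

/* Hypothetical helper: send the whole file at `path` to the blocking
 * descriptor `out_fd`. Returns 0 on success, -1 on error. */
int send_file_path(int out_fd, const char *path) {
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0) return -1;
    struct stat st;
    if (fstat(in_fd, &st) < 0) { close(in_fd); return -1; }
    off_t left = st.st_size;
#ifdef __linux__
    while (left > 0) {
        /* A NULL offset means "use and advance the file's own offset". */
        ssize_t n = sendfile(out_fd, in_fd, NULL, (size_t)left);
        if (n > 0) { left -= n; continue; }
        if (n == 0) { left = 0; break; }                /* unexpected EOF */
        if (errno == EINTR) continue;
        if (errno == EINVAL || errno == ENOSYS) break;  /* unsupported: fall back */
        close(in_fd);
        return -1;
    }
#endif
    /* Copies whatever sendfile did not handle (possibly nothing). */
    int rc = copy_with_read_write(out_fd, in_fd, left);
    close(in_fd);
    return rc;
}
```

Because sendfile is called with a NULL offset, the file's own offset advances as data is sent, so the fallback simply continues from wherever the kernel path stopped.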

zevv commented 1 year ago

I'd like to hijack this issue to start a more general discussion about the event queue and the (lack of) separation of functionality. Recently I did some experiments to get TLS working with Janet streams and the event loop, and found this to be non-trivial (see the bottom of this message).

The current event loop implementation has a very tight coupling between handling events (timers, watching sockets/handles/fds) and making the system call that performs the actual I/O when the time comes. This obviously has advantages with respect to performance, as there is very little or no overhead involved, but the downside is that the event loop code itself has to implement all possible system calls that make sense in combination with readable/writable fds. At this time there is specific code to handle read(), write(), send(), recv(), sendto(), recvfrom() and connect(), in all possible flavors for the various supported architectures. This issue possibly requires this code to be extended even further to support sendfile() and splice().

However, there are more APIs and interfaces that work with events on file descriptors. For example, it is a common pattern for networking libraries to offer an interface that lets the user register/unregister fds/sockets with any kind of event loop, and later have the library handle the actual I/O when ready (e.g. OpenSSL, the curl multi interface, etc.). On Linux there are also various drivers that require a user to wait for an event on a file descriptor but perform the actual I/O using ioctls (e.g. v4l2, GPIO, various platform drivers, etc.). Of course it is not feasible to add each and every one of these interfaces to the event queue code.

I think the event loop would benefit from a more "generic" or "pluggable" design: the core event loop would only be responsible for handling events on sockets/fds/handles/timers, and the actual underlying I/O operations would be performed outside of this module. Where possible this could all stay in C through a conventional callback mechanism; in addition, it could support going back to Janet-land by resuming the fiber and letting Janet code handle the actual I/O.

This would allow users or third-party library writers to implement their own stream types that work with Janet's event loop, without pulling each and every possible I/O mechanism into the ev module.

One other advantage of this split-up is that the networking code actually lives in the net module; the event loop does not need to know about all the different system calls involved for all the platforms.
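To make the proposed split concrete, here is a small, POSIX-only C sketch of a loop that only waits for readiness and leaves the I/O to a registered callback. It is built on poll() and is not how Janet's ev.c is structured; every name in it (`struct loop`, `loop_watch`, `loop_run_once`, `count_bytes_cb`) is hypothetical.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WATCHERS 64

/* Called when `fd` is ready; the callback performs whatever I/O it likes. */
typedef void (*ready_cb)(int fd, short revents, void *user);

struct watcher { ready_cb cb; void *user; };

/* The loop knows about descriptors and callbacks, nothing about the I/O. */
struct loop {
    struct pollfd fds[MAX_WATCHERS];
    struct watcher watchers[MAX_WATCHERS];
    nfds_t count;
};

int loop_watch(struct loop *l, int fd, short events, ready_cb cb, void *user) {
    if (l->count >= MAX_WATCHERS) return -1;
    l->fds[l->count] = (struct pollfd){ .fd = fd, .events = events, .revents = 0 };
    l->watchers[l->count] = (struct watcher){ cb, user };
    l->count++;
    return 0;
}

int loop_unwatch(struct loop *l, int fd) {
    for (nfds_t i = 0; i < l->count; i++) {
        if (l->fds[i].fd == fd) {
            /* Swap-remove: move the last entry into the freed slot. */
            l->count--;
            l->fds[i] = l->fds[l->count];
            l->watchers[i] = l->watchers[l->count];
            return 0;
        }
    }
    return -1;
}

/* Wait once and dispatch. Returns the number of callbacks run,
 * 0 on timeout, or -1 if poll() failed. */
int loop_run_once(struct loop *l, int timeout_ms) {
    int n = poll(l->fds, l->count, timeout_ms);
    if (n <= 0) return n;
    int dispatched = 0;
    /* Walk backwards so a callback may unwatch descriptors safely. */
    for (nfds_t i = l->count; i-- > 0;) {
        if (i >= l->count) continue; /* slot vanished during dispatch */
        short re = l->fds[i].revents;
        if (re == 0) continue;
        l->fds[i].revents = 0;
        l->watchers[i].cb(l->fds[i].fd, re, l->watchers[i].user);
        dispatched++;
    }
    return dispatched;
}

/* Example "plugin" living outside the loop: reads and counts bytes. */
struct byte_counter { size_t total; };

void count_bytes_cb(int fd, short revents, void *user) {
    struct byte_counter *c = user;
    char buf[256];
    if (revents & POLLIN) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) c->total += (size_t)n;
    }
}
```

A TLS or ioctl-based stream would register its own callback in the same way, which is the kind of extension point the paragraph above asks for. This readiness-style design has no direct equivalent in a completion-based API.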

So, some example scenarios:

1. Fast path, all C:

This provides the same basic functionality Janet currently has: the "common" system calls are supported by the stdlib.

2. Fun path, back through Janet:


(About TLS with OpenSSL: I succeeded by abusing ev/read and ev/write with 0-byte buffers, which makes the event loop go through all the motions of registering the underlying fd/socket, suspending the fiber until readable/writable, doing a bogus read/write and resuming my fiber, where I can finally make the real call using SSL_read() and SSL_write(). With these API calls, handling events is even more interesting, because a call to SSL_read() might actually require multiple read()s or write()s on the underlying socket to succeed.)

bakpakin commented 1 year ago

So as per usual, the reason this seems harder than it needs to be is Windows portability. On Windows, the most general and efficient interface for single-threaded, concurrent IO is IOCP, which is different from epoll/kqueue on Linux/BSD/Mac. IOCP works by submitting a request to do an operation that will complete later, so the operation and the waiting are combined into one syscall. This has the advantage of one fewer syscall per operation and is anecdotally a bit faster, but it is less flexible, and now every IO operation takes a bazillion arguments. In other words, having the waiting be done in the event loop and the IO done by whatever library you want doesn't really work well on Windows.

> I think the event loop would benefit from a more "generic" or "pluggable" design: the core event loop would only be responsible for handling events on sockets/fds/handles/timers, and the actual underlying I/O operations will be performed outside of this module. Where possible this could all stay in C through a conventional callback mechanism, in addition it could support going back to Janet-land resuming the fiber, and Janet code handling the actual I/O.

This is more or less how it already works. The JanetListener function typedef can be used to make a state machine that can get one or more events before resuming the fiber. That said, we could just provide a state machine that did nothing but immediately resume the fiber for any events it received - hypothetically:

(import mystuff) # various C utilities

(def connection (net/connect "1.2.3.4" "7788"))

# This could probably be cleaned up a bit
(def s (ev/listener connection :w)) # :w means listen for write events - this could be something like :wr for both read and write events
(def event (ev/next-event s)) # suspend fiber until next event
(if (= event :write)
  (mystuff/sendfile-nowait connection "file.txt")
  (errorf "unexpected event %v" event))
(ev/unlisten s)

One could implement this now as a library for Janet, but I suppose it would be nice as part of ev/. Also I think this could be refined a bit for efficiency and ease of use.
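For illustration, the C side of something like the hypothetical `mystuff/sendfile-nowait` could be a single non-blocking step that is called each time the socket reports writable. The sketch below is plain C with no Janet bindings, is Linux-only because of sendfile, and uses invented names (`struct file_send`, `file_send_open`, `file_send_step`); it assumes the output descriptor is already non-blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum send_status { SEND_DONE, SEND_AGAIN, SEND_ERROR };

/* Progress of one in-flight transfer; it survives across writable events. */
struct file_send {
    int in_fd;
    off_t offset;     /* next byte of the file to send */
    off_t remaining;  /* bytes still to send */
};

int file_send_open(struct file_send *s, const char *path) {
    struct stat st;
    s->in_fd = open(path, O_RDONLY);
    if (s->in_fd < 0) return -1;
    if (fstat(s->in_fd, &st) < 0) { close(s->in_fd); return -1; }
    s->offset = 0;
    s->remaining = st.st_size;
    return 0;
}

/* Push as much as the kernel will take right now, without blocking.
 * SEND_AGAIN means "wait for the next writable event and call again".
 * On SEND_ERROR the caller is expected to close s->in_fd. */
enum send_status file_send_step(struct file_send *s, int out_fd) {
    while (s->remaining > 0) {
        /* sendfile advances s->offset by the number of bytes it sent. */
        ssize_t n = sendfile(out_fd, s->in_fd, &s->offset, (size_t)s->remaining);
        if (n > 0) { s->remaining -= n; continue; }
        if (n == 0) break;               /* file shrank: treat as finished */
        if (errno == EINTR) continue;
        if (errno == EAGAIN) return SEND_AGAIN;
        return SEND_ERROR;
    }
    close(s->in_fd);
    s->in_fd = -1;
    return SEND_DONE;
}
```

In the Janet example above, the fiber would call such a step after each write event and keep listening while it returns SEND_AGAIN.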

The reason not to do this, of course, is that the entire state machine can be done in C to avoid the interpreter for complex IO, retrying syscalls, accumulating partial reads into a larger buffer, etc. Also, Windows IOCP doesn't have anything like "read" and "write" events, just completion events (in the example, we would replace the :write event with :done and move the operation before the call to ev/next-event).
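As a sketch of what "retrying syscalls and accumulating partial reads" can look like when kept in C, the following POSIX-only state machine is called on every readable event and only reports completion once the requested number of bytes has arrived. It is not taken from ev.c; `struct read_state`, `read_state_on_readable` and `set_nonblocking` are names invented for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

enum read_status { READ_DONE, READ_AGAIN, READ_EOF, READ_ERROR };

/* Accumulates bytes across several readable events. */
struct read_state {
    char *buf;    /* destination buffer, at least `want` bytes */
    size_t want;  /* total number of bytes requested */
    size_t have;  /* bytes collected so far */
};

/* The state machine assumes a non-blocking descriptor. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Call on every readable event. Only READ_DONE, READ_EOF or READ_ERROR
 * would need to wake the waiting fiber; READ_AGAIN keeps listening. */
enum read_status read_state_on_readable(struct read_state *r, int fd) {
    while (r->have < r->want) {
        ssize_t n = read(fd, r->buf + r->have, r->want - r->have);
        if (n > 0) { r->have += (size_t)n; continue; }
        if (n == 0) return READ_EOF;   /* peer closed before we got it all */
        if (errno == EINTR) continue;  /* retry the interrupted syscall */
        if (errno == EAGAIN || errno == EWOULDBLOCK) return READ_AGAIN;
        return READ_ERROR;
    }
    return READ_DONE;
}
```

With this shape the interpreter is entered once per completed read rather than once per event, which is the efficiency argument made above.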

> One other advantage of this split-up is that the networking code actually lives in the net module; the event loop does not need to know about all the different system calls involved for all the platforms.

The ev.c file contains a lot of code simply to keep stuff all in one place, and the reason the networking code is intertwined with the general-purpose socket code is simply that networking with Berkeley sockets is almost identical to using any other file descriptors, and the goal was to avoid duplicating the read/write logic for send/recv. It perhaps isn't the cleanest, but it lets us reuse a lot of the generic reading and writing of buffers.

Lastly, switching to a poll-based interface precludes the latest and greatest of fast IO APIs: io_uring on Linux and the Windows answer, https://learn.microsoft.com/en-us/windows/win32/api/ioringapi/. Not that important, but the current event loop interface could map to both of these.

zevv commented 1 year ago

> So as per usual, the reason this seems harder than it needs to be is Windows portability.

I suspected as much; I must admit I am blissfully unaware of most of the Windows APIs and often wrongly assume most of the world does things the POSIX way.

> That said, we could just provide a state machine that did nothing but immediately resume the fiber for any events it received - hypothetically:

I considered this, as it basically offers the same functionality I now get when doing the 0-byte transfers with read and write. That would allow all the funny other I/O methods to play well with the event loop, at the price of going back into Janet before doing the I/O, which should be no problem for most of the things I have in mind. I might come up with a PR for this.

> The ev.c file contains a lot of code simply to keep stuff all in one place, and the reason the networking code is intertwined with the general-purpose socket code is simply that networking with Berkeley sockets is almost identical to using any other file descriptors, and the goal was to avoid duplicating the read/write logic for send/recv. It perhaps isn't the cleanest, but it lets us reuse a lot of the generic reading and writing of buffers.

All good reasons, I agree.