Closed amano-kenji closed 7 months ago
Here is a bigger question: how much OS function support should Janet have when compiled without event loop?
I don't know.
This might be something better with spork. Linux, and I suspect other operating systems, make reading and writing from the file essential synchronous. A re-implementation of slurp that uses os/open
might not be that useful
is current ev/read
synchronous? it says otherwise in (doc ev/read)
cfunction
src/core/ev.c on line 2985, column 1
(ev/read stream n &opt buffer timeout)
Read up to n bytes into a buffer asynchronously from a stream. n
can also be the keyword :all to read into the buffer until end of
...
You mean ev/read
and ev/write
don't yield to the event loop when they act on files opened with os/open
?
You mean
ev/read
andev/write
don't yield to the event loop when they act on files opened withos/open
?
Sort of. I was wonder because @bakpakin said
make reading and writing from the file essential synchronous.
Even if you use poll or select on files in Linux, reading and writing can still block. Yes, for special files and pipes this might fix a problem but I think it makes more sense to just move to spork.
What do you mean? Do you mean that ev/read
and ev/write
against a file can block the event loop?
I don't know what to add to spork because I don't have grasp of the problem.
What I was trying to say is that the implementation is straight forward, but that it is not all that valuable. For more information, here is a mailing list that in short describes the issue: https://utcc.utoronto.ca/~cks/space/blog/linux/HardAsyncFileIO
If you show some motivating code where blocking the event loop was an issue, maybe we could offer more help.
Even if you use poll or select on files in Linux, reading and writing can still block. Yes, for special files and pipes this might fix a problem but I think it makes more sense to just move to spork.
io_uring would not block for data read on Linux;
we should use something like libuv (but smaller) for sanity if we want to add that to Janet, although current O_NONBLOCK
is fine i think? personally, i don't need that much performance from Janet.
An io_uring event loop would be nice and probably work, but again for such a high performance requirement, the other abstractions in Janet would likely be unsuitable anyway. Not to mention a requirement on more modern kernel versions.
If you raise minimum required kernel version, you can implement asynchronous file IO now?
It would look like io_uring
is not the panacea necessary for AIO on files per the man page sourced here.
aio(7) - See notes section.
In case things change, here are some bits from the linked page:
Simultaneous asynchronous read or write operations using the same
aiocb structure yield undefined results.
The current Linux POSIX AIO implementation is provided in user
space by glibc. This has a number of limitations, most notably
that maintaining multiple threads to perform I/O operations is
expensive and scales poorly. Work has been in progress for some
time on a kernel state-machine-based implementation of
asynchronous I/O (see [io_submit(2)](https://man7.org/linux/man-pages/man2/io_submit.2.html), [io_setup(2)](https://man7.org/linux/man-pages/man2/io_setup.2.html), [io_cancel(2)](https://man7.org/linux/man-pages/man2/io_cancel.2.html),
[io_destroy(2)](https://man7.org/linux/man-pages/man2/io_destroy.2.html), [io_getevents(2)](https://man7.org/linux/man-pages/man2/io_getevents.2.html)), but this implementation hasn't
yet matured to the point where the POSIX AIO implementation can
be completely reimplemented using the kernel system calls.
Thinking through this... let's say it was implemented with os/open
. Sure, every call to ev/read
would block, but say it's only read in 4kb chunks... one could possibly (ev/sleep 0)
to effect a yield to the event loop that may result in another task having a chance to do some work till it yields to the event loop?
Would the idea of re-implementing slurp
in terms of basically something along the lines of the below make sense? It's more psuedo-code than it is actual code checked for viability.
(while (ev/read file 4096 buf)
(ev/sleep 0))
But what if you build Janet without ev
? What would be the strategy, then?
I'm unsure if Janet-side code has the ability to detect if ev
is present or not. If it did it could conditionally define such perhaps? Or it could be that non-ev-Janet is not as well supported and lacks slurp, leaving people to read the full content of a file in terms of file/read
?
There is this section in janetconf.h
regarding not using ev
:
/* These settings should be specified before amalgamation is
* built. Any build with these set should be considered non-standard, and
* certain Janet libraries should be expected not to work. */
// ... elided ... //
/* #define JANET_NO_EV */
It seems superfluous. TBH, the whole debate is just bikesheding to me. Even the name slurp
sounds like something not too advanced :-).
I don't really understand your comment.
Would you mind elaborating on it?
I meant that I see slurp
as a quick, dirty way to get the file's content, for example, in the throwaway scripts or when loading configuration. When I care about some more optimized version, writing my version tailored to that specific needs is easy.
But most of all, looking at the length of this thread, it is not worth the cause.
Thanks for the explanation.
FWIW, I'm not convinced that the idea mentioned by llmII was not worth continuing to discuss. It seemed a bit premature to stop the conversation.
I want to know what can replace slurp
for asynchronous IO.
(while (ev/read file 4096 buf)
(ev/sleep 0))
Does this yield to the event loop regularly? If IO itself was synchronous, I would still want to yield to the event loop after reading a chunk of 4 kilobytes or something like that.
In real code, I would probably use
(while (ev/read file 4096)
(ev/give ch buf))
or
(while (ev/read file 4096 buf)
(yield buf))
I think ev/give
and yield
yield to the event loop.
@amano-kenji Please note that (yield)
does not yield to the event loop but to whatever resumed the currently running fiber. This is important to note because if a task yields without doing so via an ev/*
call it needs to be resumed via ev/go
or it is done.
The following example code illustrates this working for anyone who comes back to look at this issue in the future, and outlines how slurp
could be implemented to work like the example as well.
Before we begin we should note that for the event loop to be able to schedule something other than the currently running fiber, the currently running fiber must yield to the event loop. It may not seem intuitive but (ev/sleep 0)
is the equivalent of (yield)
, the only difference being where the fiber is yielding to, the former yielding to the event loop, the latter to the section of code that resumed the currently running fiber having called (yield)
.
You'll need to modify the example to use a suitably large file, and change what the file is that it's opening. If you're on *nix something like dd if=/dev/zero of=bad-file bs=1 count=0 seek=1G
may be helpful.
Example:
(def buf @"")
(def tf (ev/spawn
(protect
(forever
(ev/sleep .5)
(print "Here we are @" (length buf))))
(print "Last @" (length buf))))
# optionally use `os/open`, makes no difference
(with [f (file/open "/tmp/bad-file" :r)]
(while (:read f 4096 buf)
(ev/sleep 0)))
(print "We read 1gb!")
(ev/cancel tf :cancel)
Example output:
Here we are @359878656
Here we are @901939200
We read 1gb!
Last @1073741824
Example slurp implementation that won't block tasks as badly as reading all the bytes in one go:
(defn slurp
``Read all data from a file with name `path` and then close the file.``
[path]
(def buf @"")
# optionally use `os/open`, makes no difference
(with [f (file/open path :rb)]
(while (:read f 4096 buf)
(ev/sleep 0)))
buf)
Closing this as the solution posted here should work as expected.
Right now, it uses
file/open
.file/
functions block the event loop.os/open
creates a stream which doesn't block the event loop.