sustrik / libmill

Go-style concurrency in C
MIT License

Make regular files switch context #158

Open paulofaria opened 8 years ago

paulofaria commented 8 years ago

Regular files don't switch context the way they're implemented now. The reason is that regular files ignore O_NONBLOCK, so they never return EAGAIN or EWOULDBLOCK. Since we can't rely on O_NONBLOCK, one approach would be to create a thread pool, dispatch the work to it, and deliver the result once the work is done. We need something to fdwait on, and for that we can use eventfd, as was done for DNS resolution in the past. What do you think about this approach?
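
A minimal sketch of the offloading idea, under several assumptions: it is Linux-only (eventfd), it starts one thread per request instead of using a pool, and poll() stands in for libmill's fdwait(). The names file_req, file_worker and offloaded_read are hypothetical and are not part of libmill.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request record; nothing here is libmill API. */
struct file_req {
    const char *path; /* regular file to read */
    void *buf;        /* destination buffer */
    size_t len;       /* bytes requested */
    off_t off;        /* offset within the file */
    ssize_t result;   /* set by the worker: bytes read, or -1 */
    int efd;          /* eventfd written once the work is done */
};

/* Worker: performs the blocking open/pread on its own pthread stack. */
static void *file_worker(void *arg) {
    struct file_req *req = arg;
    req->result = -1;
    int fd = open(req->path, O_RDONLY);
    if (fd >= 0) {
        req->result = pread(fd, req->buf, req->len, req->off);
        close(fd);
    }
    uint64_t one = 1;
    ssize_t n = write(req->efd, &one, sizeof one); /* wake the waiter */
    assert(n == (ssize_t)sizeof one);
    (void)n;
    return NULL;
}

/* Reads up to len bytes of path at offset off. The caller only waits on
   the eventfd; a real integration would call fdwait() where poll() is,
   letting other coroutines run. Returns bytes read, or -1 on error. */
ssize_t offloaded_read(const char *path, void *buf, size_t len, off_t off) {
    struct file_req req = { path, buf, len, off, -1, eventfd(0, EFD_CLOEXEC) };
    if (req.efd < 0) return -1;
    pthread_t th;
    if (pthread_create(&th, NULL, file_worker, &req) != 0) {
        close(req.efd);
        return -1;
    }
    struct pollfd pfd = { .fd = req.efd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, -1);
    } while (rc < 0 && errno == EINTR);
    pthread_join(th, NULL);
    close(req.efd);
    return rc == 1 ? req.result : -1;
}
```

The eventfd becomes readable only after the worker has stored its result, so the waiting side never touches the regular file itself.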

sustrik commented 8 years ago

Combining libmill with threads is something I would not recommend. While it may work with some pthread implementations it may not with others. In more concrete terms, pthreads may make assumptions about the layout of the call stack which are violated by libmill's 'hand-crafted' stacks.

Another option would be using aio + signals, but I don't have much experience with that. For example, what happens if the signal queue is full? Are signals dropped or what?

paulofaria commented 8 years ago

How was the thread here different? Is it something like kernel vs user space thread?

sustrik commented 8 years ago

It's not different. That's why I've replaced the code with something more sane.

If you want to give the thread approach a try, go for it. It may not work on all platforms though.

paulofaria commented 8 years ago

I'll give it a shot, then. (:

paulofaria commented 8 years ago

@raedwulf have any suggestions about this?

raedwulf commented 8 years ago

A coincidence! I read up on this yesterday. Sadly, the best way to do this at the moment is a thread pool. I think there was an LWN article on this (I'll find a link when I finish my job interview). TL;DR: Linux and POSIX both offer aio, which can be used for non-blocking file I/O, but there are caveats around platform compatibility, and Linux aio can still block in some cases. We'll need to test the alternatives (POSIX or otherwise), but we might have to use kernel threads on some of the supported platforms.

paulofaria commented 8 years ago

I read that aio implementations are really bad in most kernels. So the thread pool would be the best way. Have any idea if lwan implements context switching for regular files? I would hope so, if it's so performant.

raedwulf commented 8 years ago

I don't recall seeing any implementation of file I/O context switching. I suspect that they use separate threads to read files and then cache them. Lwan has manual coroutines mixed with pthreads.

raedwulf commented 8 years ago

I was wrong - lwan uses mmap for small files. https://tia.mat.br/posts/2012/10/14/vectored_i_o_with_mmap___to_serve_files.html
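
For illustration, a rough sketch of serving a small file through a read-only mapping. This is not lwan's code; serve_mapped is a hypothetical helper, and it assumes a blocking output descriptor (a real server would wait for writability on EAGAIN). Pages that are not yet in the OS cache can still stall the caller on a page fault.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps path read-only and writes its whole contents
   to out_fd. Returns the number of bytes written, or -1 on error.
   Empty files are reported as an error because mmap() rejects length 0. */
ssize_t serve_mapped(const char *path, int out_fd) {
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;
    size_t done = 0;
    while (done < len) {
        /* No read() into a user buffer: the kernel copies from the mapping. */
        ssize_t n = write(out_fd, p + done, len - done);
        if (n < 0) {
            munmap(p, len);
            return -1;
        }
        done += (size_t)n;
    }
    munmap(p, len);
    return (ssize_t)done;
}
```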

johneh commented 8 years ago

Are we talking about offloading regular file operations to worker threads? I spent some time experimenting with the idea (and a thread-local poller loop with __thread etc.). AFAIR, I used a pipe for communication (as a task queue). Writing to a pipe is atomic up to size PIPE_BUF, and reading requires a spinlock. If we are transferring a pointer, it really isn't an issue. Regarding pthread, certain routines like pthread_once, pthread_exit and the cleanup routines may be problematic for libmill, and therefore should probably be avoided. If there is no need/plan to launch a goroutine in a non-main thread, I don't see any problem with using (worker) threads at all.

If anyone is interested, I can upload the code to my github account. I have no plan to work on this anytime soon, and I suspect it isn't quite suitable in its current state for inclusion in libmill.

(I don't have access to the computer with the code now, but hope to get it back sometime this weekend.)
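
To make the pipe idea concrete, here is a small sketch (not johneh's code) in which worker threads hand finished tasks back by writing a pointer into a shared pipe. A pointer is far smaller than PIPE_BUF, so each write is atomic, and with a single reader no lock is needed on the read side. The names task, worker and run_tasks are hypothetical, and doubling an integer stands in for a blocking file operation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical task record, passed through the pipe by pointer. */
struct task {
    int input;
    int output;
    int wfd; /* write end of the shared pipe */
};

/* Worker: finishes the task, then sends its address back. */
static void *worker(void *arg) {
    struct task *t = arg;
    t->output = t->input * 2; /* stand-in for a blocking file operation */
    /* sizeof(pointer) <= PIPE_BUF, so concurrent writes never interleave. */
    ssize_t n = write(t->wfd, &t, sizeof t);
    assert(n == (ssize_t)sizeof t);
    (void)n;
    return NULL;
}

/* Runs one worker per input and sums the outputs in completion order.
   Inputs are assumed non-negative; returns -1 on error. */
long run_tasks(const int *inputs, int n) {
    int pfd[2];
    if (n <= 0 || pipe(pfd) != 0) return -1;
    struct task *tasks = calloc((size_t)n, sizeof *tasks);
    pthread_t *threads = calloc((size_t)n, sizeof *threads);
    long sum = -1;
    int started = 0;
    if (tasks && threads) {
        for (; started < n; started++) {
            tasks[started].input = inputs[started];
            tasks[started].wfd = pfd[1];
            if (pthread_create(&threads[started], NULL, worker,
                               &tasks[started]) != 0)
                break;
        }
        sum = 0;
        for (int i = 0; i < started; i++) {
            struct task *done;
            /* A coroutine would fdwait() on pfd[0] before this read. */
            if (read(pfd[0], &done, sizeof done) != (ssize_t)sizeof done) {
                sum = -1;
                break;
            }
            sum += done->output;
        }
        for (int i = 0; i < started; i++) pthread_join(threads[i], NULL);
        if (started < n) sum = -1;
    }
    free(tasks);
    free(threads);
    close(pfd[0]);
    close(pfd[1]);
    return sum;
}
```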

paulofaria commented 8 years ago

> If there is no need/plan to launch a goroutine in a non-main thread, I don't see any problem with using (worker) threads at all.

Yeah, my idea is that the threads would be used just to offload regular file operations. Do you think there wouldn't be a problem with the stack in this scenario? Btw, I'd love to see the code.

johneh commented 8 years ago

The start_routine uses the stack (created by pthread_create) of the new thread. Unless I am wrong about that, I don't see any issue.

raedwulf commented 8 years ago

Relevant for Linux: https://lwn.net/Articles/671649/

paulofaria commented 8 years ago

Yeah, looks like Linus really hates the aio interface. In the future we might have kernel support for this, but meanwhile, which approach should we use? @johneh let us know when you find the code.

raedwulf commented 8 years ago

I was thinking: what about an mfork-based design with a POSIX shm-backed channel (synchronised using POSIX semaphores) to communicate with a separate I/O process? Thoughts on this, @sustrik?

sustrik commented 8 years ago

  1. Start with UNIX domain sockets instead of shmem. It's easier and sufficient to find out whether the design actually works.
  2. This looks like a pretty heavy-weight thing. Would it make sense to keep it in a different library?

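
A rough sketch of what the UNIX-domain-socket variant of a separate I/O process could look like. It uses plain fork() rather than libmill's mfork(), and SOCK_SEQPACKET (assumed available, as on Linux) so that each request and reply is one message. The function names and the one-byte status prefix are hypothetical choices made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX_DATA = 4096 };

/* Child loop: a request is a path; a reply is a status byte
   (1 = ok, 0 = error) followed by up to MAX_DATA bytes of the file. */
static void io_process(int sock) {
    char path[MAX_DATA];
    char reply[1 + MAX_DATA];
    for (;;) {
        ssize_t n = recv(sock, path, sizeof path - 1, 0);
        if (n <= 0) break; /* parent closed its end */
        path[n] = '\0';
        ssize_t got = -1;
        int fd = open(path, O_RDONLY);
        if (fd >= 0) {
            got = read(fd, reply + 1, MAX_DATA); /* may block; only here */
            close(fd);
        }
        reply[0] = got >= 0;
        if (send(sock, reply, 1 + (got > 0 ? (size_t)got : 0), 0) < 0) break;
    }
}

/* Forks the I/O process. Returns the parent's socket, or -1. */
int io_start(pid_t *pid) {
    assert(pid != NULL);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0) return -1;
    *pid = fork();
    if (*pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (*pid == 0) {
        close(sv[0]);
        io_process(sv[1]);
        _exit(0);
    }
    close(sv[1]);
    return sv[0];
}

/* Requests the start of a file. Returns bytes copied to buf, or -1. */
ssize_t io_read_head(int sock, const char *path, char *buf, size_t len) {
    char reply[1 + MAX_DATA];
    size_t plen = strlen(path);
    if (plen == 0 || plen >= MAX_DATA) return -1;
    if (send(sock, path, plen, 0) < 0) return -1;
    /* The socket is pollable, so a coroutine could fdwait() before recv. */
    ssize_t n = recv(sock, reply, sizeof reply, 0);
    if (n < 1 || reply[0] != 1) return -1;
    size_t got = (size_t)n - 1;
    if (got > len) got = len;
    memcpy(buf, reply + 1, got);
    return (ssize_t)got;
}

/* Closes the socket, which ends the child's loop, and reaps the child. */
void io_stop(int sock, pid_t pid) {
    close(sock);
    waitpid(pid, NULL, 0);
}
```

Every byte is copied through the socket in addition to being read from the file, which is the extra overhead a shared-memory channel would be meant to avoid.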
raedwulf commented 8 years ago

  1. Yes, that's true - but that gives the overhead of both files and sockets.
  2. Maybe, it feels like the IPC channel is something that nanomsg provides/intends to provide.

In the end we need to have a proper real-world benchmark program where the different approaches can be compared, because it's all theoretical at the moment. We don't know if asynchronous file I/O in a different thread would give a significant advantage. For instance, lwan's mmap approach bypasses the issue through OS caching. So are blocking/non-blocking files actually important for real-world applications, apart from simply making sure libmill semantics are consistent?

A side note, has anyone tested shm-based channel IPC performance? I recall it was one of the goals of nanomsg?

EDIT: "libmill semantics are correct" -> "libmill semantics are consistent". I think the first step towards being correct is to fix the documentation so that users know file I/O is never non-blocking in the current state.

johneh commented 8 years ago

I have uploaded my experiment here: https://github.com/johneh/libmill_worker

See the files worker.c and pipe.c; the pipe.c code is an abstraction that uses a regular pipe to create an inter-thread communication channel. Examples/tests are in the examples subdirectory. Other than the files fsop.c, mcp.c and du3.c (a translation of a Go program from the book by Kernighan et al.), the rest can be ignored.

Please use this recipe:

    ./configure
    make install
    cd examples
    make

The default configure prefix is pwd, so the libs will be installed under /lib in the source directory.

sustrik commented 8 years ago

@raedwulf: Right, the docs should be updated. I am working 12hr shifts this week so I probably won't have time to do that, but patches are welcome.

sustrik commented 8 years ago

@johneh: It looks like pipe could be made into an actual publicly visible object exported by libmill?

johneh commented 8 years ago

@sustrik: That was the original intention.

sustrik commented 8 years ago

Warning added to the docs.

johneh commented 8 years ago

It seems closing the write end of a pipe can be a difficult business. If the fd is closed in one thread then other writer threads waiting in poll/epoll will not unblock. One solution is to use one pipe per thread and multiplex the received values onto a single channel. I have added an example (fanin.c). No mutex and/or cond. var required!
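
The one-pipe-per-thread idea can be sketched roughly as follows. This is not the fanin.c from the repository; the names and constants are hypothetical, and poll() plays the role of the multiplexer that would feed a single libmill channel. Each writer is the only user of its write end, so closing it reliably produces EOF on the read side.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

enum { NWRITERS = 3, PER_WRITER = 4 };

struct writer { int wfd; int base; };

/* Each writer owns its pipe: it sends a few ints, then closes its end. */
static void *writer_main(void *arg) {
    struct writer *w = arg;
    for (int i = 0; i < PER_WRITER; i++) {
        int v = w->base + i;
        ssize_t n = write(w->wfd, &v, sizeof v);
        assert(n == (ssize_t)sizeof v);
        (void)n;
    }
    close(w->wfd); /* no other thread waits on this fd */
    return NULL;
}

/* Multiplexes all read ends with poll() and returns the sum of every
   value received, or -1 if setup or polling failed. */
long fanin_sum(void) {
    struct pollfd pfds[NWRITERS];
    struct writer ws[NWRITERS];
    pthread_t th[NWRITERS];
    int npipes = 0, started = 0, failed = 0;
    long sum = 0;

    for (; npipes < NWRITERS; npipes++) {
        int p[2];
        if (pipe(p) != 0) break;
        pfds[npipes] = (struct pollfd){ .fd = p[0], .events = POLLIN };
        ws[npipes] = (struct writer){ .wfd = p[1], .base = 10 * npipes };
    }
    for (; started < npipes; started++)
        if (pthread_create(&th[started], NULL, writer_main, &ws[started]) != 0)
            break;
    /* Pipes that got no writer thread: close them so they report EOF. */
    for (int i = started; i < npipes; i++) close(ws[i].wfd);

    int live = npipes;
    while (live > 0 && !failed) {
        int rc = poll(pfds, (nfds_t)npipes, -1);
        if (rc < 0) {
            if (errno != EINTR) failed = 1;
            continue;
        }
        for (int i = 0; i < npipes; i++) {
            if (pfds[i].fd < 0 || !(pfds[i].revents & (POLLIN | POLLHUP)))
                continue;
            int v;
            if (read(pfds[i].fd, &v, sizeof v) == (ssize_t)sizeof v) {
                sum += v; /* a real fan-in would forward v to one channel */
                continue;
            }
            close(pfds[i].fd); /* EOF: this writer is finished */
            pfds[i].fd = -1;   /* poll() ignores negative descriptors */
            live--;
        }
    }
    for (int i = 0; i < npipes; i++)
        if (pfds[i].fd >= 0) close(pfds[i].fd);
    for (int i = 0; i < started; i++) pthread_join(th[i], NULL);
    return (!failed && started == NWRITERS) ? sum : -1;
}
```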