rr-debugger / rr

Record and Replay Framework
http://rr-project.org/

Support `io_uring` #2613

Open rocallahan opened 4 years ago

rocallahan commented 4 years ago

Here's a possible approach.

This won't be very fast: in many cases it will mean more io_uring_enter syscalls than without rr, and every io_uring_enter syscall will require trapping to rr (i.e. 4 context switches). But if the submission queue is large, we will batch a lot of I/O operations per trap, a bit like syscallbuf. (Trying to integrate io_uring with syscallbuf seems pointless since we get the batching effect as-is. If necessary we could make the real buffers bigger than the fake buffers.) So performance might be close to as good as one could expect.

This assumes application threads don't race with the kernel's writes to user-space I/O buffers. If we don't want to assume that, we can extend this to allocate additional scratch buffers, rewrite submission-queue entries to point to those buffers, and copy the contents of those buffers to the right place when we see new completion queue entries.
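The scratch-buffer extension described above can be modelled in a few lines. This is only a toy sketch, not rr code: `Sqe`, `Cqe`, `ScratchRecorder` and `fake_kernel_read` are hypothetical names, the "kernel" is simulated in-process, and only a read-like operation is modelled. It shows the three steps: allocate a scratch buffer per submission, rewrite the entry to point at it, and copy the data to the application's buffer once the completion is seen.

```python
from dataclasses import dataclass


@dataclass
class Sqe:
    """Toy submission-queue entry: a read into `buf`, tagged with `user_data`."""
    user_data: int
    buf: bytearray


@dataclass
class Cqe:
    """Toy completion-queue entry: `res` is the byte count of the read."""
    user_data: int
    res: int


class ScratchRecorder:
    """Hypothetical recorder that keeps kernel writes away from application buffers."""

    def __init__(self):
        self.pending = {}  # user_data -> (application buffer, scratch buffer)
        self.trace = []    # recorded (user_data, data) pairs, in completion order

    def rewrite(self, sqe):
        # Allocate a scratch buffer and hand the kernel an entry that targets it.
        scratch = bytearray(len(sqe.buf))
        self.pending[sqe.user_data] = (sqe.buf, scratch)
        return Sqe(sqe.user_data, scratch)

    def complete(self, cqe):
        # On a new completion, record the data and copy it to the right place.
        app_buf, scratch = self.pending.pop(cqe.user_data)
        n = max(cqe.res, 0)  # a negative result would be an error, nothing to copy
        data = bytes(scratch[:n])
        app_buf[:n] = data
        self.trace.append((cqe.user_data, data))
        return cqe


def fake_kernel_read(sqe, payload):
    """Stand-in for the kernel: writes `payload` into whatever buffer the entry names."""
    n = min(len(payload), len(sqe.buf))
    sqe.buf[:n] = payload[:n]
    return Cqe(sqe.user_data, n)


# Usage: the application buffer stays untouched until the completion is processed.
app_buf = bytearray(8)
recorder = ScratchRecorder()
real_sqe = recorder.rewrite(Sqe(user_data=7, buf=app_buf))
cqe = fake_kernel_read(real_sqe, b"hello")
untouched_before_completion = bytes(app_buf) == bytes(8)
recorder.complete(cqe)
```

Because the kernel only ever writes to the scratch buffer, the point at which the application buffer changes is under the recorder's control, which is what removes the race assumption.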

asm89 commented 3 years ago

We are starting to run into binaries we want to record that use io_uring. Are there any (active) plans to add support for io_uring in rr?

khuey commented 3 years ago

Are you running into binaries that would work fine if rr returned ENOSYS for the io_uring syscalls and they fell back to whatever they used before, or are you running into binaries that need io_uring to be supported in rr?

asm89 commented 3 years ago

I had a quick look. I think most binaries would work if rr returned ENOSYS. Recording is currently blocked because an internal assert in rr fires.

That said, we'd also be interested in io_uring actually being supported. Most systems our (test) binaries run on support io_uring, so where binaries probe for it, I'm guessing it's the most commonly used backend at this point.

rocallahan commented 3 years ago

Can you tell us more about what you work on?

asm89 commented 3 years ago

Sorry for the delay in reply. I was afk for a bit.

I work in the continuous integration space at Facebook. We are trialing rr in our developer and CI environments. We have binaries and tests using io_uring. The assert firing is a blocker for further rollout.

Returning ENOSYS would unblock things for now. It would mean rr-recorded executions use a different I/O backend, which wouldn't be ideal long term.

rocallahan commented 3 years ago

Alright, 7854be5362baadc0143b956279e96f3c4f511dfa makes the io_uring syscalls return ENOSYS for now.
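For readers wondering what the ENOSYS behaviour looks like from the application side, here is a minimal sketch of a probe-and-fall-back pattern. It is an illustration, not code from rr or from any binary mentioned in this thread: `io_uring_available` and `choose_backend` are hypothetical names, the `"epoll"` fallback is an assumption, and the syscall number 425 is the value used on x86-64 and aarch64 (assumed here for other architectures).

```python
import ctypes
import errno
import sys

SYS_IO_URING_SETUP = 425  # x86-64 / aarch64 value; an assumption elsewhere


def io_uring_available():
    """Return False if io_uring_setup reports ENOSYS (e.g. when running under rr)."""
    if not sys.platform.startswith("linux"):
        return False
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ctypes.set_errno(0)
    # Deliberately invalid arguments: a kernel that implements the syscall
    # rejects them with some other error, so no ring is actually created.
    ret = libc.syscall(
        ctypes.c_long(SYS_IO_URING_SETUP), ctypes.c_uint(0), ctypes.c_void_p(None)
    )
    if ret >= 0:
        # Not expected with these arguments, but do not leak a descriptor.
        import os
        os.close(ret)
        return True
    # Only ENOSYS is treated as "not implemented" in this sketch; a real
    # program might also fall back on EPERM (e.g. a seccomp policy).
    return ctypes.get_errno() != errno.ENOSYS


def choose_backend(probe=io_uring_available):
    """Pick an I/O backend name; "epoll" is a hypothetical fallback."""
    return "io_uring" if probe() else "epoll"
```

A binary written this way keeps working under a recorder that returns ENOSYS, at the cost of exercising a different I/O backend than it would use outside the recorder.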

vlovich commented 11 months ago

Grr.. I'm working with binaries that don't have any fallback and only have io_uring as a backend...

rocallahan commented 11 months ago

If it's important to you, you could try contracting @khuey to do it.