Need IO record and replay facility

xemul commented 3 months ago

When investigating IO-related issues on production nodes, we only have metrics and some Linux tooling at hand. Metrics show counters averaged over several minutes, while Linux tooling shows the result of the IO scheduler's work. To better understand what's going on in the seastar IO stack, we need to see how requests are submitted by the upper code (read: Scylla code) to seastar, not by seastar to Linux/disk.
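A small Python sketch of why averaged counters can hide the shape of the input. The two submission streams are made up, not recorded from Scylla, and `uniform_stream`, `bursty_stream`, `average_rate` and `peak_rate` are hypothetical helpers. Both streams submit 6000 requests in a 60-second window, so a per-minute counter reports 100 requests/s for each, yet one of them submits everything within the first second.

```python
# Hypothetical example: timestamps are integer milliseconds within a
# 60-second window; neither stream comes from a real trace.
WINDOW_MS = 60_000
N_REQUESTS = 6000


def uniform_stream():
    # One request every 10 ms, spread evenly over the whole window.
    return [i * 10 for i in range(N_REQUESTS)]


def bursty_stream():
    # The same 6000 requests packed into the first second (6 per ms).
    return [i // 6 for i in range(N_REQUESTS)]


def average_rate(timestamps_ms, window_ms=WINDOW_MS):
    # What a minutes-long averaged counter reports: requests per second.
    return len(timestamps_ms) * 1000 / window_ms


def peak_rate(timestamps_ms, bucket_ms=1000):
    # Highest per-bucket rate (requests per second); computing it needs
    # per-request submission timestamps, not just a counter.
    buckets = {}
    for t in timestamps_ms:
        b = t // bucket_ms
        buckets[b] = buckets.get(b, 0) + 1
    return max(buckets.values()) * 1000 / bucket_ms


for name, stream in (("uniform", uniform_stream()), ("bursty", bursty_stream())):
    print(f"{name}: average {average_rate(stream)}/s, peak {peak_rate(stream)}/s")
```

Both streams look identical to the averaged counter; only per-request timestamps reveal the 60x difference in peak rate.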

One tool that could help here is an IO tracer that collects information about the requests queued to the IO scheduler. A very important parameter here is the submission timestamp, because the IO scheduler is currently built with uniform workloads in mind. Non-uniform input is handled, but we lack an understanding of how exactly "non-uniform" it is.
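A minimal Python sketch of what such a tracer could record and how the recording could be used offline. Everything here is an assumption, not seastar API: `TraceEntry` and its fields are hypothetical, the coefficient of variation (CV) of inter-arrival gaps is just one possible way to put a number on "non-uniform", and `replay_offsets` only computes the timing a replayer would need to reproduce.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class TraceEntry:
    # Hypothetical record, taken when upper-layer code queues a request to
    # the IO scheduler (not when the scheduler dispatches it to the kernel).
    submit_us: int   # submission timestamp, microseconds
    io_class: str    # scheduling class the request was queued under
    is_write: bool
    length: int      # request size, bytes


def interarrival_cv(trace):
    """Coefficient of variation of inter-arrival times (needs >= 2 entries).

    0 means perfectly even spacing, about 1 matches Poisson arrivals, and
    values well above 1 indicate bursty submission."""
    ts = sorted(e.submit_us for e in trace)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    # All requests at the same instant: treat as maximally bursty.
    return pstdev(gaps) / avg if avg else float("inf")


def replay_offsets(trace):
    """(offset_us, entry) pairs: when a replayer would re-submit each
    request, relative to the first one, preserving the recorded pattern."""
    ordered = sorted(trace, key=lambda e: e.submit_us)
    t0 = ordered[0].submit_us
    return [(e.submit_us - t0, e) for e in ordered]
```

A replayer built on this would wait until each offset and re-submit a request with the same class, direction and size, so the scheduler sees the same input shape as it did in production.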

avikivity commented 2 months ago

Isn't the input to the I/O scheduler dependent on the output? For example, a sequential workload's submit timestamps will depend on when the requests are completed (even with read-ahead or write-behind).
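A toy Python model of this point. It assumes a sequential reader that keeps `depth` requests in flight and treats each request's service time as given and independent of the others, so no queueing is modelled. `closed_loop_submits` is a hypothetical helper: request i is submitted exactly when request i - depth completes, so one slow completion shifts every later submission.

```python
def closed_loop_submits(service_us, depth=1):
    """Submit timestamps (us) of a sequential reader with `depth` requests
    in flight; depth > 1 models read-ahead.

    Simplification: each request's service time is taken as given and
    independent of the others, i.e. no queueing is modelled."""
    submits, completes = [], []
    for i, svc in enumerate(service_us):
        # The first `depth` requests go out immediately; every later one
        # waits for the completion of the request `depth` positions back.
        submit = 0 if i < depth else completes[i - depth]
        submits.append(submit)
        completes.append(submit + svc)
    return submits


# One slow request (500 us) delays every submission after it.
print(closed_loop_submits([100, 500, 100, 100]))  # [0, 100, 600, 700]
```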

xemul commented 2 months ago

> Isn't the input to the I/O scheduler dependent on the output?

Well, in the mathematical sense, yes: the scheduler defines output = sched_fn(input), so in theory we could deduce an inverse function, input = ~sched_fn(output), but I don't have good ideas for how to do it.

> For example, a sequential workload's submit timestamps will depend on when the requests are completed (even with read-ahead or write-behind).

No. To get the submit timestamp from the completion timestamp, we need the execution time, but that is not set in stone. Quite the opposite: we measure execution time from the submit and completion timestamps.
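A short Python sketch of that asymmetry, with made-up numbers and hypothetical helpers `execution_times` and `reconstruct_submits`. Execution time falls out of recorded submit and completion timestamps, but going the other way requires assuming an execution time, and a constant guess misplaces any request whose latency differed.

```python
def execution_times(submits_us, completes_us):
    """Per-request execution time, measured as complete - submit."""
    if len(submits_us) != len(completes_us):
        raise ValueError("need one completion per submission")
    return [c - s for s, c in zip(submits_us, completes_us)]


def reconstruct_submits(completes_us, assumed_exec_us):
    """Naive inverse: submit = complete - exec, with one assumed exec time.

    This is only correct if every request really took assumed_exec_us."""
    return [c - assumed_exec_us for c in completes_us]


# Made-up trace: the third request was delayed (e.g. queued behind others).
submits = [0, 100, 200]
completes = [150, 250, 900]

print(execution_times(submits, completes))   # measured: [150, 150, 700]
print(reconstruct_submits(completes, 150))   # guessed:  [0, 100, 750]
```

The reconstructed third submission lands at 750 us instead of 200 us, which is why recording the submit timestamp directly is more useful than trying to derive it.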

scylladb/seastar issue #2403: Need IO record and replay facility