Open bobrik opened 7 years ago
If I do 4k writes with no pause between them:
$ sudo go run /tmp/paranoid.go -file=/dev/sdb -size=4096 -sleep=0ms
...
2017/03/30 04:29:40 Synced in 10031 μs | 10.031391ms
2017/03/30 04:29:40 Synced in 10050 μs | 10.050549ms
2017/03/30 04:29:41 Synced in 10040 μs | 10.04077ms
2017/03/30 04:29:41 Synced in 10054 μs | 10.054079ms
2017/03/30 04:29:41 Synced in 10032 μs | 10.032228ms
2017/03/30 04:29:41 Synced in 10036 μs | 10.036565ms
2017/03/30 04:29:41 Synced in 10047 μs | 10.04728ms
2017/03/30 04:29:41 Synced in 10021 μs | 10.021358ms
2017/03/30 04:29:41 Synced in 10064 μs | 10.064109ms
2017/03/30 04:29:41 Synced in 10032 μs | 10.032702ms
2017/03/30 04:29:41 Synced in 10007 μs | 10.007492ms
2017/03/30 04:29:41 Synced in 10277 μs | 10.277303ms
In biosnoop
I can clearly see 10ms pause between issued writes:
67.830680000 paranoid 3676 sdb W 6024 4096 0.66
67.840708000 paranoid 3676 sdb W 6032 4096 0.65
67.850777000 paranoid 3676 sdb W 6040 4096 0.65
67.860824000 paranoid 3676 sdb W 6048 4096 0.64
67.870924000 paranoid 3676 sdb W 6056 4096 0.65
67.880957000 paranoid 3676 sdb W 6064 4096 0.65
67.891021000 paranoid 3676 sdb W 6072 4096 0.67
67.901027000 paranoid 3676 sdb W 6080 4096 0.64
67.911102000 paranoid 3676 sdb W 6088 4096 0.65
67.921144000 paranoid 3676 sdb W 6096 4096 0.64
67.931236000 paranoid 3676 sdb W 6104 4096 0.65
67.941263000 paranoid 3676 sdb W 6112 4096 0.63
Why kernel decides it's ok to have slower syncs when there are pauses between syncs is an interesting question too, but latencies from biosnoop
are still not right.
Code I use:
package main
import (
"flag"
"log"
"os"
"time"
)
func main() {
name := flag.String("file", "", "filename to use")
size := flag.Int("size", 1024*10, "chunk size")
sync := flag.Int("sync", 1, "sync file after that many writes, <= 0 to disable")
sleep := flag.Duration("sleep", time.Second, "time to sleep between writes")
count := flag.Int("count", 0, "number of iterations, <=0 for infinity")
flag.Parse()
if *name == "" {
flag.PrintDefaults()
}
file, err := os.OpenFile(*name, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0666)
if err != nil {
log.Fatalf("Error opening file %q: %s", *name, err)
}
buf := make([]byte, *size)
i := 0
for {
i++
started := time.Now()
verb := "Written"
if _, err := file.Write(buf); err != nil {
log.Fatalf("Error writing: %s", err)
}
if *sync > 0 && i%*sync == 0 {
if err := file.Sync(); err != nil {
log.Fatalf("Error syncing: %s", err)
}
verb = "Synced"
}
elapsed := time.Since(started)
log.Printf("%s in %d μs | %v", verb, int(elapsed.Seconds()*1e6), elapsed)
if *count > 0 && i >= *count {
break
}
time.Sleep(*sleep)
}
}
/cc @brendangregg
I issue 10 x 512 byte writes each followed by fsync directly to /dev/sdb that is a spinning disk and here's what I see:
biosnoop (0.20ms):
$ sudo /usr/share/bcc/tools/biosnoop | fgrep --line-buffered sdb 0.000000000 paranoid 17553 sdb R 0 4096 0.23 0.000329000 paranoid 17553 sdb W 0 4096 0.20 4.338511000 paranoid 17553 sdb W 0 4096 0.27 5.361153000 paranoid 17553 sdb W 0 4096 0.26 6.383771000 paranoid 17553 sdb W 0 4096 0.26 7.406433000 paranoid 17553 sdb W 0 4096 0.27 8.429084000 paranoid 17553 sdb W 0 4096 0.26 9.451788000 paranoid 17553 sdb W 0 4096 0.26 10.474409000 paranoid 17553 sdb W 0 4096 0.26
Looks good to me.
It would be nice for it to include the flush events as well, but those kprobes don't in this case.
Edit: in case I'm not clear, I mean the biosnoop output should contain these as separate lines of output: one for the [W]rite event (0.25ms), and then the [S]ync event (10ms).
This reminds me of #826. Tracepoints cover the flush/sync events, and the bio tools should be rewritten to use them, at which point we'll just get both events for free. I haven't been in a hurry because it'll break everyone on 4.4, until they get to 4.7 (or discover the older bio tools in tools/old). So if the bio tools need minimum fixes for kprobes, we can continue doing that, and then later switch to tracepoints. But if the bio* tools become too badly broken, it might be better to bite the bullet and switch to tracepoints early...
I think I can guess why it's not seeing syncs: for syncs, blk_start_request() uses a different struct request *req, which doesn't get paired up on completion.
biolatency with queuing time (16-32ms):
Tracing block device I/O... Hit Ctrl-C to end. ^C 22:29:34 disk = 'sdb' usecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 0 | | 16 -> 31 : 0 | | 32 -> 63 : 0 | | 64 -> 127 : 1 | | 128 -> 255 : 64 |****************************************| 256 -> 511 : 9 |***** | 512 -> 1023 : 0 | | 1024 -> 2047 : 0 | | 2048 -> 4095 : 0 | | 4096 -> 8191 : 0 | | 8192 -> 16383 : 0 | | 16384 -> 32767 : 10 |****** |
In this case it's seeing both the writes and the syncs, and they are getting paired up correctly (same struct request on issue and completion).
biosnoop is missing the syncs. It's either fixable with tracepoints (#826) or finding the right kprobe.
Something funny is happening in
biosnoop
andbiolatency
when asked to exclude kernel queuing.I issue 10 x 512 byte writes each followed by
fsync
directly to/dev/sdb
that is a spinning disk and here's what I see:biosnoop
(0.20ms):biolatency
without queueing time (0.1..02ms):biolatency
with queuing time (16-32ms):Note that here I see my exact 10 requests, but I don't see that without queuing.
If I write to a filesystem, latency numbers from
biosnoop
look more realistic:But then again, my app now sees 42ms latency because it has to write two sectors on disk.
This is happening on 4.4.44.
I also see similar weirdness on SSDs when reported latency is 0.09ms + 0.04ms (that's on a filesystem), but app perceives that as 1.2ms (0.09+0.04 is 0.13ms, 10x difference).