iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0
20.34k stars 3.86k forks source link

bpf_log2l helper gives an off by one result #1649

Open ipronin opened 6 years ago

ipronin commented 6 years ago

When calculating the binary logarithm of a 64-bit integer bpf_log2l helper adds 1 to the logarithm of a 32-bit half calculated by bpf_log2.

value               | bpf_log2 | bpf_log2l
0                   | 0        | 1
1                   | 0        | 1
2                   | 1        | 2
4                   | 2        | 3
4294967296          |          | 33
8589934592          |          | 34
9223372036854775808 |          | 64

That doesn't cause errors with code like dist.increment(bpf_log2l(1ull << 63)), where dist is a log2 histogram of size 64. I suspect that is due to element lookup prior to updating. In real use cases the 0th bucket will always be empty and the 63rd will be lost. Interesting that bcc.table._print_log2_hist works around this by printing buckets 1 through 63, skipping the always empty 0th bucket. Was it done intentionally? I can send a patch that fixes this issue if you would like.

yonghong-song commented 6 years ago

@brendangregg do you remember the history of this?

brendangregg commented 6 years ago

No, I don't remember... I see the commit history is lacking the comment I put here: https://github.com/iovisor/bcc/pull/240 :-)

So it's from tracex2, https://github.com/torvalds/linux/blob/master/samples/bpf/tracex2_kern.c#L56

But I added "+ 1"s. It looks intentional, but I can't remember why...

ipronin commented 6 years ago

Should I fix it and see what breaks? :) The code looked intentional to me as well, but the results were quite unexpected.

brendangregg commented 6 years ago

Be aware that I don't think any end-user tools use bpf_log2() directly. They all use bpf_log2l(). So I don't think their lack of consistency has been a problem yet. And also note that there are tests for this, eg, tests/python/test_histogram.py, but I'm not sure what they are testing.

I still haven't remembered why it's there (I wish I wrote a comment). Here's my guesses:

A) Reserving the 0th slot for negative values (which will get aggregated as one bucket). A common tracing histogram is the retval of the read() syscall, to show a read() I/O size histogram, but read() can also return a negative ERRNO. I may have been thinking of that and reserved 0 on purpose for that case.

B) I wanted to differentiate 0 from 1, but then didn't finish writing the code. And was meaning to do this:

value               | bpf_log2l
0                   | 0
1                   | 1
2                   | 2
4                   | 3

And then change how histograms are printed in BPF.print_log2_hist() to do something like this (printing both 0 and 1 on separate lines):

     usecs               : count     distribution
         0 -> 0          : 0        |                                        |
         1 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |

Because currently, the first line shows "0 -> 1". Which matches what bpf_log2l() is doing. I think it would be useful to split "0 -> 1" into separate lines for 0 and 1.

C) I was going to use the 0th slot to store the max index, but ended up not needing it.

D) I hit some 0th compiler error and the +1 was a hack.

Sure, you could try changing it and see what breaks (which could reveal (D)). And maybe making it do (B) makes the most sense.

If it changes, tools should emit the same histograms -- changing the internal offsets shouldn't change the consumer tools.

brendangregg commented 6 years ago

If it changes, tools should emit the same histograms -- changing the internal offsets shouldn't change the consumer tools.

by that I meant the histograms shouldn't become wrong (off by one). The output might change if we have the new "0 -> 0" and "1 -> 1" lines.

teto commented 6 years ago

Got hit by this, I was expecting a lot of 0, but when plotting with print_log2_hist, the number of 0 always amounted to 0 whereas my print calls showed 0s. Quite unexpected.

sftwrngnr commented 5 years ago

This has probably been fixed, but on the off chance that it wasn't, I suspect that because bpf_log2 is called within eBPF, it, by definition, cannot fail. logN(0) is undefined and would normally (in user space) raise an exception. My workaround was to check to see if the scaled value was zero prior to passing it to log2. IE. The scaled time delta will be zero if the elapsed time < 1000. For that case, I explicitly specify the zero bucket like so: bucket = svalue ? bpf_log2(svalue) : 0;