Open ipronin opened 6 years ago
@brendangregg do you remember the history of this?
No, I don't remember... I see the commit history is lacking the comment I put here: https://github.com/iovisor/bcc/pull/240 :-)
So it's from tracex2, https://github.com/torvalds/linux/blob/master/samples/bpf/tracex2_kern.c#L56
But I added "+ 1"s. It looks intentional, but I can't remember why...
Should I fix it and see what breaks? :) The code looked intentional to me as well, but the results were quite unexpected.
Be aware that I don't think any end-user tools use bpf_log2() directly. They all use bpf_log2l(). So I don't think their lack of consistency has been a problem yet. And also note that there are tests for this, eg, tests/python/test_histogram.py, but I'm not sure what they are testing.
I still haven't remembered why it's there (I wish I wrote a comment). Here's my guesses:
A) Reserving the 0th slot for negative values (which will get aggregated as one bucket). A common tracing histogram is the retval of the read() syscall, to show a read() I/O size histogram, but read() can also return a negative ERRNO. I may have been thinking of that and reserved 0 on purpose for that case.
B) I wanted to differentiate 0 from 1, but then didn't finish writing the code. And was meaning to do this:
value | bpf_log2l
0 | 0
1 | 1
2 | 2
4 | 3
And then change how histograms are printed in BPF.print_log2_hist() to do something like this (printing both 0 and 1 on separate lines):
usecs : count distribution
0 -> 0 : 0 | |
1 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
Because currently, the first line shows "0 -> 1". Which matches what bpf_log2l() is doing. I think it would be useful to split "0 -> 1" into separate lines for 0 and 1.
C) I was going to use the 0th slot to store the max index, but ended up not needing it.
D) I hit some 0th compiler error and the +1 was a hack.
Sure, you could try changing it and see what breaks (which could reveal (D)). And maybe making it do (B) makes the most sense.
If it changes, tools should emit the same histograms -- changing the internal offsets shouldn't change the consumer tools.
If it changes, tools should emit the same histograms -- changing the internal offsets shouldn't change the consumer tools.
by that I meant the histograms shouldn't become wrong (off by one). The output might change if we have the new "0 -> 0" and "1 -> 1" lines.
Got hit by this, I was expecting a lot of 0, but when plotting with print_log2_hist
, the number of 0 always amounted to 0 whereas my print calls showed 0s. Quite unexpected.
This has probably been fixed, but on the off chance that it wasn't, I suspect that because bpf_log2 is called within eBPF, it, by definition, cannot fail. logN(0) is undefined and would normally (in user space) raise an exception. My workaround was to check to see if the scaled value was zero prior to passing it to log2. IE. The scaled time delta will be zero if the elapsed time < 1000. For that case, I explicitly specify the zero bucket like so: bucket = svalue ? bpf_log2(svalue) : 0;
When calculating the binary logarithm of a 64-bit integer
bpf_log2l
helper adds 1 to the logarithm of a 32-bit half calculated bybpf_log2
.That doesn't cause errors with code like
dist.increment(bpf_log2l(1ull << 63))
, wheredist
is a log2 histogram of size 64. I suspect that is due to element lookup prior to updating. In real use cases the 0th bucket will always be empty and the 63rd will be lost. Interesting thatbcc.table._print_log2_hist
works around this by printing buckets 1 through 63, skipping the always empty 0th bucket. Was it done intentionally? I can send a patch that fixes this issue if you would like.