openSUSE / kdump

kernel dump helpers
GNU General Public License v2.0

calibrate.conf varies between builds #25

Open bmwiedemann opened 2 years ago

bmwiedemann commented 2 years ago

While working on reproducible builds for openSUSE, I found that our kdump package now varies across builds because of calibrate.conf.

Even when varying the build environment as little as possible, I still get this diff:

--- old//usr/lib/kdump/calibrate.conf   2022-01-21 00:00:00.000000000 +0000
+++ new//usr/lib/kdump/calibrate.conf   2022-01-21 00:00:00.000000000 +0000
@@ -4,21 +4,21 @@
 Compared:       0 xattrs
 Compared:       60 files
 Saved:          888 B
-Duration:       0.003309 seconds
+Duration:       0.003111 seconds
 Mode:           real
 Files:          636
 Linked:         2 files
 Compared:       0 xattrs
 Compared:       60 files
 Saved:          888 B
-Duration:       0.003395 seconds
-KERNEL_BASE=96924
+Duration:       0.003227 seconds
+KERNEL_BASE=96124
 KERNEL_INIT=22956
 INIT_CACHED=36088
 PAGESIZE=4096
 SIZEOFPAGE=64
 PERCPU=264
-USER_BASE=17592
-INIT_NET=4400
+USER_BASE=18196
+INIT_NET=4396
 INIT_CACHED_NET=11224
-USER_NET=3548
+USER_NET=0

What is the purpose of these values? Can they be dropped or fixed?

ptesarik commented 2 years ago

Hi Bernhard,

the purpose of this file is to store an approximation of how much memory is needed by the kernel and the user-space tools that are used to save the kernel dump. To the best of my knowledge, there is no way to calculate these values deterministically from the binaries alone, so an actual initrd is built from the currently installed tools and started inside a QEMU VM using the currently installed kernel. The measured memory consumption of the VM is stored in the configuration file as a kind of “footprint” of the target operating system.

Just like there are slight variances in the execution order between any two operating system runs, there are variances in the measured values. Any ideas on improving the algorithm would be much appreciated.

bmwiedemann commented 2 years ago

This is the amount of variation I get when varying the env:

@@ -1,10 +1,10 @@
-KERNEL_BASE=96420
-KERNEL_INIT=23944
-INIT_CACHED=37580
+KERNEL_BASE=97376
+KERNEL_INIT=23880
+INIT_CACHED=37508
 PAGESIZE=4096
 SIZEOFPAGE=64
 PERCPU=264
-USER_BASE=18724
-INIT_NET=4804
-INIT_CACHED_NET=11816
-USER_NET=2908
+USER_BASE=18860
+INIT_NET=4800
+INIT_CACHED_NET=11808
+USER_NET=2876

One valid approach could be to round the values up to the next 2^n or to a multiple of 4k, which could get rid of the entropy in the lower bits. What is the unit of these numbers? Bytes or KB?
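To put the variation in the diff above into numbers, here is a minimal Python sketch that computes the relative change per key. The values are copied from the diff; the function name `relative_change` is made up for illustration and is not part of kdump.

```python
# Values copied from the diff above (first build vs. second build).
# PAGESIZE, SIZEOFPAGE and PERCPU are omitted because they did not change.
old = {
    "KERNEL_BASE": 96420, "KERNEL_INIT": 23944, "INIT_CACHED": 37580,
    "USER_BASE": 18724, "INIT_NET": 4804, "INIT_CACHED_NET": 11816,
    "USER_NET": 2908,
}
new = {
    "KERNEL_BASE": 97376, "KERNEL_INIT": 23880, "INIT_CACHED": 37508,
    "USER_BASE": 18860, "INIT_NET": 4800, "INIT_CACHED_NET": 11808,
    "USER_NET": 2876,
}


def relative_change(old, new):
    """Return the absolute change of each key as a percentage of the old value."""
    return {key: abs(new[key] - old[key]) / old[key] * 100 for key in old}


changes = relative_change(old, new)
for key, percent in changes.items():
    print(f"{key}: {percent:.2f}%")
```

For this pair of builds, no value moves by more than about 1.1%, and the largest relative change is in USER_NET.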

ptesarik commented 2 years ago

All these values are in kilobytes, so they are already rounded to 4k (or 64k on IBM POWER). Rounding to the next 2^n is too wasteful for small systems, and it may not even solve the issue. What if the value varies between 4088 and 4104, for example?
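The boundary problem in that example can be shown with a few lines of Python. This is only an illustrative sketch; `next_power_of_two` is a hypothetical helper, not something kdump provides.

```python
def next_power_of_two(value_kb):
    """Round a positive integer up to the next power of two (identity for exact powers)."""
    return 1 << (value_kb - 1).bit_length()


# Two measurements that differ by only 16 kB but straddle the 4096 boundary.
rounded = {measured: next_power_of_two(measured) for measured in (4088, 4104)}
for measured, result in rounded.items():
    print(f"{measured} kB -> {result} kB")
```

The two inputs differ by 16 kB, but they round to 4096 kB and 8192 kB respectively, so rounding turns a small variance into a large one whenever the measurements sit near a boundary.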

But I've got another idea. We could package calibrate.conf as a source file and merely check that the newly built one does not vary by more than a certain percentage, though I'm not entirely sure what should happen if it does.

Is there anything like a build warning? I mean, the package should still build, but the maintainers should get a hint (or even notification) that the sources need an update. This is especially important for Tumbleweed.
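A percentage check of this kind could look roughly like the following Python sketch. It is not the kdump implementation: the function names, the 5% threshold, and the choice to print a warning instead of failing are all assumptions made for illustration. Lines that are not of the form KEY=number are ignored.

```python
import sys


def parse_conf(text):
    """Collect KEY=integer lines; skip anything else."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            values[key.strip()] = int(value.strip())
    return values


def out_of_tolerance(stored, measured, max_percent=5.0):
    """Return the keys whose measured value deviates too much from the stored one.

    max_percent is an assumed threshold, not a value taken from kdump.
    """
    drifted = []
    for key, reference in stored.items():
        value = measured.get(key)
        if value is None:
            drifted.append(key)  # key missing from the new measurement
        elif reference == 0:
            if value != 0:
                drifted.append(key)
        elif abs(value - reference) / reference * 100 > max_percent:
            drifted.append(key)
    return drifted


# Example input: a subset of the values from the first diff in this issue.
stored = parse_conf("KERNEL_BASE=96924\nUSER_BASE=17592\nINIT_NET=4400\nUSER_NET=3548\n")
measured = parse_conf("Duration:       0.003227 seconds\n"
                      "KERNEL_BASE=96124\nUSER_BASE=18196\nINIT_NET=4396\nUSER_NET=0\n")

drifted = out_of_tolerance(stored, measured)
if drifted:
    # Warn without failing, so the package still builds.
    print("calibrate.conf drifted for: " + ", ".join(drifted), file=sys.stderr)
```

With a 5% threshold, only USER_NET (3548 versus 0) is reported for these inputs. Whether such a report should fail the build or only leave a hint for the maintainers is exactly the open question above.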

bmwiedemann commented 2 years ago

Pre-generating is certainly one solution.

Another idea: maybe disabling ASLR during the QEMU run could help?

If you do not want to fail the build, a build log entry is the closest thing to a warning. Failing the build would be more obvious.

ptesarik commented 2 years ago

Thanks for the idea. The variance is certainly not caused by kASLR, because that's already off, see commit 8cc0a84ede95.

Hm, a build log entry… I was hoping for something more visible, the way rpmlintrc warnings are prominent in the web UI. OK, then I might just skip building the config file completely and add an openQA test case instead.

ptesarik commented 2 years ago

I have made the generation of calibrate.conf optional with the CALIBRATE CMake option (see commit ed16cab5781d2c9a0cfbc05d9304b80c748660f5).

I have added a --with-calibrate option to the OBS package. This makes the build process skip the QEMU run by default and use stored values from kdump-calibrate.tar.bz2 instead. The option can be enabled temporarily to refresh the values. See sr#950672.

The remaining task is to create an openQA test case that checks that the stored values are not off by too much. That's why I'm keeping this issue open, but I expect that your original concern is solved for now. Please confirm.