Closed yaoyaoio closed 1 year ago
我的英文不太好 语法有可能有问题。 简单用中文复述一下
我在阿里云集群上复现这个问题
agent虽然成功将 starship/linux_headers/linux-headers-4.14.304-starship.tar.gz
解压到 /usr/src/linux-headers-4.14.304-starship
。但是在进行modifyKernelVersion出了错。usr/src/linux-headers-4.14.304-starship
变成了 usr/src/linux-headers-4.14.48-starship
.
原因就是 parseVersion 函数 和 Version{} 会把304 变成 48. 因为 304 超出了 uint8范围。
@yaoyaoio Thanks for the careful testing
We seem are having an incomplete understanding of the current kernel versioning landscape. Across kernel source code, macro like
#define KERNEL_VERSION(a,b,c) ((a)*65536+(b)*256+(c))
// Or other variants
#define KERNEL_VERSION (((a) << 16) + ((b) << 8) + (c))
Are used to compute a code for comparison of Linux kernel versions:
/* Scheduale bottom half to run */
#if LINUX_VERSION_CODE > KERNEL_VERSION(2,2,0)
queue_task(&task, &tq_immediate);
#else
queue_task_irq(&task, &tq_immediate);
#endif
mark_bh(IMMEDIATE_BH);
That means that the minor version bit size shall be 8, so to make the above macro meaningful. Otherwise the above code risks truncating. But in reality there are indeed minor version that is beyond 8-bit range. For example:
This issue is discussed here as well: https://lwn.net/Articles/845120/
The conclusion seem to suggest they just keep increasing kernel version minor code, but do not bother to update code computation.
I am guessing if everything works fine if we use uint16 for minor-version, then we should just change that. We only need to make sure the code that rely on code of the kernel version, still works as expected for truncated minor versions.
@owl-ltt What do you think?
I read this article before :https://blog.dcjanus.com/posts/linux-version-number-and-tikv-panic/
Yeah, I think it makes sense which changes the minor version to uint16
, I will change it, please wait for my PR.
@yaoyaoio Thank you for your report, I have tried to fix this issue, Can you help me to test it?
@owl-ltt Yes I can. I will test it this afternoon.
I used image: ghcr.io/tricorder-observability/agent:7e492c318cf9d9ab0c16a97a1c30579288499066
agent run successfully..
I think we should add Info log output example:
log.Infof("Linux headers initialized successfully for bcc: %v ", linux_headers.GetVersion())
log.Infof("Connected to API server at '%s'", *apiServerAddr)
Provider some success logs.
you can create PR to add this. @yaoyaoio
Describe the bug I deployed Starship in the ALIYUN ACK Cluster. Starship agent running fail.
Env: ACK Version : ALIYUN ACK Pro / ALIYUN ACK 标准版 OS: Centos 7.9(3.10.0-1160.80.1.el7. ×86_64) / Alibaba Cloud Linux 2(4.19.91-26.6.al7.x86_64)
Failed message:
To Reproduce
Initialize Linux headers Step:
/starship/linux_headers/linux-headers-4.14.304-starship.tar.gz
to/usr/src/linux-headers-4.14.304-starship
ok/us/src/linux-headers-4.14.48-starship/include/generated/uai/linux/version.h
Code: locateAndInstallPackageHeadersI think there is a problem with version parsing
Bacause used uint8(int(304)) = 48
I suggest modifying Version{} uint8 to uint16 or not using linux headers with minor > 255