Closed DelusionalOptimist closed 1 year ago
Can I work on this?
Hey @Amishakumari544, sure. Can you start out by trying to reproduce this on your system and analyzing what might be happening?
Yes, sure. I will first try to see what happens on my system.
I found that all of these logs have "Result": "Unknown error" and "Data": "kprobe=tcp_accept", so maybe they are related.
== Log / 2023-03-16 12:44:07.599978 ==
ClusterName: default
HostName: yyj-test-node1
NamespaceName: cert-manager
PodName: cert-manager-webhook-8bc4cf7d8-r659d
Labels: app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=webhook,app.kubernetes.io/version=v1.8.0,app=webhook,app.kubernetes.io/component=webhook
ContainerName: cert-manager
ContainerID: 77d19de9a3e6cb351699208ef231ba5f9f7ad56893e2149230ae4b1f08b90c77
ContainerImage: sealos.hub:5000/jetstack/cert-manager-webhook:v1.8.0@sha256:fd798a5a773e5a69880caa60f050d3ffd237c4c3ec3289aacaf59f938c8d7668
Type: ContainerLog
Source: /app/cmd/webhook/webhook
Resource: remoteip=10.35.128.65 port=6080 protocol=TCPv6
Operation: Network
Data: kprobe=tcp_accept domain=AF_INET6
Result: Unknown error
HostPID: 4.1131e+06
HostPPID: 4.105252e+06
PID: 1
PPID: 4.105252e+06
ParentProcessName: /usr/bin/containerd-shim-runc-v2
ProcessName: /app/cmd/webhook/webhook
UID: 1000
Maybe knowing the errno would help solve the problem? @DelusionalOptimist
Thank you for looking into this @xiao-jay. :rocket:
Yes, getting the error code in logs would definitely help us debug this. However, if you notice, even though the requests pass successfully, we're still getting a negative error code. This shouldn't happen, as passed commands have an error code of 0.
So the problem here might be related to the data captured for the tcp_accept syscall in the system monitor.
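To make that expectation concrete, here is a minimal Python sketch of how a captured return value could map to the Result field. The helper name result_string and the rule that only 0 means Passed are assumptions taken from this comment, not KubeArmor's actual code. The -4095..-1 range is the Linux kernel's convention for errno-style return values.

```python
import os

MAX_ERRNO = 4095  # the Linux kernel reserves -4095..-1 for errno-style returns


def result_string(retval):
    """Hypothetical helper: turn a captured return value into a log Result."""
    if retval == 0:
        # The thread states that passed calls carry a 0 error code.
        return "Passed"
    if -MAX_ERRNO <= retval < 0:
        # A genuine negative errno: look up the system's message for it.
        return os.strerror(-retval)
    # Anything else is outside the errno range, so no message exists for it.
    return "Unknown ({})".format(retval)


if __name__ == "__main__":
    for value in (0, -13, -129904102617408):
        print(value, "->", result_string(value))
```

Under these assumptions, a value far outside the errno range can only ever be reported as unknown, which suggests the captured number is not an error code in the first place.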
@DelusionalOptimist If I want to build the newest code and run karmor logs --logFilter=all --operation=Network to get the newest result, how should I do that?
@xiao-jay The best way to test your changes to kubearmor is:
kubectl proxy &
make run
in the KubeArmor directory. This should run kubearmor locally for you. Also, we generally prefer developing with kubearmor in the vagrant env, which takes care of setting up all the dependencies. See https://github.com/kubearmor/KubeArmor/blob/main/contribution/development_guide.md
Once you have that, you should be able to reproduce this issue.
@DelusionalOptimist kubectl get pod
I get many errors from Calico, and the error code is very large.
root@yyj-test-master1:/home/ubuntu/kubearmor/KubeArmor/KubeArmor/monitor# karmor logs --logFilter=all --operation=Network|grep Unknown -C 10
local port to be used for port forwarding kubearmor-relay-64c6fff875-vq6jt: 32825
Created a gRPC client (localhost:32825)
Checked the liveness of the gRPC server
Started to watch alerts
Started to watch logs
PodName: calico-node-7ww2h
Labels: k8s-app=calico-node
ContainerName: calico-node
ContainerID: 893549bf092886a768822fadb3391c759db757826231301bc47dd655445e4dcb
ContainerImage: sealos.hub:5000/calico/node:v3.22.1@sha256:1f8ed83e5264b4206cce7e1def11bca0b3ea7d5f4eb9b0ca0dbfc8cb968ca57e
Type: ContainerLog
Source: /usr/bin/calico-node
Resource: remoteip=127.0.0.1 port=9099 protocol=TCP
Operation: Network
Data: kprobe=tcp_accept domain=AF_INET
Result: Unknown (-129904102617408)
HostPID: 114089
HostPPID: 114074
PID: 87
PPID: 72
ParentProcessName: /usr/local/bin/runsv
ProcessName: /usr/bin/calico-node
== Log / 2023-03-21 13:11:10.249191 ==
ClusterName: default
HostName: yyj-test-master1
Type: HostLog
--
PodName: calico-typha-5d687fb4d7-c4v88
Labels: k8s-app=calico-typha
ContainerName: calico-typha
ContainerID: 36f6878dfe152b239a5f0b8608dad00b89fa8bb0915d4c1bf260b93bb44003ba
ContainerImage: sealos.hub:5000/calico/typha:v3.22.1@sha256:e36532ff56568fc324fa93f3e0a0005f899564ab72903c035f8f4d4e9378b6b9
Type: ContainerLog
Source: /code/calico-typha
Resource: remoteip=127.0.0.1 port=9098 protocol=TCP
Operation: Network
Data: kprobe=tcp_accept domain=AF_INET
Result: Unknown (-129904102630848)
HostPID: 113684
HostPPID: 113671
PID: 7
PPID: 1
ParentProcessName: /sbin/tini
ProcessName: /code/calico-typha
UID: 999
== Log / 2023-03-21 13:11:10.253316 ==
ClusterName: default
HostName: yyj-test-master1
--
PodName: calico-typha-5d687fb4d7-c4v88
Labels: k8s-app=calico-typha
ContainerName: calico-typha
ContainerID: 36f6878dfe152b239a5f0b8608dad00b89fa8bb0915d4c1bf260b93bb44003ba
ContainerImage: sealos.hub:5000/calico/typha:v3.22.1@sha256:e36532ff56568fc324fa93f3e0a0005f899564ab72903c035f8f4d4e9378b6b9
Type: ContainerLog
Source: /code/calico-typha
Resource: remoteip=127.0.0.1 port=9098 protocol=TCP
Operation: Network
Data: kprobe=tcp_accept domain=AF_INET
Result: Unknown (-129904435649792)
HostPID: 113684
HostPPID: 113671
PID: 7
PPID: 1
ParentProcessName: /sbin/tini
ProcessName: /code/calico-typha
UID: 999
== Log / 2023-03-21 13:11:10.417936 ==
ClusterName: default
HostName: yyj-test-master1
--
PodName: calico-node-7ww2h
Labels: k8s-app=calico-node
ContainerName: calico-node
ContainerID: 893549bf092886a768822fadb3391c759db757826231301bc47dd655445e4dcb
ContainerImage: sealos.hub:5000/calico/node:v3.22.1@sha256:1f8ed83e5264b4206cce7e1def11bca0b3ea7d5f4eb9b0ca0dbfc8cb968ca57e
Type: ContainerLog
Source: /usr/bin/calico-node
Resource: remoteip=127.0.0.1 port=9099 protocol=TCP
Operation: Network
Data: kprobe=tcp_accept domain=AF_INET
Result: Unknown (-129902512805632)
HostPID: 114089
HostPPID: 114074
PID: 87
PPID: 72
ParentProcessName: /usr/local/bin/runsv
ProcessName: /usr/bin/calico-node
== Log / 2023-03-21 13:11:20.257554 ==
ClusterName: default
HostName: yyj-test-master1
Type: HostLog
@xiao-jay right, which is the problem we're trying to solve.
You're getting these logs from Calico as it is making tcp_accept syscalls. The syscalls pass successfully, so these logs should have Result: Passed. However, the logs have this big error code instead.
You can also try it out by sending a curl request to a server (try nginx running in your setup). When the server accepts your request, it'll create a tcp_accept syscall, and kubearmor will capture it and show it in the logs.
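If curl or nginx is not at hand, the same accept path can be exercised with a short Python sketch that uses only the standard library. The function name trigger_accept is hypothetical, and it assumes the loopback interface is available. It opens a listener, connects to it once, and accepts that connection, which is the event the tcp_accept log entry is meant to record.

```python
import socket


def trigger_accept(host="127.0.0.1"):
    """Open a listener, connect to it once, and accept the connection.

    Returns the (ip, port) pair of the client as seen by the server.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        with socket.create_connection((host, port)):
            # The connection is already queued, so accept() returns at once.
            conn, peer = server.accept()
            conn.close()
    return peer


if __name__ == "__main__":
    print("accepted a connection from", trigger_accept())
```

Run it inside the pod or on the host being monitored while karmor logs --logFilter=all --operation=Network is watching.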
Thank you for your suggestion, I will try it now.
I tried running curl localhost:8000 inside the multiubuntu pod, but got tcp_connect with a Passed result. For these very large retval numbers I found a pattern: if you convert them to binary, they have 48 bits, and the first 16 bits are the same.
-11101100010010110110111000010110011101000000000
-11101100010010110100110101000010101110100000000
-11101100010010101001001000100011100001011000000
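A hedged way to look at the same pattern is to reinterpret the logged values as unsigned 64-bit integers, since the return value is presumably read from a 64-bit register. The Python sketch below uses the four values from the Calico logs earlier in this thread; the helper name as_u64_hex is hypothetical. Every value ends up with its top 16 bits set, which on x86-64 is the shape of a kernel virtual address rather than of an errno.

```python
MASK64 = (1 << 64) - 1

# Result values copied from the Calico logs earlier in this thread.
LOGGED = [
    -129904102617408,
    -129904102630848,
    -129904435649792,
    -129902512805632,
]


def as_u64_hex(value):
    """Return the two's-complement 64-bit view of value as 16 hex digits."""
    return "{:#018x}".format(value & MASK64)


def top16_all_set(value):
    """True when the upper 16 bits of the 64-bit view are all ones."""
    return ((value & MASK64) >> 48) == 0xFFFF


if __name__ == "__main__":
    for v in LOGGED:
        print(v, as_u64_hex(v), top16_all_set(v))
```

This does not prove what the values are, but it is consistent with the probe capturing a pointer instead of a status code.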
I am not familiar with BPF and the C code, so it's hard for me to solve the problem on my own.
I found the code for accept and saw context.retval = PT_REGS_RC(ctx);, so retval is set by PT_REGS_RC(ctx), but I could not find any information about PT_REGS_RC. Could you please give me some help with the C code and BPF? @DelusionalOptimist
The code is in system_monitor.c.
cc @daemon1024
@xiao-jay I'm not really sure about the above. In the meantime, you should try looking into https://github.com/kubearmor/KubeArmor/pull/1087. It seems to be working fine before that.
Thank you for your suggestion, I will try to find out the reason.
Hey @xiao-jay were you able to find anything?
/assign
https://github.com/kubearmor/KubeArmor/blob/4a0d0af2a5d647bf54e7069c4a2ce81e3bacdc09/KubeArmor/BPF/system_monitor.c#L1873
Here the function is instrumented as a kprobe but uses PT_REGS_RC(ctx), which is only supported in a kretprobe. I suspect this could be the reason for the uninitialized return value in the monitored events.
CC: @DelusionalOptimist
@stefin9898 Makes sense, we should be extracting RC from kretprobe only and send to perf buffer from kretprobe.
@daemon1024 I tried marking the function as a kretprobe, but I still see the retval as something uninitialized. Edit: At this stage https://github.com/kubearmor/KubeArmor/blob/4a0d0af2a5d647bf54e7069c4a2ce81e3bacdc09/KubeArmor/BPF/system_monitor.c#L1838 would it be safe to assign the return value to 0 manually, or would that be a no?
@stefin9898 this error only appears in tcp_accept, but the code at your URL is used for tcp_connect. Maybe the right code is https://github.com/kubearmor/KubeArmor/blob/4a0d0af2a5d647bf54e7069c4a2ce81e3bacdc09/KubeArmor/BPF/system_monitor.c#L1946
because it uses _TCP_ACCEPT:
https://github.com/kubearmor/KubeArmor/blob/4a0d0af2a5d647bf54e7069c4a2ce81e3bacdc09/KubeArmor/BPF/system_monitor.c#L1953
Regarding #1087: I know why the error appears after this PR.
I know nothing about eBPF, so please ask @stefin9898 to help solve this problem. I hope my previous comments can help you.
@xiao-jay Thanks a lot for pointing in the right direction. I checked the implementation.
It doesn't return the error code directly. We might need to extract the error code from the function in a different way. That's why we are seeing ambiguous values as return codes. @xiao-jay This helps a lot, thank you again.
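As an illustration of what a different way of reading the value could look like, here is a small Python sketch. It assumes, and this is only an assumption, that the probed function returns a pointer to the new socket on success and NULL on failure, with the actual error delivered elsewhere. The name classify_pointer_rc is hypothetical. The ERR_PTR-style branch follows the kernel convention that the last 4095 values of the address space encode errno.

```python
MASK64 = (1 << 64) - 1
MAX_ERRNO = 4095


def classify_pointer_rc(rc):
    """Hypothetical: interpret the return value of a pointer-returning function."""
    raw = rc & MASK64  # view the value as an unsigned 64-bit register
    if raw == 0:
        # NULL pointer: the call failed; the errno is not in the return value.
        return "Failed"
    if raw >= (1 << 64) - MAX_ERRNO:
        # ERR_PTR-style encoding: the top 4095 values carry a negated errno.
        return "Failed (errno {})".format((1 << 64) - raw)
    # Any other non-NULL value is treated as a valid pointer.
    return "Passed"


if __name__ == "__main__":
    for value in (0, -13, -129904102617408):
        print(value, "->", classify_pointer_rc(value))
```

With this reading, the large values seen in the logs would be reported as passed, which matches the expected behavior described in the bug report.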
Bug Report
General Information
(uname -a) - 5.15.0-46-generic
(kubectl version, ...) - 1.23.17
To Reproduce
Network operation logs using karmor logs --logFilter=all --operation=Network.
curl localhost:8000, which would create a TCP_ACCEPT syscall by the running python server
Result: Unknown error in logs even though the request passes.
Expected behavior
The log should have Result: Passed instead.
Logs