mariomac closed this 7 months ago.
Sounds great, just mark it when it's ready for review :)!
Attention: 49 lines in your changes are missing coverage. Please review.
Comparison is base (2c2ea40) 80.84% compared to head (656d857) 80.56%.
@grcevski Logging the received traces on the client side, I see that neither the Host PID nor the User PID is recognized, which is why the traces are filtered out.
It's strange because the reported PIDs are always 13 (on my local Docker) and there isn't any process with that ID. Maybe we aren't setting them correctly, or we are reading from the wrong memory location on the BPF side?
> @grcevski Logging the received traces on the client side, I see that neither the Host PID nor the User PID is recognized, which is why the traces are filtered out.
That's interesting! The eBPF side now lets it through, which means we matched it correctly on the namespaced PID in eBPF and produced the event, but user space is filtering it out. The PIDs we collect at event generation are likely wrong.
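One way to check which PIDs an event should carry is to compare them with what the kernel reports in `/proc/<pid>/status`. The sketch below assumes a Linux 4.1+ kernel that exposes the `NSpid:` line and parses it: the first entry is the PID as seen from the PID namespace of the procfs mount (the host namespace when reading the host's `/proc`), and the last entry is the PID inside the process's own namespace. The helpers `nsPIDs` and `hostAndNSPID` are hypothetical, not part of Beyla, and the sample status text in the usage is made up.

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// nsPIDs extracts the PID chain from the "NSpid:" line of a
// /proc/<pid>/status file. Entries go from the namespace of the procfs
// mount (outermost) to the innermost namespace the process lives in.
// It returns nil if the line is missing (e.g. kernels older than 4.1)
// or malformed.
func nsPIDs(status string) []uint32 {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "NSpid:") {
			continue
		}
		var pids []uint32
		for _, field := range strings.Fields(strings.TrimPrefix(line, "NSpid:")) {
			n, err := strconv.ParseUint(field, 10, 32)
			if err != nil {
				return nil
			}
			pids = append(pids, uint32(n))
		}
		return pids
	}
	return nil
}

// hostAndNSPID returns the outermost (host) PID and the innermost
// namespaced PID from the NSpid chain.
func hostAndNSPID(status string) (host, ns uint32, ok bool) {
	pids := nsPIDs(status)
	if len(pids) == 0 {
		return 0, 0, false
	}
	return pids[0], pids[len(pids)-1], true
}
```

For a containerized process the two values differ, so an event whose "host" PID equals the in-container PID could indicate that the namespaced value is being read in both places.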
I also noticed one more issue with the process discovery loop. We record the active port numbers, but for client programs these grow into large arrays because we seem to be capturing the ephemeral ports too. See how the list grows in the output:
msg="new process watching events" component=discover.Watcher interval=500ms events="[{Type:0 Obj:{pid:1 openPorts:[33348 33400 33332 33318 33388 33312 33372 33356 33324 40604 40608 33338]}}]"
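A possible way to keep ephemeral client ports out of `openPorts` is to record only sockets in LISTEN state. The sketch below parses the text format of `/proc/net/tcp`, where the fourth column is the socket state (`0A` is LISTEN, `01` is ESTABLISHED) and the local port is the hex number after the colon in `local_address`. `listeningPorts` is a hypothetical helper, not Beyla's discovery code, and it omits mapping each socket back to its process (via socket inodes under `/proc/<pid>/fd`).

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// tcpListen is the TCP_LISTEN state as printed in /proc/net/tcp.
const tcpListen = "0A"

// listeningPorts parses the contents of /proc/net/tcp (or tcp6) and returns
// the distinct local ports of sockets in LISTEN state. Sockets in other
// states, such as the ESTABLISHED ephemeral side of outgoing client
// connections, are skipped.
func listeningPorts(procNetTCP string) []uint16 {
	var ports []uint16
	seen := map[uint16]bool{}
	sc := bufio.NewScanner(strings.NewReader(procNetTCP))
	sc.Scan() // skip the header line
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 || f[3] != tcpListen {
			continue
		}
		// local_address has the form HEXIP:HEXPORT
		i := strings.LastIndexByte(f[1], ':')
		if i < 0 {
			continue
		}
		p, err := strconv.ParseUint(f[1][i+1:], 16, 16)
		if err != nil || seen[uint16(p)] {
			continue
		}
		seen[uint16(p)] = true
		ports = append(ports, uint16(p))
	}
	return ports
}
```

In a sample table, an entry with local port `8244` in state `01` corresponds to port 33348, one of the client ports in the log above, and would be ignored.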
OK, I reproduced it with some extra debugging information. It seems to me that the node client spawns new processes to handle the outgoing request:
msg="Found event" component=httpfltr.Tracer "Host PID"=1265 "NS Pid"=1265 "NS ID"=4026531836
msg="Found event" component=httpfltr.Tracer "Host PID"=1276 "NS Pid"=1276 "NS ID"=4026531836
...
It seems our BPF code, which matches based on the parent PID, is what makes this work.
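The parent-PID matching mentioned above can be mirrored in user space by walking up the process tree until a selected PID is found. In this sketch the `parentOf` map stands in for the `PPid` values that a real implementation would read from `/proc/<pid>/status`; the function name `tracked` and the depth limit of 64 are assumptions for illustration.

```go
package main

// maxAncestorDepth bounds the walk so a malformed parent map cannot loop
// forever (assumed value).
const maxAncestorDepth = 64

// tracked reports whether pid, or any of its ancestors, is in the selected
// set. parentOf maps a PID to its parent PID (hypothetical input).
func tracked(pid uint32, selected map[uint32]bool, parentOf map[uint32]uint32) bool {
	for depth := 0; depth < maxAncestorDepth; depth++ {
		if selected[pid] {
			return true
		}
		ppid, ok := parentOf[pid]
		if !ok || ppid == 0 || ppid == pid {
			return false // reached the root or an unknown process
		}
		pid = ppid
	}
	return false
}
```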
Ah, there's more to it: the BPF side correctly sees it as a single process, as it should, always with that same host PID. I think we are reading the host PID wrong in the task_pid function.
node-3418467 [008] d..31 248011.792338: bpf_trace_printk: Sending client buffer GET / HTTP/1.1
@grcevski
> I also noticed one more issue with the process discovery loop. We record the active port numbers, but for client programs these grow into large arrays because we seem to be capturing the ephemeral ports too. See how the list grows in the output:
Yeah, this will be handled in the next PR, where we handle removing processes after they stop existing. Closed ports will be removed then, too.
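Detecting processes that stopped existing, or ports that were closed, can be done by diffing two consecutive snapshots of the discovery loop. This is a minimal sketch, assuming each snapshot is kept as a set of `uint32` keys (PIDs or ports); `removed` is a hypothetical helper, not the implementation planned for the next PR.

```go
package main

import "sort"

// removed returns, in ascending order, the keys present in prev but absent
// from curr. Applied to PID sets from two loop iterations it yields
// processes that are gone; applied to a process's port set it yields ports
// that were closed.
func removed(prev, curr map[uint32]struct{}) []uint32 {
	var out []uint32
	for k := range prev {
		if _, ok := curr[k]; !ok {
			out = append(out, k)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}
```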
Previously, once Beyla instrumented an executable file, all instances of that executable were instrumented, even if the user selected only one specific process (e.g. by port).
This can be an important issue if you run multiple services in, e.g., Python or Node but only want to instrument one of them, not all the services run by the `python` or `node` executable.
This PR makes Beyla keep track of the PIDs of the processes that match the discovery selection criteria, and filters out traces from processes that are not in that group.
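The PID-based filtering described above can be sketched as a concurrency-safe set of selected PIDs that spans are checked against. `Span`, `PIDFilter` and its methods are hypothetical names; the sketch assumes each span carries the host PID captured when the event was generated, and drops any span whose PID was not selected by discovery.

```go
package main

import "sync"

// Span is a simplified, hypothetical trace event. HostPID is the PID
// (as seen from the host PID namespace) captured at event generation.
type Span struct {
	HostPID uint32
	Route   string
}

// PIDFilter holds the PIDs of processes that matched the discovery
// selection criteria. It is safe for concurrent use.
type PIDFilter struct {
	mu   sync.RWMutex
	pids map[uint32]struct{}
}

// NewPIDFilter returns an empty filter that rejects every span.
func NewPIDFilter() *PIDFilter {
	return &PIDFilter{pids: map[uint32]struct{}{}}
}

// Allow adds a PID selected by discovery.
func (f *PIDFilter) Allow(pid uint32) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.pids[pid] = struct{}{}
}

// Block removes a PID, e.g. when its process exits.
func (f *PIDFilter) Block(pid uint32) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.pids, pid)
}

// Filter returns a new slice with only the spans whose HostPID is selected,
// so spans from other instances of the same executable are dropped.
func (f *PIDFilter) Filter(spans []Span) []Span {
	f.mu.RLock()
	defer f.mu.RUnlock()
	out := make([]Span, 0, len(spans))
	for _, s := range spans {
		if _, ok := f.pids[s.HostPID]; ok {
			out = append(out, s)
		}
	}
	return out
}
```

Keeping the filter as a set makes each lookup O(1), and the read/write lock lets the discovery loop update PIDs while trace pipelines keep filtering.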