retis-org / retis

Tracing packets in the Linux networking stack & friends
https://retis.readthedocs.io/en/stable/

module: ct: add parent information in events #401

Closed atenart closed 4 months ago

atenart commented 5 months ago

This is not fully tested; any help is welcome. As I'm not familiar with nf_conn->master, please review with that in mind.

From the commit log:

"""
Retrieve nf_conn->master information in addition to the base nf_conn. We reuse the existing logic to parse and report nf_conn->master.

This also moves non-nf_conn information out of the main section, and uses a dedicated one. It currently only contains the state.

Fixes #399.
"""

atenart commented 5 months ago

A few comments:

  • The event is now formatted as follows: ct_state <state> <base nf_conn> parent [<parent nf_conn>]. I did not differentiate the formatting (or the data retrieval) between <base nf_conn> and <parent nf_conn>, but I'm not sure we want all the information to be printed (and retrieved) for both (e.g. it might make sense to show less information for <parent nf_conn>).
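The layout described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the retis implementation; the `Conn` type, its field names, and both function names are hypothetical. It also assumes the `parent [...]` part is simply omitted when a connection has no parent, which the description above does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conn:
    """Hypothetical stand-in for the nf_conn fields shown in the event."""
    proto: str        # e.g. "tcp"
    proto_state: str  # e.g. "SYN_SENT"
    orig: str         # original-direction tuple, already rendered as text
    reply: str        # reply-direction tuple, already rendered as text
    zone: int


def format_conn(conn: Conn) -> str:
    # A single formatter shared by the base and the parent connection.
    return (f"{conn.proto} ({conn.proto_state}) "
            f"orig [{conn.orig}] reply [{conn.reply}] zone {conn.zone}")


def format_ct(state: str, base: Conn, parent: Optional[Conn] = None) -> str:
    # The state lives outside the per-connection part of the line.
    line = f"ct_state {state} {format_conn(base)}"
    if parent is not None:
        # Assumption: no parent section is printed when there is no parent.
        line += f" parent [{format_conn(parent)}]"
    return line


base = Conn("tcp", "SYN_SENT", "10.1.1.2.56869 > 10.1.1.9.32847",
            "10.1.1.1.32847 > 10.1.1.2.56869", 1)
parent = Conn("tcp", "ESTABLISHED", "10.1.1.1.45840 > 10.1.1.2.21",
              "10.1.1.2.21 > 10.1.1.9.45840", 1)
print(format_ct("RELATED", base, parent))
```

Because both connections go through the same formatter, showing less for the parent would only require a second, shorter formatter for that branch.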

@igsilya could you help test this feature and comment on the above (the formatting part)? You can build a custom binary from this branch or I can push a test image on quay.io.

igsilya commented 5 months ago

I'd say the lines are a little too long:

450780703305121 (5) [python3] 272259 [tp] openvswitch:ovs_do_execute_action #199fb8fb0fc3bffff9b7dd1be6680 (skb ffff9b7dc6e7c6e8) n 0
  if 4980 (ovs-p1) rxif 4980 10.1.1.2.56869 > 10.1.1.9.32847 ttl 64 tos 0x0 id 13161 off 0 [DF] len 60 proto TCP (6) flags [S] seq 1777698383 win 64240
  exec ct zone 1 nat
  + 450780703311419 (5) [python3] 272259 [tp] openvswitch:ovs_do_execute_action #199fb8fb0fc3bffff9b7dd1be6680 (skb ffff9b7dc6e7c6e8) n 1
    if 4980 (ovs-p1) rxif 4980 10.1.1.2.56869 > 10.1.1.9.32847 ttl 64 tos 0x0 id 13161 off 0 [DF] len 60 proto TCP (6) flags [S] seq 1777698383 win 64240
    exec recirc 0x2
    ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.56869 > 10.1.1.9.32847] reply [10.1.1.1.32847 > 10.1.1.2.56869] zone 1 parent [tcp (ESTABLISHED) orig [10.1.1.1.45840 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.45840] zone 1]
  + 450780703317081 (5) [python3] 272259 [tp] openvswitch:ovs_dp_upcall #199fb8fb0fc3bffff9b7dd1be6680 (skb ffff9b7dc6e7c6e8) n 2
    if 4980 (ovs-p1) rxif 4980 10.1.1.2.56869 > 10.1.1.9.32847 ttl 64 tos 0x0 id 13161 off 0 [DF] len 60 proto TCP (6) flags [S] seq 1777698383 win 64240
    upcall (miss) port 4078934847 cpu 5
    ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.56869 > 10.1.1.9.32847] reply [10.1.1.1.32847 > 10.1.1.2.56869] zone 1 parent [tcp (ESTABLISHED) orig [10.1.1.1.45840 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.45840] zone 1]
  + 450780703330066 (5) [python3] 272259 [kr] ovs_dp_upcall #199fb8fb0fc3bffff9b7dd1be6680 (skb ffff9b7dc6e7c6e8) n 3
    if 4980 (ovs-p1) rxif 4980 10.1.1.2.56869 > 10.1.1.9.32847 ttl 64 tos 0x0 id 13161 off 0 [DF] len 60 proto TCP (6) flags [S] seq 1777698383 win 64240
    upcall_ret (5/450780703317081) ret 0
    ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.56869 > 10.1.1.9.32847] reply [10.1.1.1.32847 > 10.1.1.2.56869] zone 1 parent [tcp (ESTABLISHED) orig [10.1.1.1.45840 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.45840] zone 1]

Putting the parent on a separate line might be easier to read:

450780703305121 (5) [python3] 272259 [tp] openvswitch:ovs_do_execute_action #199fb8fb0fc3bffff9b7dd1be6680 (skb ffff9b7dc6e7c6e8) n 0
  if 4980 (ovs-p1) rxif 4980 10.1.1.2.56869 > 10.1.1.9.32847 ttl 64 tos 0x0 id 13161 off 0 [DF] len 60 proto TCP (6) flags [S] seq 1777698383 win 64240
  exec ct zone 1 nat
  + 450780703311419 (5) [python3] 272259 [tp] openvswitch:ovs_do_execute_action #199fb8fb0fc3bffff9b7dd1be6680 (skb ffff9b7dc6e7c6e8) n 1
    if 4980 (ovs-p1) rxif 4980 10.1.1.2.56869 > 10.1.1.9.32847 ttl 64 tos 0x0 id 13161 off 0 [DF] len 60 proto TCP (6) flags [S] seq 1777698383 win 64240
    exec recirc 0x2
    ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.56869 > 10.1.1.9.32847] reply [10.1.1.1.32847 > 10.1.1.2.56869] zone 1
             parent [tcp (ESTABLISHED) orig [10.1.1.1.45840 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.45840] zone 1]
  + 450780703317081 (5) [python3] 272259 [tp] openvswitch:ovs_dp_upcall #199fb8fb0fc3bffff9b7dd1be6680 (skb ffff9b7dc6e7c6e8) n 2
    if 4980 (ovs-p1) rxif 4980 10.1.1.2.56869 > 10.1.1.9.32847 ttl 64 tos 0x0 id 13161 off 0 [DF] len 60 proto TCP (6) flags [S] seq 1777698383 win 64240
    upcall (miss) port 4078934847 cpu 5
    ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.56869 > 10.1.1.9.32847] reply [10.1.1.1.32847 > 10.1.1.2.56869] zone 1
             parent [tcp (ESTABLISHED) orig [10.1.1.1.45840 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.45840] zone 1]
  + 450780703330066 (5) [python3] 272259 [kr] ovs_dp_upcall #199fb8fb0fc3bffff9b7dd1be6680 (skb ffff9b7dc6e7c6e8) n 3
    if 4980 (ovs-p1) rxif 4980 10.1.1.2.56869 > 10.1.1.9.32847 ttl 64 tos 0x0 id 13161 off 0 [DF] len 60 proto TCP (6) flags [S] seq 1777698383 win 64240
    upcall_ret (5/450780703317081) ret 0
    ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.56869 > 10.1.1.9.32847] reply [10.1.1.1.32847 > 10.1.1.2.56869] zone 1
             parent [tcp (ESTABLISHED) orig [10.1.1.1.45840 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.45840] zone 1]
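
As a rough illustration of the two-line layout suggested above, the following Python sketch post-processes an already printed `ct_state` line. It does not change how retis emits events; the `split_parent` helper and the assumption that the parent section starts at the literal text ` parent [` are hypothetical.

```python
def split_parent(line: str, marker: str = " parent [") -> str:
    """Move the parent section of a ct_state line onto its own line."""
    head, sep, tail = line.partition(marker)
    if not sep:
        return line  # no parent section, leave the line untouched
    # Keep the original indentation and align "parent" with the text
    # that follows "ct_state ".
    indent = " " * (len(head) - len(head.lstrip()) + len("ct_state "))
    return f"{head}\n{indent}parent [{tail}"


line = ("    ct_state RELATED tcp (SYN_SENT) zone 1 "
        "parent [tcp (ESTABLISHED) zone 1]")
print(split_parent(line))
```

Lines without a parent section pass through unchanged, so the helper can be applied to every line of the output.
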

igsilya commented 5 months ago

The zone might not be necessary for a parent connection, since it has to be the same. However, if it is somehow not the same due to a kernel bug, then I'd love to know that. So, I don't know.

igsilya commented 5 months ago

Actually, with OVS we can commit the related connection to a different zone. Not sure why one would do that, but we can.
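
If the parent zone stays in the output, spotting a mismatch can be automated. The Python sketch below is only an illustration under stated assumptions: it works on the single-line text form shown earlier in this thread, and the `ct_zones` and `zones_differ` helpers are hypothetical, not part of retis.

```python
import re
from typing import Optional, Tuple

ZONE_RE = re.compile(r"\bzone (\d+)")


def ct_zones(line: str) -> Optional[Tuple[int, int]]:
    """Return (base zone, parent zone), or None if either is missing."""
    base, sep, parent = line.partition(" parent [")
    if not sep:
        return None  # the line has no parent section
    base_zone = ZONE_RE.search(base)
    parent_zone = ZONE_RE.search(parent)
    if base_zone is None or parent_zone is None:
        return None
    return int(base_zone.group(1)), int(parent_zone.group(1))


def zones_differ(line: str) -> bool:
    zones = ct_zones(line)
    return zones is not None and zones[0] != zones[1]


same = "ct_state RELATED tcp (SYN_SENT) zone 1 parent [tcp (ESTABLISHED) zone 1]"
other = "ct_state RELATED tcp (SYN_SENT) zone 2 parent [tcp (ESTABLISHED) zone 1]"
print(zones_differ(same), zones_differ(other))
```

A reported difference is not necessarily a bug, since a related connection can be committed to a different zone on purpose.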

atenart commented 5 months ago

Thanks for testing! I agree the output is quite long with the parent information. However, the current logic is to have a single line for every collector (e.g. ct). Changing that might come in a later PR.