retis-org / retis

Tracing packets in the Linux networking stack & friends
https://retis.readthedocs.io/en/stable/
100 stars 14 forks source link

Python bindings #426

Closed amorenoz closed 1 month ago

amorenoz commented 2 months ago

Python bindings

This PR introduces python bindings and makes them available in two main ways:

In addition, a few refactorings and changes are included first. The most notable of them is adding support to read sorted events (produced by retis sort). This is also added to the python bindings.

Regarding the python bindings themselves, the main functional differences compared with the RFC are:

Python shell

The embedded shell exposes a global variable called "events" which is of type "EventReader". This class is capable of reading retis-generated files and iterate through its content.

It's not fully perfect but it's usable for quick prototyping.

Python library

The library is built using maturin and compatible with manylinux2014.

CI integration and (super-basic) unit tests (using tox and pytest) are added for the library. Also, the documentation has instructions to publish to pypi and install in a development environment.

Follow ups

amorenoz commented 2 months ago

Just updated the PR with a missing documentation file that I had forgotten to commit :-D. No changes in the code.

atenart commented 2 months ago

One thing that I find hard to work with is the event object is a dict, fields can be accessed like e['skb'] but then it's impossible to work with the SkbEvent fields directly (this was also the case in my initial version). This also prevents from having event types helpers after the event section level. The current workaround was to use the raw version (coming from JSON).

With a few additions we can actually access section fields (and below) directly as Python object; which helps with the UX to get them and allows to use per-type helpers:

e = next(events)

print(e['skb'].packet.len)

tcp = e['skb'].tcp
print(tcp.dport)
print(tcp.dport.service_name())  # We could do this for example.

I haven't added a __repr__ and show implementation for event_type so print(e['skb'].ip) and print(e['skb'].ip.show()) do not work. We should do this too. Also, there is at least one point not working as expected, such as accessing ip.version.V4.id (not sure why, haven't looked at it).

Patch:

diff --git a/retis-derive/src/lib.rs b/retis-derive/src/lib.rs
index 355dadf73ce0..5ec26a8c68c2 100644
--- a/retis-derive/src/lib.rs
+++ b/retis-derive/src/lib.rs
@@ -38,6 +38,19 @@ fn enum_is_simple(item: &Item) -> bool {
     }
 }

+fn has_named_fields(item: &Item) -> bool {
+    match &item {
+        Item::Struct(item) => match &item.fields {
+            Fields::Named(_) => true,
+            _ => false,
+        },
+        Item::Enum(item) => {
+            item.variants.iter().all(|v| matches!(v.fields, Fields::Named(_)))
+        }
+        _ => false,
+    }
+}
+
 #[proc_macro_attribute]
 pub fn event_type(
     _args: proc_macro::TokenStream,
@@ -50,6 +63,8 @@ pub fn event_type(
     // using underlying integers: See https://pyo3.rs/main/doc/pyo3/attr.pyclass.html.
     let pyclass_args = if enum_is_simple(&parsed) {
         "(eq, eq_int)"
+    } else if has_named_fields(&parsed) {
+        "(get_all)"
     } else {
         ""
     };
diff --git a/retis-events/src/kernel.rs b/retis-events/src/kernel.rs
index 806fb5bb448e..01bd5b76de3c 100644
--- a/retis-events/src/kernel.rs
+++ b/retis-events/src/kernel.rs
@@ -58,3 +58,9 @@ impl EventFmt for StackTrace {
         }
     }
 }
+
+impl pyo3::ToPyObject for StackTrace {
+    fn to_object(&self, py: pyo3::Python<'_>) -> pyo3::PyObject {
+        pyo3::IntoPy::into_py(self.0.clone(), py)
+    }
+}
diff --git a/retis-events/src/net.rs b/retis-events/src/net.rs
index 6001b94829cd..bf5ac795fb2f 100644
--- a/retis-events/src/net.rs
+++ b/retis-events/src/net.rs
@@ -53,9 +53,17 @@ pub(crate) fn protocol_str(protocol: u8) -> Option<&'static str> {
 }

 /// Represents a raw packet. Stored internally as a `Vec<u8>`.
+// We don't use #[event_type] as we're implementing serde::Serialize and
+// serde::Deserialize manually.
 #[derive(Clone, Debug)]
 pub struct RawPacket(pub Vec<u8>);

+impl pyo3::ToPyObject for RawPacket {
+    fn to_object(&self, py: pyo3::Python<'_>) -> pyo3::PyObject {
+        pyo3::IntoPy::into_py(self.0.clone(), py)
+    }
+}
+
 impl serde::Serialize for RawPacket {
     fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
     where
-- 
2.46.1
amorenoz commented 1 month ago

One thing that I find hard to work with is the event object is a dict, fields can be accessed like e['skb'] but then it's impossible to work with the SkbEvent fields directly (this was also the case in my initial version). This also prevents from having event types helpers after the event section level. The current workaround was to use the raw version (coming from JSON).

With a few additions we can actually access section fields (and below) directly as Python object; which helps with the UX to get them and allows to use per-type helpers:

e = next(events)

print(e['skb'].packet.len)

tcp = e['skb'].tcp
print(tcp.dport)
print(tcp.dport.service_name())  # We could do this for example.

I haven't added a __repr__ and show implementation for event_type so print(e['skb'].ip) and print(e['skb'].ip.show()) do not work. We should do this too. Also, there is at least one point not working as expected, such as accessing ip.version.V4.id (not sure why, haven't looked at it).

I tried something similar at the beginning and found many quirks so I left it as a follow up improvement. I can revisit now that things look a bit more stable.

Thanks

atenart commented 1 month ago

One thing that I find hard to work with is the event object is a dict, fields can be accessed like e['skb'] but then it's impossible to work with the SkbEvent fields directly (this was also the case in my initial version). This also prevents from having event types helpers after the event section level. The current workaround was to use the raw version (coming from JSON). With a few additions we can actually access section fields (and below) directly as Python object; which helps with the UX to get them and allows to use per-type helpers:

e = next(events)

print(e['skb'].packet.len)

tcp = e['skb'].tcp
print(tcp.dport)
print(tcp.dport.service_name())  # We could do this for example.

I haven't added a __repr__ and show implementation for event_type so print(e['skb'].ip) and print(e['skb'].ip.show()) do not work. We should do this too. Also, there is at least one point not working as expected, such as accessing ip.version.V4.id (not sure why, haven't looked at it).

I tried something similar at the beginning and found many quirks so I left it as a follow up improvement. I can revisit now that things look a bit more stable.

Doing this as a follow up is fine, I was just happy to see there's a way forward. Up to you then.

amorenoz commented 1 month ago

One thing that I find hard to work with is the event object is a dict, fields can be accessed like e['skb'] but then it's impossible to work with the SkbEvent fields directly (this was also the case in my initial version). This also prevents from having event types helpers after the event section level. The current workaround was to use the raw version (coming from JSON). With a few additions we can actually access section fields (and below) directly as Python object; which helps with the UX to get them and allows to use per-type helpers:

e = next(events)

print(e['skb'].packet.len)

tcp = e['skb'].tcp
print(tcp.dport)
print(tcp.dport.service_name())  # We could do this for example.

I haven't added a __repr__ and show implementation for event_type so print(e['skb'].ip) and print(e['skb'].ip.show()) do not work. We should do this too. Also, there is at least one point not working as expected, such as accessing ip.version.V4.id (not sure why, haven't looked at it).

I tried something similar at the beginning and found many quirks so I left it as a follow up improvement. I can revisit now that things look a bit more stable.

Now I remember

One thing that I find hard to work with is the event object is a dict, fields can be accessed like e['skb'] but then it's impossible to work with the SkbEvent fields directly (this was also the case in my initial version). This also prevents from having event types helpers after the event section level. The current workaround was to use the raw version (coming from JSON). With a few additions we can actually access section fields (and below) directly as Python object; which helps with the UX to get them and allows to use per-type helpers:

e = next(events)

print(e['skb'].packet.len)

tcp = e['skb'].tcp
print(tcp.dport)
print(tcp.dport.service_name())  # We could do this for example.

I haven't added a __repr__ and show implementation for event_type so print(e['skb'].ip) and print(e['skb'].ip.show()) do not work. We should do this too. Also, there is at least one point not working as expected, such as accessing ip.version.V4.id (not sure why, haven't looked at it).

I tried something similar at the beginning and found many quirks so I left it as a follow up improvement. I can revisit now that things look a bit more stable.

Doing this as a follow up is fine, I was just happy to see there's a way forward. Up to you then.

Now I remember, the problem is enums, e.g: for ovs:

In [3]: e["ovs"]                                                    
Out[3]: OvsEvent { event: Some(Upcall(UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 })) }
In [4]: e["ovs"].event                                              
Out[4]: Upcall(UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 })                                                                         

In [5]: e["ovs"].event.cmd                                          
---------------------------------------------------------------------------                                                                                                                                                                                                       
AttributeError                            Traceback (most recent call last)                                                                                                                                                                                                       
Cell In[5], line 1                                                  
----> 1 e["ovs"].event.cmd                                          

AttributeError: 'builtins.OvsEventType_Upcall' object has no attribute 'cmd'                                                                                                                                                                                                      

In [6]: e["ovs"].event._0                                           
Out[6]: UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 }                                                                                 

In [7]: e["ovs"].event._0.cmd                                       
Out[7]: 1                                     
amorenoz commented 1 month ago

I haven't added a __repr__ and show implementation for event_type so print(e['skb'].ip) and print(e['skb'].ip.show()) do not work. We should do this too. Also, there is at least one point not working as expected, such as accessing ip.version.V4.id (not sure why, haven't looked at it).

I tried something similar at the beginning and found many quirks so I left it as a follow up improvement. I can revisit now that things look a bit more stable.

Doing this as a follow up is fine, I was just happy to see there's a way forward. Up to you then.

Now I remember, the problem is enums, e.g: for ovs:

In [3]: e["ovs"]                                                    
Out[3]: OvsEvent { event: Some(Upcall(UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 })) }
In [4]: e["ovs"].event                                              
Out[4]: Upcall(UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 })                                                                         

In [5]: e["ovs"].event.cmd                                          
---------------------------------------------------------------------------                                                                                                                                                                                                       
AttributeError                            Traceback (most recent call last)                                                                                                                                                                                                       
Cell In[5], line 1                                                  
----> 1 e["ovs"].event.cmd                                          

AttributeError: 'builtins.OvsEventType_Upcall' object has no attribute 'cmd'                                                                                                                                                                                                      

In [6]: e["ovs"].event._0                                           
Out[6]: UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 }                                                                                 

In [7]: e["ovs"].event._0.cmd                                       
Out[7]: 1                                     

I have found a workaround. We can "merge" the type into the enum (same for ip.V4) at the cost of loosing helpers (which we don't have right now):

diff --git a/retis-events/src/ovs.rs b/retis-events/src/ovs.rs
index ff061f66..90667e06 100644
--- a/retis-events/src/ovs.rs
+++ b/retis-events/src/ovs.rs
@@ -31,7 +31,18 @@ pub enum OvsEventType {
     /// Upcall event. It indicates the begining of an upcall. An upcall can have multiple enqueue
     /// events.
     #[serde(rename = "upcall")]
-    Upcall(UpcallEvent),
+    Upcall {
+        /// Upcall command. Holds OVS_PACKET_CMD:
+        ///   OVS_PACKET_CMD_UNSPEC   = 0
+        ///   OVS_PACKET_CMD_MISS     = 1
+        ///   OVS_PACKET_CMD_ACTION   = 2
+        ///   OVS_PACKET_CMD_EXECUTE  = 3
+        cmd: u8,
+        /// Upcall port.
+        port: u32,
+        /// Cpu ID
+        cpu: u32,
+    },
atenart commented 1 month ago

I have found a workaround. We can "merge" the type into the enum (same for ip.V4) at the cost of loosing helpers (which we don't have right now):

diff --git a/retis-events/src/ovs.rs b/retis-events/src/ovs.rs
index ff061f66..90667e06 100644
--- a/retis-events/src/ovs.rs
+++ b/retis-events/src/ovs.rs
@@ -31,7 +31,18 @@ pub enum OvsEventType {
     /// Upcall event. It indicates the begining of an upcall. An upcall can have multiple enqueue
     /// events.
     #[serde(rename = "upcall")]
-    Upcall(UpcallEvent),
+    Upcall {
+        /// Upcall command. Holds OVS_PACKET_CMD:
+        ///   OVS_PACKET_CMD_UNSPEC   = 0
+        ///   OVS_PACKET_CMD_MISS     = 1
+        ///   OVS_PACKET_CMD_ACTION   = 2
+        ///   OVS_PACKET_CMD_EXECUTE  = 3
+        cmd: u8,
+        /// Upcall port.
+        port: u32,
+        /// Cpu ID
+        cpu: u32,
+    },

Oh, did not realized this was the same limitation for ip.V4/6. This was a known limitation when started playing with pyo3 back then.

I'm not too much in favor of the above, it prevents from having a per-variant type and thus helpers. We should investigate more; eg. maybe we can do the get_all implementation ourselves for such enums using variant names as keys.

amorenoz commented 1 month ago

I have found a workaround. We can "merge" the type into the enum (same for ip.V4) at the cost of loosing helpers (which we don't have right now):

diff --git a/retis-events/src/ovs.rs b/retis-events/src/ovs.rs
index ff061f66..90667e06 100644
--- a/retis-events/src/ovs.rs
+++ b/retis-events/src/ovs.rs
@@ -31,7 +31,18 @@ pub enum OvsEventType {
     /// Upcall event. It indicates the begining of an upcall. An upcall can have multiple enqueue
     /// events.
     #[serde(rename = "upcall")]
-    Upcall(UpcallEvent),
+    Upcall {
+        /// Upcall command. Holds OVS_PACKET_CMD:
+        ///   OVS_PACKET_CMD_UNSPEC   = 0
+        ///   OVS_PACKET_CMD_MISS     = 1
+        ///   OVS_PACKET_CMD_ACTION   = 2
+        ///   OVS_PACKET_CMD_EXECUTE  = 3
+        cmd: u8,
+        /// Upcall port.
+        port: u32,
+        /// Cpu ID
+        cpu: u32,
+    },

Oh, did not realized this was the same limitation for ip.V4/6. This was a known limitation when started playing with pyo3 back then.

I'm not too much in favor of the above, it prevents from having a per-variant type and thus helpers. We should investigate more; eg. maybe we can do the get_all implementation ourselves for such enums using variant names as keys.

I know, it's not nice. I did run some experiments but failed to implement such manual getter. I need to dig a bit digger but, given in theory V4(SkbIpv4Event) is a tuple (i.e: we could have V4(SkbIpv4Event, SomeOtherThing), each element is named _0, _1, etc. This might change in the future and there might be a better way to access tuple fields. Those attributes seem not available from the rust side...

I agree, if we need helpers, we cannot just remove the type. However, in such case we could have this:

diff --git a/retis-events/src/ovs.rs b/retis-events/src/ovs.rs
index ff061f66..0546f9e2 100644
--- a/retis-events/src/ovs.rs
+++ b/retis-events/src/ovs.rs
@@ -31,7 +31,10 @@ pub enum OvsEventType {
     /// Upcall event. It indicates the begining of an upcall. An upcall can have multiple enqueue
     /// events.
     #[serde(rename = "upcall")]
-    Upcall(UpcallEvent),
+    Upcall {
+        #[serde(flatten)]
+        upcall: UpcallEvent
+     },

This workaround: 1) keeps the field "named" 2) keeps an explicit struct that can have helpers 3) serde flattens the json representation removing the extra step

The result (with an unchanged json file):

>>> e
{'skb': {}, 'kernel': {'probe_type': 'raw_tracepoint', 'symbol': 'openvswitch:ovs_dp_upcall'}, 'ovs': {'event_type': 'upcall', 'cpu': 1, 'port': 3366920467, 'cmd': 1}, 'common': {'timestamp': 23868955262984, 'task': {'pid': 1360, 'comm': 'napi/phy0-8197', 'tgid': 1360}, 'smp_id': 2}, 'skb-tracking': {'skb': 18446616546107689472, 'timestamp': 23868955262984, 'orig_head': 18446616576100907520}}
>>> e["ovs"]
OvsEvent { event: Some(Upcall { upcall: UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 } }) }
>>> e["ovs"].event
Upcall { upcall: UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 } }
>>> e["ovs"].event.upcall
UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 }
>>> e["ovs"].event.upcall.cmd
1

Both options (removing the struct and adding an extra field) might be enough for now, WDYT?

atenart commented 1 month ago
diff --git a/retis-events/src/ovs.rs b/retis-events/src/ovs.rs
index ff061f66..0546f9e2 100644
--- a/retis-events/src/ovs.rs
+++ b/retis-events/src/ovs.rs
@@ -31,7 +31,10 @@ pub enum OvsEventType {
     /// Upcall event. It indicates the begining of an upcall. An upcall can have multiple enqueue
     /// events.
     #[serde(rename = "upcall")]
-    Upcall(UpcallEvent),
+    Upcall {
+        #[serde(flatten)]
+        upcall: UpcallEvent
+     },

This workaround:

1. keeps the field "named"

2. keeps an explicit struct that can have helpers

3. serde flattens the json representation removing the extra step

The result (with an unchanged json file):

>>> e
{'skb': {}, 'kernel': {'probe_type': 'raw_tracepoint', 'symbol': 'openvswitch:ovs_dp_upcall'}, 'ovs': {'event_type': 'upcall', 'cpu': 1, 'port': 3366920467, 'cmd': 1}, 'common': {'timestamp': 23868955262984, 'task': {'pid': 1360, 'comm': 'napi/phy0-8197', 'tgid': 1360}, 'smp_id': 2}, 'skb-tracking': {'skb': 18446616546107689472, 'timestamp': 23868955262984, 'orig_head': 18446616576100907520}}
>>> e["ovs"]
OvsEvent { event: Some(Upcall { upcall: UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 } }) }
>>> e["ovs"].event
Upcall { upcall: UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 } }
>>> e["ovs"].event.upcall
UpcallEvent { cmd: 1, port: 3366920467, cpu: 1 }
>>> e["ovs"].event.upcall.cmd
1

Both options (removing the struct and adding an extra field) might be enough for now, WDYT?

The above is quite nice as a workaround IMO. The only downside is having to handle this when working with the event in the Rust side, but that's manageable and something we can easily change later if needed.

amorenoz commented 1 month ago

The latest version has:

amorenoz commented 1 month ago

Thanks for the reworks!

Could we improve the following error msg by calling file_type()? Or even just print an already sorted series (sorting something sorted should just return the input)?

$ retis sort retis.sorted 
Error: Incorrect filetype. Use next_series().

Ack. I'll write something that can serve both purposes.

Or maybe just improving the underlying error message, it is currently developer-centered but could be user-centered. I guess this was designed with Python use in mind, but maybe such things should be done in the Python part only?

For this specific case, how about:

$ retis sort retis_sorted.data 
File already sorted
atenart commented 1 month ago

For this specific case, how about:

$ retis sort retis_sorted.data 
File already sorted

Sounds good, thanks!

amorenoz commented 1 month ago

@atenart FYI It turns out pyo3 requires python >= 3.7 which means it doesn't work with centos-8 :-(. So we do need a non-python build after all.

atenart commented 1 month ago

@atenart FYI It turns out pyo3 requires python >= 3.7 which means it doesn't work with centos-8 :-(. So we do need a non-python build after all.

And as stable distro gets older pyo3 as chances to start being incompatible at some point (not only w/ c8s). Being able to build w/o Python is a fix, but removing support when is was previously available is a bummer. Not that I have a solution, I guess that's more a packaging issue than an upstream one.

Also thanks for trying to make all of this work, rebasing and making the CI to pass looks like a long journey :(