pixie-io / pixie

Instant Kubernetes-Native Application Observability
https://px.dev
Apache License 2.0

Associate AMQP Frames to a single transaction #647

Open nserrino opened 1 year ago

nserrino commented 1 year ago

Is your feature request related to a problem? Please describe.
AMQP transactions are broken into multiple frames, each of which is traced as a record in Pixie's amqp_events table. I would like a field on these events that lets me look at all of the frames for a particular transaction. In other words, it would associate the Content Header, Content Body, and other frames of a single publish/consume transaction.

Describe the solution you'd like
A field with an ID on the event that is consumable by PxL.

Describe alternatives you've considered
I don't think we can really do a time-based association, because frames from concurrent requests can interleave.

Additional context
This will allow us to visualize the bytes per second by task, since bytes and task name show up in different frames.

philomory commented 1 year ago

This would be huge for us; there's a lot of useful information in some of those content-header frames, but without knowing which method frame each one belongs to, it's really hard to make use of any of it.

I can see a couple of different ways to accomplish this, based on the way the AMQP protocol works. Note, though, that "transaction" is maybe not the best word choice, given that AMQP supports actual "transaction" semantics similar to SQL transactions. What I think we're looking for here is the ability to associate the multiple frames that make up a single AMQP command (or method), such as basic.deliver, queue.declare, or tx.commit.

The most robust solution would be to have the protocol-handling code for AMQP construct a second table, amqp_commands, that groups the information for an entire command together. A command would consist of a "Method Frame" plus all subsequent frames sent in the same direction on the same TCP connection with the same channel id, up to (but not including) the next "Method Frame". Usually this means a "Method Frame", a "Content Header" frame, and then zero or more "Content Body" frames.
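To make the grouping rule above concrete, here is a minimal Python sketch. It is not Pixie's actual protocol handler; the `Frame` class, its field names (`conn_id`, `direction`, `channel`, `frame_type`, `size`), and `group_commands` are all hypothetical names chosen for illustration. It also assumes frames arrive in capture order.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Frame:
    # All field names here are hypothetical, not columns of amqp_events.
    conn_id: int      # identifies the TCP connection
    direction: str    # e.g. "send" or "recv"
    channel: int      # AMQP channel id
    frame_type: str   # "method", "header", or "body"
    size: int = 0     # payload size in bytes


def group_commands(frames: List[Frame]) -> List[List[Frame]]:
    """Group frames into commands: a method frame plus the frames that
    follow it on the same (connection, direction, channel)."""
    open_cmds: Dict[Tuple[int, str, int], List[Frame]] = {}
    commands: List[List[Frame]] = []
    for f in frames:
        key = (f.conn_id, f.direction, f.channel)
        # A method frame always starts a new command. A non-method frame
        # with no open command (capture began mid-command) also starts one.
        if f.frame_type == "method" or key not in open_cmds:
            cmd = [f]
            open_cmds[key] = cmd
            commands.append(cmd)
        else:
            open_cmds[key].append(f)
    return commands
```

Because the open command is tracked per (connection, direction, channel), frames from concurrent commands on different channels can interleave without being mixed up, which is the case a time-based association cannot handle.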

A somewhat lighter (but still maybe workable) solution would be to have the protocol-handling code track a "sequence number" for each channel that increments with each "Method Frame" sent over the channel. You could then group together the frames of an individual command with df.groupby(['upid','channel','trace_role','sequence_number']).agg(...). Unfortunately, without additional aggregation functions (or the ability to use groupby without an aggregation function), this method would have somewhat limited utility.
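As a rough illustration of the sequence-number idea, the following plain-Python sketch tags each event with a per-channel counter and then sums bytes per command. The `frame_type` and `size` keys and both function names are assumptions made for this example; only `upid`, `channel`, `trace_role`, and `sequence_number` come from the groupby expression above.

```python
from collections import defaultdict


def add_sequence_numbers(events):
    """Return copies of the events with a 'sequence_number' that increments
    on every method frame seen for a given (upid, channel, trace_role)."""
    counters = defaultdict(int)
    tagged = []
    for ev in events:
        key = (ev["upid"], ev["channel"], ev["trace_role"])
        if ev["frame_type"] == "method":  # hypothetical column name
            counters[key] += 1
        # Frames seen before any method frame keep sequence number 0.
        tagged.append({**ev, "sequence_number": counters[key]})
    return tagged


def bytes_per_command(tagged):
    """Emulate groupby(...).agg(sum of size) over the tagged events."""
    totals = defaultdict(int)
    for ev in tagged:
        key = (ev["upid"], ev["channel"], ev["trace_role"],
               ev["sequence_number"])
        totals[key] += ev["size"]  # hypothetical column name
    return dict(totals)
```

This shows the limitation mentioned above: a sum of bytes per command is easy, but pulling a value such as the task name out of one specific frame in the group needs an aggregate that can pick a single row.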

For users to implement this functionality entirely within their PxL scripts (without even "sequence numbering" from the protocol-handler), PxL would need several new functions, most notably the ability to use groupby without any subsequent aggregate, and the ability to group data frames by their position in a sequence, similar to the behavior of Ruby's Enumerable#slice_before method, or Python's split_before from more-itertools.
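For readers unfamiliar with the slice_before / split_before behavior, here is a small pure-Python sketch of it. This is a hand-written stand-in rather than the more-itertools implementation, and the `starts_group` parameter name is made up for the example.

```python
def split_before(items, starts_group):
    """Yield lists of items, starting a new list just before each item
    for which starts_group(item) is true."""
    group = []
    for item in items:
        if starts_group(item) and group:
            yield group
            group = []
        group.append(item)
    if group:
        yield group


# Frame types for one channel, in the order they were sent.
frame_types = ["method", "header", "body", "body", "method", "header"]
commands = list(split_before(frame_types, lambda t: t == "method"))
# commands == [["method", "header", "body", "body"], ["method", "header"]]
```

Note that this only works on a sequence that has already been narrowed to a single connection, direction, and channel, which is why the grouping step would have to come first.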

Ultimately, while those sorts of additions to PxL might be worthwhile, on the whole they feel complicated enough that just having an amqp_commands table in addition to (or even instead of, honestly) the amqp_events table seems like a better solution.