influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Adding support of SFlow drop packets #15375

Open akarneliuk opened 1 month ago

akarneliuk commented 1 month ago

Use Case

Hey team,

I'm evaluating the use of sFlow to collect data from internet devices, where BGP information is crucial. sFlow v5 carries this information per its specification: https://sflow.org/SFLOW-STRUCTS5.txt

/* Extended Gateway Data */
/* opaque = flow_data; enterprise = 0; format = 1003 */

struct extended_gateway {
   next_hop nexthop;           /* Address of the border router that should
                                  be used for the destination network */
   unsigned int as;            /* Autonomous system number of router */
   unsigned int src_as;        /* Autonomous system number of source */
   unsigned int src_peer_as;   /* Autonomous system number of source peer */
   as_path_type dst_as_path<>; /* Autonomous system path to the destination */
   unsigned int communities<>; /* Communities associated with this route */
   unsigned int localpref;     /* LocalPref associated with this route */
}

However, this information is missing in the inputs.sflow plugin.
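To make the request concrete, here is a minimal sketch of how the struct above could be decoded from its XDR encoding. It is not the Telegraf or goflow2 implementation: the type names (`ExtendedGateway`, `ASPathSegment`), the `reader` helper, and `DecodeExtendedGateway` are hypothetical. It assumes the address and as_path_type encodings from the same specification (address type 1 = IPv4, 2 = IPv6; segment type 1 = AS_SET, 2 = AS_SEQUENCE) and that the input is the body of one format 1003 flow record.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// ASPathSegment is one dst_as_path element (Type 1 = AS_SET, 2 = AS_SEQUENCE).
type ASPathSegment struct {
	Type uint32
	ASNs []uint32
}

// ExtendedGateway is a hypothetical Go mirror of struct extended_gateway.
type ExtendedGateway struct {
	NextHop                         net.IP
	AS, SrcAS, SrcPeerAS, LocalPref uint32
	DstASPath                       []ASPathSegment
	Communities                     []uint32
}

var errMalformed = errors.New("extended_gateway: truncated or malformed record")

// reader consumes big-endian XDR values and remembers the first failure.
type reader struct {
	buf []byte
	err error
}

func (r *reader) bytes(n int) []byte {
	if r.err != nil || len(r.buf) < n {
		r.err = errMalformed
		return nil
	}
	b := r.buf[:n]
	r.buf = r.buf[n:]
	return b
}

func (r *reader) u32() uint32 {
	if b := r.bytes(4); b != nil {
		return binary.BigEndian.Uint32(b)
	}
	return 0
}

// u32s reads an XDR variable-length array: a count, then that many words.
func (r *reader) u32s() []uint32 {
	n := r.u32()
	if r.err != nil || uint64(n)*4 > uint64(len(r.buf)) {
		r.err = errMalformed
		return nil
	}
	out := make([]uint32, n)
	for i := range out {
		out[i] = r.u32()
	}
	return out
}

// DecodeExtendedGateway parses the body of one format-1003 flow record.
func DecodeExtendedGateway(body []byte) (*ExtendedGateway, error) {
	r := &reader{buf: body}
	g := &ExtendedGateway{}
	switch r.u32() { // address type: 0 = unknown, 1 = IPv4, 2 = IPv6
	case 0:
	case 1:
		g.NextHop = net.IP(r.bytes(4))
	case 2:
		g.NextHop = net.IP(r.bytes(16))
	default:
		r.err = errMalformed
	}
	// Go evaluates these calls left to right, matching the field order.
	g.AS, g.SrcAS, g.SrcPeerAS = r.u32(), r.u32(), r.u32()
	for n := r.u32(); n > 0 && r.err == nil; n-- {
		seg := ASPathSegment{Type: r.u32()}
		seg.ASNs = r.u32s()
		g.DstASPath = append(g.DstASPath, seg)
	}
	g.Communities = r.u32s()
	g.LocalPref = r.u32()
	if r.err != nil {
		return nil, r.err
	}
	return g, nil
}

func main() {
	// Hand-built example body (not captured traffic): next hop 192.0.2.1,
	// AS 65000, empty AS path, no communities, localpref 100.
	body := []byte{0, 0, 0, 1, 192, 0, 2, 1, 0, 0, 0xFD, 0xE8, 0, 0, 0xFD, 0xE9,
		0, 0, 0xFD, 0xEA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100}
	g, err := DecodeExtendedGateway(body)
	fmt.Printf("%+v %v\n", g, err)
}
```

The fields recovered this way (next hop, AS numbers, AS path, communities, local preference) are the ones this issue asks to see as tags or fields.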

Expected behavior

Telegraf parses struct extended_gateway and exposes this information as tags, alongside the already parsed structs extended_router and extended_switch.

SFlow plugin configuration

Actual behavior

This struct is currently not parsed and therefore information isn't available.

Additional info

No response

powersj commented 1 month ago

Hi,

Can you please look at using the inputs.netflow plugin instead? It looks like it already supports the extended gateway.

Thanks

akarneliuk commented 1 month ago

Hey @powersj ,

Thanks for the prompt response. I did look into that plug-in, but sadly I was getting errors for every packet I received; hence, I switched to the sflow plug-in, which worked quite nicely apart from missing this struct.

What is the long-term goal at InfluxData? Are you supporting both plug-ins, or are you going to sunset one in favour of the other?

Thanks, Anton

powersj commented 1 month ago

sflow is deprecated in favor of netflow. If you are getting errors please do let us know and we can take a look.

akarneliuk commented 1 month ago

Hey @powersj ,

Understood. So, here is my issue with the netflow plug-in.

Telegraf version: 1.30.2

Configuration:

[[inputs.netflow]]
  service_address = "udp://:6343"
  protocol = "sflow v5"

When I send sFlow data to Telegraf, I see the following errors in the Telegraf log:

2024-05-18T09:08:13Z E! [inputs.netflow] Error in plugin: sFlow sample [[format:5 seq: 71837279] unknown format 5]; raw data ...

Thanks, Anton

powersj commented 1 month ago

format:5

Looking at the upstream goflow2 code, format 5 is not defined:

FORMAT_RAW_PKT = 1
FORMAT_ETH = 2
FORMAT_IPV4 = 3
FORMAT_IPV6 = 4

From sflow.go. Is this some sort of extended data? @srebhan thoughts?
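For context on how such format numbers are read: in the sFlow v5 structures document, every sample and every flow record starts with a data_format tag whose upper 20 bits are the enterprise number and whose lower 12 bits are the format. Samples and flow records are numbered separately, so the flow record formats 1 to 4 quoted above do not by themselves rule out a sample with format 5. The sketch below only illustrates the tag split. The function names are hypothetical, and the label for format 5 is an assumption taken from https://sflow.org/sflow_drops.txt, not something confirmed by the log line.

```go
package main

import "fmt"

// splitDataFormat separates an sFlow v5 data_format tag into its
// enterprise (upper 20 bits) and format (lower 12 bits) parts.
func splitDataFormat(tag uint32) (enterprise, format uint32) {
	return tag >> 12, tag & 0xFFF
}

// sampleName labels the standard (enterprise 0) sample types.
// Formats 1 to 4 come from the sFlow v5 structures document; format 5 is
// an assumption based on the drop notification extension.
func sampleName(tag uint32) string {
	enterprise, format := splitDataFormat(tag)
	if enterprise != 0 {
		return fmt.Sprintf("vendor sample %d:%d", enterprise, format)
	}
	switch format {
	case 1:
		return "flow_sample"
	case 2:
		return "counters_sample"
	case 3:
		return "flow_sample_expanded"
	case 4:
		return "counters_sample_expanded"
	case 5:
		return "discarded_packet"
	}
	return fmt.Sprintf("unknown sample format %d", format)
}

func main() {
	for _, tag := range []uint32{1, 5, 9, 4413<<12 | 1} {
		fmt.Printf("tag %d: %s\n", tag, sampleName(tag))
	}
}
```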

srebhan commented 1 month ago

@powersj will look into it...

@akarneliuk could you please post the data after the raw data part of the log message so I can reproduce the issue locally!?

akarneliuk commented 1 month ago

Hey @srebhan ,

Thanks for looking into that one. I'm looking into how I can anonymise the packet for compliance reasons. What I was able to detect by digging into the pcap with sflowtool is that the packets causing problems are drop notifications: https://sflow.org/sflow_drops.txt

I believe this raises an interesting difference in behaviour between the sflow and netflow plugins in Telegraf: the sflow plugin ignores things it cannot decode and decodes the rest, while the netflow plugin throws an error. Perhaps the latter is preferable, though it would be nice to have an option (a flag or similar) to ignore samples it cannot decode.
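The two behaviours described above can be sketched as follows. This is an illustration only, not how either plugin is implemented: `walkSamples`, the `skipUnknown` flag, and the `known` table are hypothetical. It assumes the input starts at the first sample record of an sFlow v5 datagram, where each record is a 4-byte data_format, a 4-byte length, and then that many bytes of body. Because every record carries its own length, a decoder can step over a sample it does not understand and carry on with the next one.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Sample is one undecoded sample record.
type Sample struct {
	Format uint32
	Body   []byte
}

// known lists the sample formats this sketch pretends to understand.
var known = map[uint32]bool{1: true, 2: true, 3: true, 4: true}

// walkSamples splits buf into sample records. With skipUnknown set, unknown
// formats are recorded and skipped; otherwise the first one is an error.
func walkSamples(buf []byte, skipUnknown bool) (decoded []Sample, skipped []uint32, err error) {
	for len(buf) > 0 {
		if len(buf) < 8 {
			return decoded, skipped, errors.New("truncated sample header")
		}
		format := binary.BigEndian.Uint32(buf[0:4])
		length := binary.BigEndian.Uint32(buf[4:8])
		padded := (uint64(length) + 3) &^ 3 // XDR pads opaque data to 4 bytes
		if padded > uint64(len(buf)-8) {
			return decoded, skipped, errors.New("truncated sample body")
		}
		body := buf[8 : 8+length]
		buf = buf[8+padded:]
		switch {
		case known[format]:
			decoded = append(decoded, Sample{Format: format, Body: body})
		case skipUnknown:
			skipped = append(skipped, format)
		default:
			return decoded, skipped, fmt.Errorf("unknown format %d", format)
		}
	}
	return decoded, skipped, nil
}

func main() {
	// Three hand-built sample records with formats 1, 5 and 2.
	buf := []byte{
		0, 0, 0, 1, 0, 0, 0, 4, 0xAA, 0xAA, 0xAA, 0xAA,
		0, 0, 0, 5, 0, 0, 0, 4, 0xBB, 0xBB, 0xBB, 0xBB,
		0, 0, 0, 2, 0, 0, 0, 0,
	}
	for _, lenient := range []bool{false, true} {
		decoded, skipped, err := walkSamples(buf, lenient)
		fmt.Println(lenient, len(decoded), skipped, err)
	}
}
```

Returning the skipped formats alongside the decoded samples lets a caller log what was ignored, so lenient mode does not hide the problem entirely.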

Going back to the original issue, do you think you could look into implementing drop notifications? Also, could you DM me your email address so I can share an anonymised pcap?

Best, Anton

srebhan commented 1 month ago

@akarneliuk it would be nice if you could send me an anonymized dump of such a packet in this issue so I can create a unit test from it. Alternatively, you can drop the non-redacted dump into a personal message on Slack (@ Sven Rebhan)...

srebhan commented 1 month ago

@akarneliuk please test the binary in PR #15396, available as soon as CI has finished the tests, and let me know if this works for you!

akarneliuk commented 1 month ago

Hey @srebhan, I have tested it and it worked nicely for me! Thank you so much for doing that so quickly. I have another request for sflow as well, but I will open a separate issue for that.

srebhan commented 1 month ago

@akarneliuk please be aware that merging my required changes upstream (into the goflow2 library) might take some time, so do not expect this feature to land in v1.31.0 yet!

srebhan commented 5 days ago

@akarneliuk I rebased the PR against the latest master to include additional fields for extended-gateway etc...