influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

`inputs.netflow`: Source MAC address and Destination MAC address are not being included in metrics when using sFlow v5 decoder #15918

Closed joseluisgonzalezca closed 1 month ago

joseluisgonzalezca commented 1 month ago

Relevant telegraf.conf

[[inputs.netflow]]
  service_address = "udp://:2055"
  protocol = "sflow v5"

  [inputs.netflow.tags]
    sonda = "HOU1141546"

[[outputs.file]]
  files = ["stdout", "/var/tmp/metrics.out"]
  data_format = "json"

Logs from Telegraf

Here I include several metrics in which all fields are present except for `in_src_mac` and `out_dst_mac`:

{"fields":{"agent_ip":"192.168.227.2","agent_subid":0,"datalink_frame_type":"IPv4","direction":"ingress","dst":"192.168.100.223","fragment_flags":"......D.","fragment_offset":0,"in_snmp":33,"in_total_packets":3633943287,"ip_total_len":52,"ip_version":4,"ipv4_id":56762,"ipv4_inet_header_len":5,"ipv4_total_len":52,"l2_bytes":70,"l2_protocol":"ETHERNET-ISO8023","out_snmp":27,"protocol":"tcp","sampling_drops":21513257,"sampling_interval":200,"seq_number":18576472,"src":"192.168.100.221","src_tos":0,"sys_uptime":2042522488,"tcp_ack_number":3443859435,"tcp_flags":"...A....","tcp_seq_number":1885360104,"tcp_urgent_ptr":0,"tcp_window_size":12299,"ttl":64,"vlan_dst":100,"vlan_dst_priority":0,"vlan_src":100,"vlan_src_priority":0},"name":"netflow","tags":{"host":"40f2df22e0c0","sonda":"HOU1141546","source":"::1","version":"sFlowV5"},"timestamp":1726823867}
{"fields":{"agent_ip":"192.168.227.2","agent_subid":0,"datalink_frame_type":"IPv4","direction":"ingress","dst":"192.168.110.88","fragment_flags":"......D.","fragment_offset":0,"in_snmp":99,"in_total_packets":2941125196,"ip_total_len":52,"ip_version":4,"ipv4_id":9276,"ipv4_inet_header_len":5,"ipv4_total_len":52,"l2_bytes":74,"l2_protocol":"ETHERNET-ISO8023","out_snmp":97,"protocol":"tcp","sampling_drops":3271,"sampling_interval":200,"seq_number":5469211,"src":"172.31.32.233","src_tos":0,"sys_uptime":2042522488,"tcp_ack_number":3797455467,"tcp_flags":".......F","tcp_seq_number":996475973,"tcp_urgent_ptr":0,"tcp_window_size":506,"ttl":57,"vlan_dst":20,"vlan_dst_priority":0,"vlan_src":20,"vlan_src_priority":0},"name":"netflow","tags":{"host":"40f2df22e0c0","sonda":"HOU1141546","source":"::1","version":"sFlowV5"},"timestamp":1726823867}
{"fields":{"agent_ip":"192.168.227.2","agent_subid":0,"datalink_frame_type":"IPv4","direction":"egress","dst":"192.168.100.104","fragment_flags":"......D.","fragment_offset":0,"in_snmp":97,"in_total_packets":2135660404,"ip_total_len":33,"ip_version":4,"ipv4_id":0,"ipv4_inet_header_len":5,"ipv4_total_len":53,"l2_bytes":75,"l2_protocol":"ETHERNET-ISO8023","out_snmp":97,"protocol":"udp","sampling_drops":9287278,"sampling_interval":200,"seq_number":19009839,"src":"142.250.200.99","src_tos":0,"sys_uptime":2042522488,"ttl":50,"vlan_dst":100,"vlan_dst_priority":0,"vlan_src":100,"vlan_src_priority":4294967295},"name":"netflow","tags":{"host":"40f2df22e0c0","sonda":"HOU1141546","source":"::1","version":"sFlowV5"},"timestamp":1726823867}
{"fields":{"agent_ip":"192.168.227.2","agent_subid":0,"datalink_frame_type":"IPv4","direction":"ingress","dst":"192.168.100.221","fragment_flags":"......D.","fragment_offset":0,"in_snmp":27,"in_total_packets":3633958918,"ip_total_len":1500,"ip_version":4,"ipv4_id":34151,"ipv4_inet_header_len":5,"ipv4_total_len":1500,"l2_bytes":1518,"l2_protocol":"ETHERNET-ISO8023","out_snmp":33,"protocol":"tcp","sampling_drops":21405750,"sampling_interval":200,"seq_number":18721933,"src":"192.168.100.223","src_tos":0,"sys_uptime":2042522488,"tcp_ack_number":1885360632,"tcp_flags":"...A....","tcp_seq_number":3443873179,"tcp_urgent_ptr":0,"tcp_window_size":514,"ttl":64,"vlan_dst":100,"vlan_dst_priority":0,"vlan_src":100,"vlan_src_priority":0},"name":"netflow","tags":{"host":"40f2df22e0c0","sonda":"HOU1141546","source":"::1","version":"sFlowV5"},"timestamp":1726823867}
{"fields":{"agent_ip":"192.168.227.2","agent_subid":0,"datalink_frame_type":"IPv4","direction":"ingress","dst":"142.250.185.14","fragment_flags":"......D.","fragment_offset":0,"in_snmp":97,"in_total_packets":2135660471,"ip_total_len":1258,"ip_version":4,"ipv4_id":40458,"ipv4_inet_header_len":5,"ipv4_total_len":1278,"l2_bytes":1300,"l2_protocol":"ETHERNET-ISO8023","out_snmp":6,"protocol":"udp","sampling_drops":9287278,"sampling_interval":200,"seq_number":19009840,"src":"192.168.153.106","src_tos":0,"sys_uptime":2042522488,"ttl":128,"vlan_dst":153,"vlan_dst_priority":0,"vlan_src":153,"vlan_src_priority":0},"name":"netflow","tags":{"host":"40f2df22e0c0","sonda":"HOU1141546","source":"::1","version":"sFlowV5"},"timestamp":1726823867}
{"fields":{"agent_ip":"192.168.227.2","agent_subid":0,"datalink_frame_type":"IPv4","direction":"ingress","dst":"192.168.100.111","fragment_flags":"......D.","fragment_offset":0,"in_snmp":101,"in_total_packets":840274998,"ip_total_len":33,"ip_version":4,"ipv4_id":0,"ipv4_inet_header_len":5,"ipv4_total_len":53,"l2_bytes":71,"l2_protocol":"ETHERNET-ISO8023","out_snmp":60,"protocol":"udp","sampling_drops":14100530,"sampling_interval":200,"seq_number":20640785,"src":"142.250.184.14","src_tos":0,"sys_uptime":2042522488,"ttl":51,"vlan_dst":250,"vlan_dst_priority":0,"vlan_src":250,"vlan_src_priority":0},"name":"netflow","tags":{"host":"40f2df22e0c0","sonda":"HOU1141546","source":"::1","version":"sFlowV5"},"timestamp":1726823867}
{"fields":{"agent_ip":"192.168.227.2","agent_subid":0,"datalink_frame_type":"IPv4","direction":"ingress","dst":"8.8.8.8","fragment_flags":"......D.","fragment_offset":0,"in_snmp":30,"in_total_packets":90672359,"ip_total_len":32,"ip_version":4,"ipv4_id":32519,"ipv4_inet_header_len":5,"ipv4_total_len":52,"l2_bytes":70,"l2_protocol":"ETHERNET-ISO8023","out_snmp":2,"protocol":"udp","sampling_drops":2026918,"sampling_interval":200,"seq_number":4569837,"src":"192.168.100.231","src_tos":0,"sys_uptime":2042522488,"ttl":64,"vlan_dst":100,"vlan_dst_priority":0,"vlan_src":100,"vlan_src_priority":0},"name":"netflow","tags":{"host":"40f2df22e0c0","sonda":"HOU1141546","source":"::1","version":"sFlowV5"},"timestamp":1726823867}
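One way to check output lines like these for the missing MAC fields is a small JSON probe. This is only a sketch: the `metric` struct and the `missingFields` helper are hypothetical names, and the sample line is a shortened version of the metrics above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metric mirrors only the parts of the JSON output that the check needs.
type metric struct {
	Name   string                 `json:"name"`
	Fields map[string]interface{} `json:"fields"`
}

// missingFields returns the wanted keys that are absent from the metric's fields.
func missingFields(line []byte, wanted ...string) ([]string, error) {
	var m metric
	if err := json.Unmarshal(line, &m); err != nil {
		return nil, err
	}
	var missing []string
	for _, k := range wanted {
		if _, ok := m.Fields[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing, nil
}

func main() {
	// Shortened sample line; real lines carry many more fields.
	line := []byte(`{"fields":{"src":"192.168.100.221","dst":"192.168.100.223","ttl":64},"name":"netflow"}`)
	missing, err := missingFields(line, "src", "in_src_mac", "out_dst_mac")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(missing) // prints [in_src_mac out_dst_mac]
}
```

Running the same check against `/var/tmp/metrics.out` line by line would show whether any metric carries the MAC fields.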

System info

Telegraf v1.32.0 running on Docker, Debian 12 as base OS

Docker

Docker compose for testing environment:

services:
  telegraf:
    image: telegraf:1.32.0
    container_name: telegraf
    ports:
      - 2055:2055/udp
    restart: unless-stopped
    volumes:
      - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - ./sflow-packet.bin:/tmp/sflow-packet.bin:ro

Steps to reproduce

I have captured some sFlow traffic using tcpdump. You can use this trace as reference: telegraf-sflow.pcap.zip

Because I'm using the default NetFlow port (2055/UDP) for sFlow traffic, Wireshark may not dissect it automatically. You can configure Wireshark to decode the traffic as sFlow: [image]

I have taken one of these packets and copied its content directly from Wireshark into a binary file. The final step is to run Telegraf locally (or with Docker) with the provided configuration and send the sFlow message stored in the file, for example with Netcat or, as shown below, with Bash's `/dev/udp` redirection:

sflow-packet.bin.zip

jlgonzalez@Joses-MacBook-Air telegraf-tests % docker exec -it telegraf bash
root@40f2df22e0c0:/# cat /tmp/sflow-packet.bin > /dev/udp/localhost/2055
root@40f2df22e0c0:/# 

Expected behavior

Source MAC address and destination MAC address must be present in Telegraf metrics.

Actual behavior

The fields are decoded by the goflow2 dissector but are not included in the resulting Telegraf metric.

Additional info

I ran some tests with a dummy function to try to identify the underlying problem. I'm certain that the problem comes from the type of the decoded SrcMAC and DstMAC variables. If you run this code locally, you can see that the type is net.HardwareAddr:

package main

import (
    "bytes"
    "fmt"
    "os"

    "github.com/gopacket/gopacket"
    "github.com/gopacket/gopacket/layers"
    "github.com/netsampler/goflow2/v2/decoders/sflow"
)

func main() {
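    // Read the raw sFlow datagram exported from Wireshark.
    // The file name is an assumption; adjust the path as needed.
    payload, err := os.ReadFile("sflow-packet.bin")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    buf := bytes.NewBuffer(payload)

    var msg sflow.Packet
    if err := sflow.DecodeMessageVersion(buf, &msg); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }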
    for _, s := range msg.Samples {
        if flowSample, ok := s.(sflow.FlowSample); ok {
            for _, r := range flowSample.Records {
                if sampledHeader, ok := r.Data.(sflow.SampledHeader); ok {
                    if sampledHeader.Protocol == 1 {
                        packet := gopacket.NewPacket(sampledHeader.HeaderData, layers.LayerTypeEthernet, gopacket.Default)
                        for _, layer := range packet.Layers() {
                            if ethernetLayer, ok := layer.(*layers.Ethernet); ok {
                                fmt.Println(ethernetLayer.SrcMAC)
                                fmt.Printf("%T\n", ethernetLayer.SrcMAC)
                                fmt.Println(ethernetLayer.DstMAC)
                                fmt.Printf("%T\n", ethernetLayer.DstMAC)
                                os.Exit(0)
                            }
                        }
                    }
                }
            }
        }
    }
}

However, when the fields are added to the metric, Telegraf checks that each value's type is a known one (see the convertField function, which is called when a new Telegraf metric is created):

https://github.com/influxdata/telegraf/blob/640eda0ca699a97704602076116c520ec5f425a0/metric/metric.go#L55

I think the fix is quite straightforward. It's only necessary to modify the lines where the MAC addresses are added to the fields map and convert them to strings using the String() method:

https://github.com/influxdata/telegraf/blob/640eda0ca699a97704602076116c520ec5f425a0/plugins/inputs/netflow/sflow_v5.go#L372

https://github.com/influxdata/telegraf/blob/640eda0ca699a97704602076116c520ec5f425a0/plugins/inputs/netflow/sflow_v5.go#L373

Hope this helps. Thanks for your work!

srebhan commented 1 month ago

Thanks @joseluisgonzalezca for your report. Will look into this as soon as time permits.

borjam commented 1 month ago

Just a quick note. In order to avoid confusion with Netflow, please check whether there are "source MAC address" and "destination MAC address" fields without any in or out reference.

The "in_src_mac", "out_src_mac", "in_dst_mac" and "out_dst_mac" Netflow fields make sense when dealing with Netflow (IP) traffic going through a router but not when dealing with Ethernet frames.

srebhan commented 1 month ago

@borjam please test the binary in PR #16009, available as soon as CI finishes the tests, and let me know if this fixes the issue.

joseluisgonzalezca commented 1 month ago

I have run a quick test using the new binaries and the sFlow traffic example that I provided. Everything is working as expected. Thank you for making the fix in such a short time.

This issue can be closed if the fix is merged to the main branch.

srebhan commented 1 month ago

Thanks for testing the PR so quickly @joseluisgonzalezca! The issue will automatically be closed as soon as the PR is merged...

joseluisgonzalezca commented 3 weeks ago

I have found a similar issue with src_port and dst_port for the TCP layer. The variables have the layers.TCPPort type, but they should be converted to uint16 to be properly added to metrics:

https://github.com/influxdata/telegraf/blob/master/plugins/inputs/netflow/sflow_v5.go#L414

https://github.com/influxdata/telegraf/blob/master/plugins/inputs/netflow/sflow_v5.go#L415

I see that the UDP case is already covered.
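The effect can be reproduced without gopacket. In this sketch, `tcpPort` is a local stand-in for layers.TCPPort (which is likewise a named type built on uint16), and `isUint16` is a hypothetical check standing in for any code that matches on the concrete type.

```go
package main

import "fmt"

// tcpPort is a local stand-in for gopacket's layers.TCPPort.
type tcpPort uint16

// isUint16 reports whether the dynamic type of v is exactly uint16.
func isUint16(v interface{}) bool {
	_, ok := v.(uint16)
	return ok
}

func main() {
	var src tcpPort = 443

	// Stored as-is, the value keeps its named type and is not seen as uint16.
	fmt.Println(isUint16(src)) // prints false

	// An explicit conversion yields a plain uint16.
	fmt.Println(isUint16(uint16(src))) // prints true
}
```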

srebhan commented 3 weeks ago

@joseluisgonzalezca could you please open a new issue for that so we can keep track of it? Mention me there and I will take a look.