kentik / ktranslate

System for pulling and pushing network data.
Apache License 2.0
55 stars 25 forks source link

potential overflow of netflow messages length #549

Closed pyke369 closed 1 year ago

pyke369 commented 1 year ago

Because Kentik is enriching input flows with additional fields (such as source/destination asnum), and also using an extended template to accomodate v4/v6 addresses, there may be situations where the total raw IPFIX packet size is > 64k (65536 bytes), leading to an invalid (overflowed/rolled-over) lengh value in the packet header, b/c of the cumulative size of the new data flowsets:

I don't have a mitigation code at this point, but I could provide one if needed (the packer should be amended to generate multiple concatenated IPFIX messages, and the net/udp sink should loop to send a >64k buffer in multiple parts).

i3149 commented 1 year ago

Great find! if you can contribute a PR this would be awesome!

pyke369 commented 1 year ago

I'll give it a try. I was also thinking of exporting more stuff in the nf template (like at least the device-id, b/c right now you cannot associate a flow with the originating device that emitted it in the 1st place, that's a shame...). The info (and some other) is accessible in the received Cap'n'Proto kflow2 records.

i3149 commented 1 year ago

That's a great idea to add in all the info there is into the exported flow. This data should be present in each flow record sent to the Format function:

flows[0].CompanyId, flows[0].DeviceName, flows[0].DeviceId
pyke369 commented 1 year ago

I had a deeper look a the code and found there's already some way to limit the number of exported flows per batch/message, by simply setting the maxflowspermessage configuration entry (default 10000, which is way too high in the netflow case).

The current netflow template yields 120 bytes data flowsets, and if you add the template flowset itself (92 bytes) and the netflow header (16 bytes), the maximum flows per exported message should be:

int((65535 - 16 - 92) / 120) = 545

So let's say 540 to be on the safe side. Setting a decent default value when the output format is netflow:

$ git diff config.go
diff --git a/config.go b/config.go
@@ -13,8 +13,10 @@ const (
        // KentikAPITokenEnvVar is the environment variables used to get the Kentik API Token
        KentikAPITokenEnvVar = "KENTIK_API_TOKEN"
+       // MaxNetflowsPerMessage is the maximum flowsets in a single netflow message
+       MaxNetflowsPerMessage = 540
 )

@@ -494,6 +496,10 @@ func LoadConfig(configPath string) (*Config, error) {
                return nil, err
        }

+       if cfg.Format == "netflow" && (cfg.MaxFlowsPerMessage <= 0 || cfg.MaxFlowsPerMessage > MaxNetflowsPerMessage) {
+               cfg.MaxFlowsPerMessage = MaxNetflowsPerMessage
+       }
+
        return cfg, nil
 }

And voila, problem solved.

I will continue on adding the missing information you mentioned above (at least company-id and device-id, which may be fixed uint32 values).

pyke369 commented 1 year ago

I contributed this PR https://github.com/kentik/ktranslate/pull/553 for you to review (feel free to close this issue if needed). Thanks.

i3149 commented 1 year ago

LGMT -- merged in, thanks for the deep dive!