phaag / nfdump

Netflow processing tools

Feature Request: autonomous_system_organization name in nfdump output #430

Closed: romanveaceslav closed this issue 1 year ago

romanveaceslav commented 1 year ago

The autonomous_system_organization name is often more valuable than the country or city name. Many organizations have multiple AS numbers spread over different continents and countries, so the autonomous_system_organization name is often needed as a first level of grouping. Conversely, assigning names to subnets of my own network and integrating them as autonomous_system_organization names would allow summaries to be properly segregated by those subnets. Ideally, the source and destination autonomous_system_organization names would be available as individual fields, e.g. as the format tokens %sasn and %dasn, and be supported by -A aggregation. Alternatively, the autonomous_system_organization name could at least be added to %sloc and %dloc in a fixed position, e.g. as the first field, as below (fake data):

Src IP location info,Dst IP location info
"Lycatel Distribution Uk Limited"/EU/FR/France/48.8582/2.3387,STACKPATH-CDN/NA/US/"United States"/37.7510/-97.8220
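As a rough illustration of why the organization name works as a first-level key, the Python sketch below sums bytes per AS number and per organization. The records are hypothetical: private-range AS numbers and made-up names stand in for one organization that sends traffic from two ASes. None of this is nfdump output.

```python
from collections import defaultdict

# Hypothetical flow records: (AS number, AS organization name, bytes).
# ASNs come from the private range 64512-65534; the names are made up.
flows = [
    (64512, "ExampleOrg", 1200),  # ExampleOrg's AS in one region
    (64513, "ExampleOrg", 800),   # ExampleOrg's AS in another region
    (64600, "OtherOrg", 500),
]

def bytes_per(key_index, records):
    """Sum the byte counts of records grouped by the field at key_index."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)

by_asn = bytes_per(0, flows)  # three groups: ExampleOrg is split in two
by_org = bytes_per(1, flows)  # two groups: ExampleOrg is counted once

print(by_asn)
print(by_org)
```

Grouping by AS number splits ExampleOrg's traffic across two rows; grouping by organization name merges it, which is the first-level view the request asks for.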

P.S. In the current implementation of %sloc and %dloc, the latitude and longitude are appended as extra fields separated by "/", which makes further parsing difficult. Ideally they would be removed, and only the number kept.
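On the parsing point: assuming the proposed layout above (fields separated by "/", values containing spaces wrapped in double quotes), Python's standard `csv` module can split a single location field while respecting the quotes. The field order (organization, continent, country code, country, latitude, longitude) is read off the fake example and is not a documented nfdump format; `split_location` is a hypothetical helper.

```python
import csv

def split_location(field):
    """Split a '/'-separated location string, honouring double-quoted values."""
    # csv.reader expects an iterable of lines; a one-element list is one line.
    return next(csv.reader([field], delimiter="/", quotechar='"'))

src = '"Lycatel Distribution Uk Limited"/EU/FR/France/48.8582/2.3387'
dst = 'STACKPATH-CDN/NA/US/"United States"/37.7510/-97.8220'

src_fields = split_location(src)
dst_fields = split_location(dst)

# The last two entries are the coordinates the P.S. suggests dropping.
print(src_fields[:-2])
print(dst_fields[:-2])
```

This handles one field at a time; the src and dst fields are assumed to have been separated beforehand.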

phaag commented 1 year ago

The latest master implements the %sasn and %dasn tokens for printing. The implementation for -A is a bit more challenging.

romanveaceslav commented 1 year ago

Thank you very much. You are really very, very fast. Honestly, I thought it would take a couple of weeks.

I did a few tests and there seem to be no issues. I ran over a large number of files (~300 million records) with %sasn and %dasn without any failures. I also ran an aggregation with -A srcip4/24,dstip4/24 and then grouped by %sas,%das,%sasn,%dasn, and the results look consistent with what I know about our network. I'll be away from my laptop next week, so I will be able to do more tests next weekend. But so far I am very pleased with how it works. Again, thank you very much.

romanveaceslav commented 1 year ago

And I think you could already add %sasn and %dasn to the man page :)

romanveaceslav commented 1 year ago

Thanks very much again. I ran a test with real network data, comparing the autonomous_system_organization values produced by nfdump for different addresses (~49 million unique) with those produced by another tool. They match exactly, except in 73 cases, which in my view are corner cases. A summary of the test is below. The percentages in brackets give the coverage of the autonomous_system_number and autonomous_system_organization values present in the GeoLite database version from January 30, 2023. Unfortunately, the IPv6 traffic is not very large, and only 31% of the ASes are present.

        Addresses   ASNUMS        ASNAMES
IPv4    47377424    69132 (94%)   64543 (94%)
IPv6    12074613    9462 (31%)    8943 (31%)

The mismatching cases (unless I am missing something) relate to IPv6 addresses that are not present in GeoLite2-ASN-Blocks-IPv6.csv but are mapped to real ASes. The whole list is attached. 72 addresses are of the form ::xxxx:xxxx:xxxx:xxxx; e.g. nfdump maps ::6d6a:20e3:d400:3500 to AS number 213141, AS name "B. Braun Melsungen AG", which is not present at all in GeoLite2-ASN-Blocks-IPv6.csv. It is, however, present in GeoLite2-ASN-Blocks-IPv4.csv, and it is a perfect match if ::6d6a:20e3:d400:3500 is transformed to ::d400:3500 = 212.0.53.0, which lies in the IPv4 range 212.0.0.0/18 of AS 213141. I checked all the other similar cases: they give the same mapping as nfdump if the first 96 bits are set to 0.
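The manual check described above (zero the first 96 bits, then look the remaining 32 bits up as IPv4) can be reproduced with Python's standard `ipaddress` module. This is only a sketch of that comparison; `low_32_bits_as_ipv4` is a hypothetical helper, not how nfdump performs its lookup.

```python
import ipaddress

def low_32_bits_as_ipv4(addr):
    """Zero the first 96 bits of an IPv6 address and return the
    remaining 32 bits as an IPv4 address."""
    v6 = ipaddress.IPv6Address(addr)
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)

v4 = low_32_bits_as_ipv4("::6d6a:20e3:d400:3500")
print(v4)                                           # 212.0.53.0
print(v4 in ipaddress.IPv4Network("212.0.0.0/18"))  # True
```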

The remaining case (just one packet) is ::52.3.0.0, which nfdump shows as AS number 14618, AS name AMAZON-AES. It is a so-called IPv4-compatible IPv6 address (deprecated since 2006), and the IPv4 address 52.3.0.0 really does belong to AMAZON-AES. I am not sure whether this mapping is wrong, but it is certainly not present in GeoLite2-ASN-Blocks-IPv6.csv. Also, geolookup does not recognize it as a valid address (but it is valid).

Attachment: nfdump_not_matching.csv.zip
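For reference, RFC 4291 defines two forms that embed an IPv4 address: the deprecated IPv4-compatible range ::/96 and the IPv4-mapped range ::ffff:0:0/96. The sketch below shows one way a lookup tool could recognise them with Python's `ipaddress` module; `embedded_ipv4` is a hypothetical helper, not nfdump's code. It also shows that an address such as ::6d6a:20e3:d400:3500 falls into neither range.

```python
import ipaddress

COMPATIBLE = ipaddress.IPv6Network("::/96")      # deprecated IPv4-compatible
MAPPED = ipaddress.IPv6Network("::ffff:0:0/96")  # IPv4-mapped

def embedded_ipv4(addr):
    """Return (kind, IPv4Address) if addr embeds an IPv4 address, else None."""
    v6 = ipaddress.IPv6Address(addr)
    if v6 in MAPPED:
        return "mapped", v6.ipv4_mapped
    # '::' and '::1' also lie inside ::/96, but they are the unspecified
    # and loopback addresses, so they are excluded here.
    if v6 in COMPATIBLE and int(v6) > 1:
        return "compatible", ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)
    return None

print(embedded_ipv4("::52.3.0.0"))             # ('compatible', IPv4Address('52.3.0.0'))
print(embedded_ipv4("::ffff:51.38.238.78"))    # ('mapped', IPv4Address('51.38.238.78'))
print(embedded_ipv4("::6d6a:20e3:d400:3500"))  # None
```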

phaag commented 1 year ago

Fixed in the master repo. Could you please check? The nfdump global handling will be improved at some point.

romanveaceslav commented 1 year ago

$ nfdump -V
nfdump: Version: 1.7.1-771c0c9, Date: 2023-03-26 16:48:53 +0200

Perfect match, thank you. I tested with the same data as above. There are no differences between nfdump and the mapper I used, except ::52.3.0.0, which nfdump, as before, identifies as AS number 14618, AS name AMAZON-AES. After some thought, I think nfdump's treatment is correct and my tool doesn't handle it properly. After all, such addresses are globally routable and, even though they are deprecated, no one can rule out that some old devices still use them. Great results. Thank you.

One side issue is that geolookup still treats IPv6 addresses written with an embedded dotted-quad IPv4 part as invalid:

$ geolookup -G mmdb.nf ::52.3.0.0
Not a valid IPv4 or IPv6:

$ geolookup -G mmdb.nf ::ffff:51.38.238.78
Not a valid IPv4 or IPv6:
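The mixed notation x:x:x:x:x:x:d.d.d.d is part of the standard IPv6 text representation (RFC 4291, section 2.2), so a validator built on `inet_pton` accepts both inputs above. The sketch below is a hypothetical validator that tries IPv4 first and then IPv6; it is not geolookup's actual code.

```python
import socket

def address_family(text):
    """Return 'IPv4' or 'IPv6' if text parses as an address, else None.

    inet_pton with AF_INET6 accepts a trailing dotted-quad IPv4 part,
    so '::52.3.0.0' and '::ffff:51.38.238.78' are valid IPv6 input.
    """
    for family, name in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
        try:
            socket.inet_pton(family, text)
            return name
        except (OSError, ValueError):
            continue
    return None

for arg in ("::52.3.0.0", "::ffff:51.38.238.78", "51.38.238.78", "not-an-ip"):
    print(arg, address_family(arg))
```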

phaag commented 1 year ago

The IPv4 in IPv6 cases are now correctly handled.