vidister opened 2 years ago
That's odd. If you query some raw data, does it have any FlowDirection values at all?
kubectl exec -it chi-netmeta-netmeta-0-0-0 -c clickhouse -- clickhouse-client <<< 'select * from flows_raw limit 10 format JSONEachRow'
Okay, this is a funny one: we have two distinct (or at least loosely related) problems here.
On the IPFIX setup I get flow entries like this:
{
"Date": "2021-10-25",
"FlowType": "FLOWUNKNOWN",
"SequenceNum": "3014",
"TimeReceived": "1635174093",
"SamplingRate": "2000",
"FlowDirection": 255,
"SamplerAddress": "REDACTED",
"TimeFlowStart": "1635174074",
"TimeFlowEnd": "1635174074",
"Bytes": "52",
"Packets": "1",
"SrcAddr": "REDACTED",
"DstAddr": "REDACTED",
"EType": 2048,
"Proto": 6,
"SrcPort": REDACTED,
"DstPort": REDACTED,
"InIf": 528,
"OutIf": 552,
"SrcMac": "0",
"DstMac": "0",
"SrcVlan": 101,
"DstVlan": 0,
"VlanId": 101,
"IngressVrfId": 0,
"EgressVrfId": 0,
"IPTos": 0,
"ForwardingStatus": 0,
"IPTTL": 55,
"TCPFlags": 16,
"IcmpType": 0,
"IcmpCode": 0,
"IPv6FlowLabel": 0,
"FragmentId": 0,
"FragmentOffset": 0,
"BiFlowDirection": 0,
"SrcAS": REDACTED,
"DstAS": REDACTED,
"NextHop": "REDACTED",
"NextHopAS": 0,
"SrcNet": 15,
"DstNet": 24
}
So every entry has "FlowDirection": 255 (WHERE FlowDirection != 255 returns 0 rows).
There's a blog post describing the issue: they export 255 to avoid reporting the wrong flow direction when a packet is sampled by both the ingress and the egress PFE.
https://www.plixer.com/blog/juniper-mx240-ipfix-support-direction-problems/
So an easy solution would be to match FlowDirection != 1 instead of FlowDirection == 0, and FlowDirection != 0 instead of FlowDirection == 1.
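A minimal sketch of the proposed predicates, written in Python rather than ClickHouse SQL so the logic can be checked in isolation. It assumes the IPFIX flowDirection encoding (information element 61: 0 = ingress, 1 = egress) and treats Juniper's 255 like any other value that is neither 0 nor 1; the function names are hypothetical.

```python
# Hypothetical helpers mirroring the proposed ClickHouse predicates.
# Assumes IPFIX flowDirection semantics: 0 = ingress, 1 = egress.
# Juniper MX exports 255 when the direction is ambiguous.

def counts_as_ingress(flow_direction: int) -> bool:
    """Proposed replacement for FlowDirection == 0."""
    return flow_direction != 1


def counts_as_egress(flow_direction: int) -> bool:
    """Proposed replacement for FlowDirection == 1."""
    return flow_direction != 0


if __name__ == "__main__":
    for value in (0, 1, 255):
        print(value, counts_as_ingress(value), counts_as_egress(value))
```

With these predicates, a flow exported with 255 is counted in both the ingress and the egress view, so per-direction totals are overestimated instead of missing entirely.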
Then there's a second setup using sFlow on Junos. It only has entries with "FlowDirection": 0:
$ kubectl exec -it chi-netmeta-netmeta-0-0-0 -c clickhouse -- clickhouse-client <<< 'select count(*) from flows_raw WHERE FlowDirection == 0 limit 10 format JSONEachRow'
{"count()":"463967"}
$ kubectl exec -it chi-netmeta-netmeta-0-0-0 -c clickhouse -- clickhouse-client <<< 'select count(*) from flows_raw WHERE FlowDirection != 0 limit 10 format JSONEachRow'
{"count()":"0"}
It definitely is sampling both directions, so I don't know why it sets FlowDirection to 0. I'll grab some pcaps and try to figure out what's going on there.
But even then it should display something on the SrcASN graph, right? Well, it only shows Reserved-ASN 0, because SrcAS and DstAS are both set to 0. So I guess we have to check whether the value is zero and, if so, perform another lookup in the risinfo dict.
sFlow dump:
{
"Date": "2021-10-22",
"FlowType": "FLOWUNKNOWN",
"SequenceNum": "51378",
"TimeReceived": "1634864823",
"SamplingRate": "4000",
"FlowDirection": 0,
"SamplerAddress": "REDACTED",
"TimeFlowStart": "1634864823",
"TimeFlowEnd": "1634864823",
"Bytes": "1498",
"Packets": "1",
"SrcAddr": "REDACTED",
"DstAddr": "REDACTED",
"EType": 2048,
"Proto": 17,
"SrcPort": REDACTED,
"DstPort": REDACTED,
"InIf": 542,
"OutIf": 508,
"SrcMac": "REDACTED",
"DstMac": "REDACTED",
"SrcVlan": 10,
"DstVlan": 1,
"VlanId": 0,
"IngressVrfId": 0,
"EgressVrfId": 0,
"IPTos": 0,
"ForwardingStatus": 0,
"IPTTL": 63,
"TCPFlags": 0,
"IcmpType": 0,
"IcmpCode": 0,
"IPv6FlowLabel": 0,
"FragmentId": 9561,
"FragmentOffset": 0,
"BiFlowDirection": 0,
"SrcAS": 0,
"DstAS": 0,
"NextHop": "::",
"NextHopAS": 0,
"SrcNet": 0,
"DstNet": 0
}
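A rough sketch of the fallback described above: keep the exporter's SrcAS/DstAS when it is non-zero, otherwise do a longest-prefix-match lookup on the address. `RIS_PREFIXES` is a hypothetical in-memory stand-in for the risinfo dict, filled with documentation prefixes (RFC 5737) and documentation ASNs (RFC 5398); the real dictionary's layout and lookup functions are not shown in this thread.

```python
import ipaddress
from typing import Optional

# Hypothetical stand-in for the risinfo dict: prefix -> origin ASN.
RIS_PREFIXES = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
    ipaddress.ip_network("198.51.100.128/25"): 64498,
}


def lookup_asn(addr: str) -> Optional[int]:
    """Longest-prefix match of addr against RIS_PREFIXES."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, asn in RIS_PREFIXES.items():
        if ip.version == net.version and ip in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, asn)
    return best[1] if best else None


def effective_asn(exported_as: int, addr: str) -> int:
    """Keep the exporter's ASN unless it is 0, then fall back to the lookup."""
    if exported_as != 0:
        return exported_as
    return lookup_asn(addr) or 0
```

Doing this at capture time rather than at query time would keep historic data stable even if the prefix table changes later.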
Thanks for debugging!
So an easy solution would be to match FlowDirection != 1 instead of FlowDirection == 0 and vice versa.
Happy to implement this - if I understood the linked article correctly, this would result in correct-ish data by counting all ingress/egress traffic in both tables, right?
I suppose we could also implement #60 and use interface IDs to figure out the flow direction instead, which would also solve the problem with sFlow where no FlowDirection is included.
We already do something similar for the other graphs: if(FlowDirection == 1, 'out', 'in') AS FlowDirection
But even then it should display something on the SrcASN Graph, right? Well, just Reserved-ASN 0.
It definitely should do exactly that; this is what it looks like on one of my sFlow samplers:
The solution here is to fill in ASN data using the risinfo dict at capture time to have proper historic data - this is already on the short-term backlog and shouldn't be hard to do.
Happy to implement this - if I understood the linked article correctly, this would result in correct-ish data by counting all ingress/egress traffic in both tables, right?
This is correct.
We already do something similar for the other graphs: if(FlowDirection == 1, 'out', 'in') AS FlowDirection
Yes, in that case every flow is labeled as "in" right now, which can be a bit confusing. What should we do with these? We could put a third string "Unknown" in there, but I think that could also be confusing in the dashboard. Maybe leaving the string empty would do the job? It's still a bit messy when mixing sources with differently broken flow export implementations, but probably as good as it gets...
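One way the label mapping could look with a third state, sketched in Python. `unknown_label` is a hypothetical parameter so the dashboard can choose between an empty string and "Unknown"; the ClickHouse expression in the comment is an untested assumption of how the same mapping might be written there.

```python
# Possible ClickHouse equivalent (untested assumption):
#   multiIf(FlowDirection = 0, 'in', FlowDirection = 1, 'out', '') AS FlowDirection

def direction_label(flow_direction: int, unknown_label: str = "") -> str:
    """Map IPFIX-style flowDirection values to dashboard labels."""
    if flow_direction == 0:
        return "in"
    if flow_direction == 1:
        return "out"
    # 255 (Juniper) or any other unexpected value
    return unknown_label
```

Unlike `if(FlowDirection == 1, 'out', 'in')`, this no longer silently files every 255 flow under "in".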
Sounds to me like the "correct" solution is to fix up the data at ingestion time. Your data seems to have correct InIf and OutIf data, so presumably one could deduce the correct flow direction from that?
But how do you know whether an interface is an edge port or some internal/backbone interface? Set a flag in the interfaceMap?
Yup, that was the idea - just have a map of all interfaces and which way they're facing, possibly determined from Netbox and/or SNMP. Does that sound workable?
Not nice, but probably the best solution.
Yeah... I don't think we can avoid it unless the device tells us the physical flow direction, which it doesn't want to...
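A minimal sketch of direction inference from InIf/OutIf, assuming a hypothetical interface map keyed by (sampler address, ifIndex) that flags each interface as edge-facing or internal (in practice it might be populated from NetBox and/or SNMP). Here "in" means traffic entering the network at an edge port and "out" means traffic leaving through one; edge-to-edge and internal-to-internal flows stay unknown. The ifIndex values 528/552 are borrowed from the IPFIX dump above; the sampler address is a documentation placeholder.

```python
from enum import Enum
from typing import Dict, Tuple


class Facing(Enum):
    EDGE = "edge"          # faces customers, peers or transit
    INTERNAL = "internal"  # backbone/core link


# Hypothetical interface map: (sampler address, ifIndex) -> facing.
INTERFACE_MAP: Dict[Tuple[str, int], Facing] = {
    ("192.0.2.1", 528): Facing.EDGE,
    ("192.0.2.1", 552): Facing.INTERNAL,
}


def infer_direction(sampler: str, in_if: int, out_if: int) -> str:
    """Derive the flow direction from which way the interfaces face."""
    in_facing = INTERFACE_MAP.get((sampler, in_if))
    out_facing = INTERFACE_MAP.get((sampler, out_if))
    if in_facing is Facing.EDGE and out_facing is Facing.INTERNAL:
        return "in"
    if in_facing is Facing.INTERNAL and out_facing is Facing.EDGE:
        return "out"
    # edge-to-edge, internal-to-internal, or interfaces missing from the map
    return ""
```

This depends only on InIf/OutIf and the map, so it would also cover exporters that send no usable FlowDirection at all.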
How about having a list of "local" CIDR ranges and using that to determine direction? AS won't work but IPs might.
This would work fine for hosting-provider-like networks, but not so well for ISP networks with downstream ASNs (like ours). I think FastNetMon solves this by receiving the routes via BGP... which seems a bit overkill here.
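A minimal sketch of the "local CIDR ranges" idea, assuming a hypothetical, manually maintained `LOCAL_NETS` list (documentation prefixes used as placeholders). A flow towards a local address from a remote one counts as "in", the reverse as "out".

```python
import ipaddress

# Hypothetical configured "local" ranges (placeholders).
LOCAL_NETS = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "2001:db8::/32")]


def is_local(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    # Membership tests across IPv4/IPv6 simply return False.
    return any(ip in net for net in LOCAL_NETS)


def direction_from_cidrs(src: str, dst: str) -> str:
    src_local, dst_local = is_local(src), is_local(dst)
    if dst_local and not src_local:
        return "in"
    if src_local and not dst_local:
        return "out"
    # both local or both remote: direction is undefined
    return ""
```

This only works if the list covers every prefix considered local, which is exactly what gets hard to maintain once downstream ASNs are involved.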
Hmm...we could do that! No half measures 😆
What would that look like? Use BGP/BMP to figure out local networks?
Yup. BGP/BMP integration could become useful anyways. We could work with BGP communities.
Okay, let's do that - sounds like the "correct" solution.
(was meaning to have BMP support anyway to get AS path and avoid the risinfo trick)
We could work with BGP communities.
i.e. have a config setting which communities mark "local" networks?
i.e. have a config setting which communities mark "local" networks?
Yes. And we can generalize this to use other communities for filtering as well, for example customer networks, region/city communities, etc.
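A sketch of the community-based configuration, assuming routes have already been received via BGP/BMP and reduced to a prefix plus a set of standard communities in "ASN:value" form. The `Route` type, `LOCAL_COMMUNITIES` and `TAG_COMMUNITIES` are hypothetical config names, and the community values are placeholders from the documentation ASN range.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Route:
    prefix: str
    communities: Set[str] = field(default_factory=set)  # e.g. {"64496:100"}


# Hypothetical config: communities marking "local" networks ...
LOCAL_COMMUNITIES = {"64496:100"}
# ... and communities mapped to tags usable as dashboard filters.
TAG_COMMUNITIES = {"64496:200": "customer", "64496:310": "region-a"}


def local_prefixes(routes: Iterable[Route]) -> List[ipaddress._BaseNetwork]:
    """Prefixes carrying at least one 'local' community."""
    return [
        ipaddress.ip_network(r.prefix)
        for r in routes
        if r.communities & LOCAL_COMMUNITIES
    ]


def tags_for(route: Route) -> List[str]:
    """Filter tags derived from a route's communities."""
    return sorted(TAG_COMMUNITIES[c] for c in route.communities if c in TAG_COMMUNITIES)
```

The resulting prefix list could then serve as the "local" CIDR list proposed earlier in the thread, without maintaining it by hand.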
The ASN lookup is now fixed on the development branch by #87 / #88. Can the issue be closed, or do we still have the FlowDirection issue?
This is now implemented, thanks again :)
The FlowDirection issue is still open :/
The FlowDirection issue also partly occurs on port-mirror deployments, because the ingested flow data only has one of InIf/OutIf set. Because of this, all graphs also display the sum of all traffic from the other direction. We should probably find a way to infer the direction ourselves, since that is the broader solution to issues like this.
In our setups with exports from Junos-based routers (using both sFlow and NetFlow), the default dashboard doesn't display any ASN statistics. When I remove the AND FlowDirection = conditions from the queries, everything works fine. Maybe our routers don't add FlowDirection attributes to the flow samples? Is it possible to remove the condition?