zeek / zeek

Zeek is a powerful network analysis framework that is much different from the typical IDS you may know.
https://www.zeek.org

Make connection IDs configurable #284

Open rsmmr opened 5 years ago

rsmmr commented 5 years ago

5-tuples aren't always the best way to define what a connection is. I believe it wouldn't be too difficult to allow redefining connection IDs to use a different set of fields, for example to include the VLAN ID as well.

devbali commented 5 years ago

So do we do a bitwise configuration of what a Connection ID record should contain? What options should be open to the users?

rsmmr commented 5 years ago

Not sure yet what the configuration would look like, but generally we'd offer users a pre-defined set of fields to choose from. The default choice would remain the current conn_id.
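To make the "pre-defined set of fields" idea concrete, here is a minimal Python sketch (an illustration only, not Zeek code). The names `AVAILABLE_FIELDS`, `DEFAULT_FIELDS`, and `make_conn_key`, as well as the packet dictionary layout, are hypothetical. The default selection is assumed to be the classic 5-tuple.

```python
from typing import Any, Mapping, Sequence, Tuple

# Hypothetical menu of fields a user could select from.
AVAILABLE_FIELDS = ("orig_h", "orig_p", "resp_h", "resp_p", "proto", "vlan")

# Assumed default selection: the classic 5-tuple.
DEFAULT_FIELDS = ("orig_h", "orig_p", "resp_h", "resp_p", "proto")


def make_conn_key(pkt: Mapping[str, Any],
                  fields: Sequence[str] = DEFAULT_FIELDS) -> Tuple[Any, ...]:
    """Build a hashable connection key from the selected packet fields."""
    unknown = [f for f in fields if f not in AVAILABLE_FIELDS]
    if unknown:
        raise ValueError(f"unsupported key fields: {unknown}")
    # Fields missing from the packet (e.g. no VLAN tag) map to None.
    return tuple(pkt.get(f) for f in fields)


# Two packets that differ only in their VLAN tag.
pkt_a = {"orig_h": "10.0.0.1", "orig_p": 40000, "resp_h": "10.0.0.2",
         "resp_p": 80, "proto": "tcp", "vlan": 10}
pkt_b = dict(pkt_a, vlan=20)

# With the default fields both packets share one key.
same_default = make_conn_key(pkt_a) == make_conn_key(pkt_b)

# Adding "vlan" to the selection separates them.
with_vlan = DEFAULT_FIELDS + ("vlan",)
same_with_vlan = make_conn_key(pkt_a, with_vlan) == make_conn_key(pkt_b, with_vlan)
```

The sketch ignores direction handling (originator vs. responder) and only shows how a field selection changes which packets land in the same connection.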

rsmmr commented 4 years ago

What's the state here?

grigorescu commented 4 years ago

I think selecting the fields makes sense, but it might also be necessary to provide a mechanism for a protocol analyzer to "close" a connection. In this way, additional traffic seen with the same fields should generate a new conn uid, new instances of the analyzer(s), etc.

Even with the same fields, different protocols might have state-based semantics around "connection" management.
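A toy Python sketch of that "close" mechanism, under the assumption that the connection table simply forgets a key when an analyzer says the connection is over. `ConnTable`, `lookup`, and `close` are hypothetical names, and the uid format is made up for the example.

```python
import itertools
from typing import Dict, Hashable


class ConnTable:
    """Toy connection table: maps a key to the uid of the live connection."""

    def __init__(self) -> None:
        self._live: Dict[Hashable, str] = {}
        self._counter = itertools.count(1)

    def lookup(self, key: Hashable) -> str:
        """Return the uid for key, creating a new connection if none is live."""
        if key not in self._live:
            self._live[key] = f"C{next(self._counter)}"
        return self._live[key]

    def close(self, key: Hashable) -> None:
        """Hypothetical hook an analyzer would call when the protocol
        considers the connection finished."""
        self._live.pop(key, None)


table = ConnTable()
key = ("10.0.0.1", 40000, "10.0.0.2", 47808, "udp")

first = table.lookup(key)        # new connection
again = table.lookup(key)        # same key, same uid
table.close(key)                 # analyzer declares the connection over
after_close = table.lookup(key)  # same fields, but a fresh uid
```

In a real implementation the close would also have to tear down analyzer instances and flush per-connection state, which this sketch leaves out.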

duffy-ocraven commented 4 years ago

In the absence of countervailing indications from other real-world examples, and now that we're looking at conn_id afresh, not constrained to the 5-tuple, Zeek's concept of conn_id could be well guided by taking the BACnet broadcast scenario into account.

The BVLC constructions are industry-proven to be sufficiently robust, and they are widely implemented according to a sufficiently precise specification, with repeated reinforcement by feedback from the independent interoperability labs to ever more effectively guide successful interoperability amongst the myriad development shops that each device manufacturer has independently spun up. In BACnet the three-way came into conception because it was deemed appropriate for a small number of devices to implement more complex broadcast propagation, and for the majority of devices to need only elaborate their implementations in regard to unicast and broadcast with just a few scenarios.

The three-way is characterized by a packet called Distribute-Broadcast-to-Network (D-B-t-N).

Distribute-Broadcast-to-Network ::=
  IPv4 header, with 17 (0x11, UDP) as the protocol, ending as always with
    IPv4 ip.orig_h
    IPv4 ip.resp_h
  then UDP
    ip.orig_p
    ip.resp_p
    UDP Length as uint16
    UDP Checksum as uint16
  0x81 as uint8 # the BACnet service
  BVLC value 0x9 as uint8
  BACnet NPDU, always starting with the NPCI control flags as uint8

and

Forwarded-NPDU ::=
  IPv4 header, with 17 (0x11, UDP) as the protocol, ending as always with
    IPv4 ip.orig_h
    IPv4 ip.resp_h
  then UDP
    ip.orig_p
    ip.resp_p
    UDP Length as uint16
    UDP Checksum as uint16
  0x81 as uint8 # the BACnet service
  BVLC value 0x4 as uint8
  value of "Originating_Device", a field that is composed of
    IPv4 ip.orig_h from the D-B-t-N as uint32
    IPv4 ip.orig_p from the D-B-t-N as uint16
  BACnet NPDU, always starting with the NPCI control flags as uint8

The Originating_Device field is carried in every Forwarded-NPDU. That field is what should influence conn_id and community_id, rather than the id.orig_h and id.orig_p of the Forwarded-NPDU itself, which are semantically irrelevant. The more complex BACnet Broadcast Management Devices (BBMDs) execute the Distribute-Broadcast-to-Network and initiate multiple unicast and/or broadcast Forwarded-NPDU packets. In the Originating_Device field of each, they provide the id.orig_h and id.orig_p seen in the incoming Distribute-Broadcast-to-Network message that occasioned those Forwarded-NPDU packets.

rsmmr commented 4 years ago

Good input, thanks. I'm wondering about the "broadcast" part of this: Zeek isn't good with 1-to-many communication, even at the IP level broadcast/multicast is problematic because it breaks the assumption of having a session between two specific endpoints. How's that looking from the BACnet perspective?

duffy-ocraven commented 4 years ago

The "broadcast" part is problematic on real-world networks, for BACnet as for any protocol. The BACnet industry is migrating towards using the unicast versions of what were initially always-broadcast Who-Is / I-Am dynamic binding operations, so the volume of broadcast is going down. A remaining use case, though, is that the initial prospective binding operation is still broadcast, to find "anyone" (typically the nearest BACnet router) who can bootstrap the inter-network discovery process.

The consideration for conn_id, however, is that the paradigm of "broadcasts go via BBMD", even for that initial one, produces a situation where the intended respondent (usually there is just one) responds via that same BBMD, using network-route information taken from the NPDU that arrived via Forwarded-NPDU. The final response packet going back is also a Forwarded-NPDU. So although the first transmission in each direction is often a local IP broadcast (which is probably on a different segment and never seen by the Zeek monitor), the semantic exchange is merely pair-wise. It travels mostly in unicast Forwarded-NPDU, potentially in a broadcast Forwarded-NPDU on the final local IP segment, and occasionally has a unicast D-B-t-N as the first step in either direction. Except for the need to pick up the Originating_Device from three potentially different packet layouts (final Forwarded-NPDU sent unicast, final local Forwarded-NPDU sent broadcast, and unicast Forwarded-NPDU sent between intermediary BBMDs), the formation of conn_id never has to think about the 1-to-N nature of what could have happened but doesn't. Only one node actually answers I-Am, because the device_identifier assignments are mandated to be internetwork unique.

duffy-ocraven commented 4 years ago

So in summary, BACnet uses broadcast not to get a lot of responses, but rather to say "who knows the address for this guy?"

awelzel commented 7 months ago

Chatting with @vpax, who had an IPv4-only use case here. He "hacked" around this problem by encoding additional information into an IPv6 address (not sure if it was the vlan-id or some other piece of layer 2 info).
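For illustration only, one way such an encoding could look in Python. The actual scheme used is not known; the prefix (taken from the 2001:db8::/32 documentation range), the bit layout, and the names `encode` and `decode` are all assumptions made for this sketch.

```python
import ipaddress
from typing import Tuple

# ASSUMED layout: documentation prefix in the top 32 bits, the VLAN ID in
# bits 32..43, and the original IPv4 address in the low 32 bits.
PREFIX = int(ipaddress.IPv6Address("2001:db8::"))


def encode(v4: str, vlan: int) -> ipaddress.IPv6Address:
    """Fold an IPv4 address and a 12-bit VLAN ID into one IPv6 address."""
    if not 0 <= vlan <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    return ipaddress.IPv6Address(
        PREFIX | (vlan << 32) | int(ipaddress.IPv4Address(v4)))


def decode(v6: ipaddress.IPv6Address) -> Tuple[str, int]:
    """Recover (IPv4 address, VLAN ID) from an address built by encode()."""
    value = int(v6)
    return str(ipaddress.IPv4Address(value & 0xFFFFFFFF)), (value >> 32) & 0xFFF


# The same IPv4 address in two VLANs becomes two distinct addresses.
in_vlan_10 = encode("10.0.0.1", 10)
in_vlan_20 = encode("10.0.0.1", 20)
round_trip = decode(in_vlan_10)
```

The appeal of this kind of hack is that everything keyed on addresses keeps working unchanged; the cost is that logs show synthetic IPv6 addresses that need decoding.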

A pretty popular use case is using the vlan-id for flow/connection tracking. For example, Suricata has had this enabled by default for about 10 years.

# This option controls the use of VLAN ids in the flow (and defrag)
# hashing. Normally this should be enabled, but in some (broken)
# setups where both sides of a flow are not tagged with the same VLAN
# tag, we can ignore the VLAN id's in the flow hashing.
vlan:
  use-for-tracking: true

https://github.com/OISF/suricata/blob/f7114b7fe38861f1dc618586158617f9b1c14ddd/suricata.yaml.in#L1434-L1439

I wonder if we should split the vlan-id use case into a separate ticket specifically as a starting point. That may provide sufficient value for various deployments and better compatibility with co-hosted Suricata deployments. Making connection IDs fully configurable via plugins/scripts generically could be kept in mind, but that might require a new plugin hook or some such.

awelzel commented 6 months ago

Had a chat with Smoot (cc @stevesmoot - this is the ticket). Their use-case is broader than vlan-id.

Another thought here could be to look at this from a tunnel perspective: Tag addr / IPAddr with a generic tunnel id (count?) and extend tunnels (possibly just internally) to cover vlan-id and make this also plugin configurable.

It hasn't really been mentioned, but the implications for the script layer are pretty major, too: what is the correct/expected behavior of a table[addr] of count holding IPv4 addresses that have the same 4 octets but live in separate vlans/tunnels? If addr isn't vlan/tunnel aware, counting connections per addr would likely produce unexpected results, as they're collapsed.
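A small Python analogy of that collapse (not Zeek script). A `Counter` keyed by address stands in for `table[addr] of count`, and the connection list is made-up sample data.

```python
from collections import Counter

# Made-up sample: (vlan, originator address) for three connections.
# The same four octets appear in two different VLANs.
conns = [
    (10, "192.168.1.5"),
    (10, "192.168.1.5"),
    (20, "192.168.1.5"),
]

# Keyed by address only: both hosts are counted as one.
per_addr = Counter(addr for _vlan, addr in conns)

# Keyed by (vlan, address): the two hosts stay separate.
per_vlan_addr = Counter(conns)
```

Here `per_addr` reports three connections for a single address, while `per_vlan_addr` splits them two and one. Existing scripts keyed on addr alone would silently get the first behavior.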