Extend Broker data stores for subnet lookup

rsmmr commented 6 years ago

Bro's tables special-case tables of subnets for address lookups. We should do the same for Broker's data stores so that they can support functionality that needs this, such as the intel framework. (Potential challenge: do the backends support network lookups?)

jsiwek commented 6 years ago

Backends likely need to implement differently depending on whether they support (if we had a postgres one, that may support it, but didn't see anything for sqlite and didn't even look yet for rocksdb). For in-memory, we can do whatever, and for others, my first idea is to build up the necessary data structure within a value stored at some hidden/internal key and do address lookups against that.

A user-facing design question I'd have is whether address lookups in this manner would be done via some other method. Just re-using store.exists(ip) can ambiguous to whether you are asking whether there is a key matching the given IP or whether the IP is contained within any subnet that was previously stored as a top-level key. I'd suggest adding a new store.contains(ip) method to differentiate.

mavam commented 6 years ago

Bro's tables special-case tables of subnets for address lookups.

I understand this as a pure optimization. For example, if we could use a trie instead of a hash table for storing addresses as keys, we'd have significant memory savings as the table grows.

Backends likely need to implement differently depending on whether they support

I haven't looked into in depth at how a backend can handle different key types, but my hunch is that persisting these structures - at least at the per-key level, which Broker currently does - will not benefit from the same in-memory optimizations.

I'd suggest adding a new store.contains(ip) method to differentiate.

Both exists and contains are frequently used to test container membership. Since store is the subject you're acting on, I would always associate this with a membership test.

Of course, you can document this distinction explicitly, but I would go a step further and use an even less ambiguous function name, e.g., something that contains prefix in the name.

jsiwek commented 6 years ago

I understand this as a pure optimization. For example, if we could use a trie instead of a hash table for storing addresses as keys, we'd have significant memory savings as the table grows.

Yeah, should be just optimization we are talking about (one can currently test address membership by iterating O(n) over all subnets in a group).

I haven't looked into in depth at how a backend can handle different key types, but my hunch is that persisting these structures - at least at the per-key level, which Broker currently does - will not benefit from the same in-memory optimizations.

I'm also guessing at least some operations on a persistent backend are not-as-good or there will be a tradeoffs.

Just to clarify, the use-case does require supporting this operation per-key and for any arbitrary backend?

store.add_net("mynets", 192.168.0.0/16)
store.add_net("mynets", 10.0.0.0/8)
store.addr_in_nets("mynets", 192.168.1.1) # true
store.remove_net("mynets", 192.168.0.0/16)
store.addr_in_nets("mynets", 192.168.1.1) # false

For persisted backends, add/remove operations are always going to need to walk the full node list of whatever data structure is serialized. Maybe the membership check can still always operate on a supporting data structure that's maintained in memory even for backends that also persist it (avoids the cost of a full-node-walk incurred by deserialization/key-lookup).

zeek / broker

Extend Broker data stores for subnet lookup #16