Open rsmmr opened 6 years ago
Backends likely need to implement this differently depending on whether they support it natively (if we had a Postgres one, that may support it, but I didn't see anything for SQLite and didn't even look yet for RocksDB). For in-memory, we can do whatever, and for the others, my first idea is to build up the necessary data structure within a value stored at some hidden/internal key and do address lookups against that.
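As a rough sketch of that hidden-key idea (names and the `__broker_internal_nets__` key are hypothetical, and a plain dict stands in for any key-value backend):

```python
import ipaddress

HIDDEN_KEY = "__broker_internal_nets__"  # assumed internal key name

def add_net(store, net):
    # Append the subnet to the list kept under the internal key.
    nets = store.setdefault(HIDDEN_KEY, [])
    nets.append(ipaddress.ip_network(net))

def contains_addr(store, addr):
    # Address lookup scans the subnets stored under the internal key.
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in store.get(HIDDEN_KEY, []))

store = {}
add_net(store, "192.168.0.0/16")
print(contains_addr(store, "192.168.1.1"))  # True
print(contains_addr(store, "10.0.0.1"))     # False
```

For a persisted backend, the list under the hidden key would be serialized into the backend's value format instead of living in a Python dict.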
A user-facing design question I'd have is whether address lookups in this manner would be done via some other method. Just re-using `store.exists(ip)` can be ambiguous as to whether you are asking whether there is a key matching the given IP, or whether the IP is contained within any subnet that was previously stored as a top-level key. I'd suggest adding a new `store.contains(ip)` method to differentiate.
> Bro's tables special-case tables of subnets for address lookups.
I understand this as a pure optimization. For example, if we could use a trie instead of a hash table for storing addresses as keys, we'd have significant memory savings as the table grows.
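To illustrate the trie idea, here is a minimal binary prefix trie for IPv4 subnets. This is only a sketch of the optimization being discussed, not anything Broker or Bro actually implements; all names are made up:

```python
import ipaddress

class TrieNode:
    """One node per prefix bit; 'terminal' marks the end of a stored subnet."""
    __slots__ = ("children", "terminal")
    def __init__(self):
        self.children = [None, None]
        self.terminal = False

def insert(root, net):
    # Walk the prefix bits of the subnet, creating nodes as needed.
    net = ipaddress.ip_network(net)
    bits = int(net.network_address)
    node = root
    for i in range(net.prefixlen):
        b = (bits >> (31 - i)) & 1
        if node.children[b] is None:
            node.children[b] = TrieNode()
        node = node.children[b]
    node.terminal = True

def lookup(root, addr):
    # An address matches if its bit path passes through any terminal node.
    bits = int(ipaddress.ip_address(addr))
    node = root
    for i in range(32):
        if node.terminal:
            return True
        node = node.children[(bits >> (31 - i)) & 1]
        if node is None:
            return False
    return node.terminal

root = TrieNode()
insert(root, "10.0.0.0/8")
print(lookup(root, "10.1.2.3"))  # True
print(lookup(root, "11.0.0.1"))  # False
```

The memory argument is that overlapping prefixes share trie nodes, whereas a hash table stores every subnet key independently; lookups are also bounded by address length rather than by the number of stored subnets.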
> Backends likely need to implement differently depending on whether they support
I haven't looked in depth at how a backend can handle different key types, but my hunch is that persisting these structures - at least at the per-key level, which Broker currently does - will not benefit from the same in-memory optimizations.
> I'd suggest adding a new `store.contains(ip)` method to differentiate.
Both `exists` and `contains` are frequently used to test container membership. Since `store` is the subject you're acting on, I would always associate this with a membership test. Of course, you can document this distinction explicitly, but I would go a step further and use an even less ambiguous function name, e.g., something that contains `prefix` in the name.
> I understand this as a pure optimization. For example, if we could use a trie instead of a hash table for storing addresses as keys, we'd have significant memory savings as the table grows.
Yeah, it should be just an optimization we are talking about (one can currently test address membership by iterating O(n) over all subnets in a group).
> I haven't looked in depth at how a backend can handle different key types, but my hunch is that persisting these structures - at least at the per-key level, which Broker currently does - will not benefit from the same in-memory optimizations.
I'm also guessing that at least some operations on a persistent backend will not be as efficient, or there will be tradeoffs.
Just to clarify: does the use case require supporting this operation per-key and for any arbitrary backend? E.g.:
```
store.add_net("mynets", 192.168.0.0/16)
store.add_net("mynets", 10.0.0.0/8)
store.addr_in_nets("mynets", 192.168.1.1) # true
store.remove_net("mynets", 192.168.0.0/16)
store.addr_in_nets("mynets", 192.168.1.1) # false
```
For persisted backends, add/remove operations are always going to need to walk the full node list of whatever data structure is serialized. Maybe the membership check can still always operate on a supporting data structure that's maintained in memory, even for backends that also persist it (avoiding the full-node-walk cost incurred by deserialization/key-lookup).
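A sketch of that write-through idea, mirroring the hypothetical `add_net`/`remove_net`/`addr_in_nets` API above (the class, key name, and JSON serialization are all assumptions for illustration; a dict stands in for a persisted backend like SQLite or RocksDB):

```python
import ipaddress
import json

class NetIndex:
    """In-memory subnet index kept alongside a persisted key-value backend.

    Mutations are written through to the backend; membership checks only
    touch the in-memory list, never the serialized representation.
    """
    def __init__(self, backend, key="mynets"):
        self.backend = backend  # any dict-like persisted store
        self.key = key
        # Rebuild the in-memory index once, at startup.
        raw = backend.get(key, "[]")
        self.nets = [ipaddress.ip_network(n) for n in json.loads(raw)]

    def add_net(self, net):
        self.nets.append(ipaddress.ip_network(net))
        self._flush()

    def remove_net(self, net):
        self.nets.remove(ipaddress.ip_network(net))
        self._flush()

    def addr_in_nets(self, addr):
        # In-memory only: no deserialization on the hot path.
        ip = ipaddress.ip_address(addr)
        return any(ip in n for n in self.nets)

    def _flush(self):
        # Write-through: re-serialize the whole structure on each mutation.
        self.backend[self.key] = json.dumps([str(n) for n in self.nets])

db = {}  # stands in for a persisted backend
idx = NetIndex(db)
idx.add_net("192.168.0.0/16")
print(idx.addr_in_nets("192.168.1.1"))  # True
idx.remove_net("192.168.0.0/16")
print(idx.addr_in_nets("192.168.1.1"))  # False
```

This keeps lookups cheap at the cost of re-serializing the whole structure on every add/remove, which matches the tradeoff described above.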
Bro's tables special-case tables of subnets for address lookups. We should do the same for Broker's data stores so that they can support functionality that needs this, such as the intel framework. (Potential challenge: do the backends support network lookups?)