tenzir / public-roadmap

The public roadmap of Tenzir
https://docs.tenzir.com/roadmap

Contextualizer #140

Closed dominiklohmann closed 10 months ago

dominiklohmann commented 1 year ago

The matcher was at the forefront of the architectural design leading us to the pipeline-first Tenzir we have as of v4.0. It has since, however, fallen behind, and we must update it to fit nicely into our new architecture.

To do that, we have three things we need to consider:

We also need to consider how to handle retro matching alongside live matching, which was previously handled by Threat Bus.

We should consider adding an expiry mechanism to indicators depending on the matcher backend.
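
To make the expiry idea concrete, here is a minimal sketch of a hash-table backend where every indicator carries a time-to-live. The `ExpiringTable` class, its methods, and the injectable `clock` parameter are all hypothetical names for illustration, not part of Tenzir.

```python
import time
from typing import Any, Callable, Optional


class ExpiringTable:
    """Hypothetical hash-table backend whose indicators expire after a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._entries: dict[str, tuple[float, Any]] = {}

    def add(self, indicator: str, context: Any) -> None:
        # Store the context together with its absolute expiry time.
        self._entries[indicator] = (self._clock() + self._ttl, context)

    def get(self, indicator: str) -> Optional[Any]:
        entry = self._entries.get(indicator)
        if entry is None:
            return None
        expires_at, context = entry
        if self._clock() >= expires_at:
            # Evict lazily on access instead of running a background sweeper.
            del self._entries[indicator]
            return None
        return context
```

Lazy eviction keeps the sketch short. A backend that cannot delete entries, such as a plain Bloom filter, would need a different strategy, which is why expiry likely depends on the backend.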

### Definition of Done
- [x] Agree on desired capabilities
- [x] Design the new architecture
- [x] Implement the contextualizer plugin (closed-source)
- [x] Agree on initial context plugins (open-source)

dominiklohmann commented 12 months ago

This has been requested by a customer and also a likely prospect.

dominiklohmann commented 11 months ago

We've renamed matcher to Lookup Tables; the action the user performs is called lookup. We want to have different kinds of lookup tables, which must be pluggable. The lookup table infrastructure should be closed source, but lookup table plugins should mostly be open source.

The new naming aligns more closely with Splunk's lookup feature, and frees up match for pattern matching.

dominiklohmann commented 10 months ago

@mavam and I today agreed on the final name for this feature and the underlying terminology: Contextualizer.

We argued that we effectively have two modes of working with contexts, whether the context is Threat Intelligence, a TensorFlow model, or something entirely different:

  1. We can filter events, replacing them with a context if available ("lookup"), or
  2. we can extend events with a context sub-record ("enrich").

We argued that (1) can always be implemented via (2), so we'll focus only on the enrichment use case for now.
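
A minimal sketch of why (1) reduces to (2), assuming events are plain dictionaries and the context is a simple in-memory table. The `CONTEXT` table and the `enrich` and `lookup` functions are hypothetical illustrations, not Tenzir's implementation.

```python
from typing import Any, Optional

Event = dict[str, Any]

# Hypothetical table context: maps an indicator value to its context data.
CONTEXT: dict[str, dict[str, Any]] = {
    "198.51.100.7": {"threat": "c2-server"},
}


def enrich(event: Event, key_field: str, into: str = "context") -> Event:
    """Mode 2: return a copy of the event extended with a context sub-record."""
    hit = CONTEXT.get(event.get(key_field))
    return {**event, into: hit}  # the sub-record is None when there is no context


def lookup(event: Event, key_field: str) -> Optional[Event]:
    """Mode 1 built on mode 2: enrich, then replace the event with its context."""
    enriched = enrich(event, key_field)
    return enriched["context"]  # None means the event is filtered out


event = {"src_ip": "198.51.100.7", "port": 443}
print(enrich(event, "src_ip"))
print(lookup(event, "src_ip"))
```

Since `lookup` is only a filter and projection over the output of `enrich`, implementing enrichment first loses no expressiveness.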

We agreed on the following naming:

We want to support this through the following operators:

The arguments always depend on the context plugin. E.g., a YARA context is likely to just receive a directory during construction, but a table context needs the fields it contextualizes on passed to the enrich operator.

When the field for the enrichment context is not provided, we default to the name of the context.
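
The defaulting rule could look roughly like this. The function signature and parameter names are assumptions made for illustration, not the actual operator interface.

```python
from typing import Any, Optional


def enrich(event: dict[str, Any], context_name: str, context_data: Any,
           field: Optional[str] = None) -> dict[str, Any]:
    """Attach context_data under `field`, defaulting to the context's name."""
    target = field if field is not None else context_name
    return {**event, target: context_data}


# Without an explicit field, the sub-record lands under the context name.
print(enrich({"src_ip": "198.51.100.7"}, "feodo", {"threat": "c2-server"}))
# With an explicit field, that name wins.
print(enrich({"src_ip": "198.51.100.7"}, "feodo", {"threat": "c2-server"}, field="intel"))
```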

To work around the problem that one plugin cannot define multiple operators, we want to implement enrich as an alias for the hidden context apply.

tobim commented 10 months ago

Since we did not fully discuss this DoD item yet:

> Agree on initial context plugins (open-source)

We need the ability to do "matching" in a relatively short timeframe. The "simple" implementation of that would be a hash table context, and @Dakostu already implemented a plugin that does this against the previous interface generation. I believe it would still make sense to port the matcher implementation, because that code also already exists and would probably enable higher-throughput use cases.

Another context implementation that came to my mind would be geoip via libmaxminddb.

@dominiklohmann how and when can we finalize this item?

tobim commented 10 months ago

I discussed the hash-table vs cuckoofilter choice with @mavam and we got to the following plan:

Implement the hash-table context first, then port the bloom filter implementation (with the DCSO backend) in a closed source context, and finally add a geoip context.