On Sigma Placeholders - Githubissues

newodahs commented 2 years ago

I've spent some time thinking about Placeholders (https://github.com/SigmaHQ/sigma/wiki/Specification#placeholders) in Sigma recently and decided to draft up an approach to it for use in the engine.

Normally, placeholders in Sigma would be a semi-static (or at least static at the time of expansion) list that is expanded when building the search macros/filters. However, I believe we have some additional challenges:

- Placeholders may be constantly changing (add/remove) in a dynamic list
- We have less knowledge of how to pull the placeholders together as caller storage may vary

Basically, you may run across Selection rules like Username: %Administrators%, where %Administrators% is supposed to expand into a list of all usernames that are administrator accounts for matching purposes.

With this in mind, I've drafted an approach, which you can see here: https://github.com/markuskont/go-sigma-rule-engine/compare/master...newodahs:placeholder (while fully functional, this is a starting point for discussion more than anything).

This placeholder concept only really applies to standalone strings (regular ContentPattern rules) in Selections and NOT regexes, keywords, or globs, so the scope of impact is limited from that perspective. NOTE: As we implement string matching through a StringMatcher interface with no real type assertions to know what we're processing at the time of matching, it was easier to just add the lookup across the board (and ignore it in all but the regular ContentPattern matcher).

I also had to add code to detect the placeholders at parse/compile time of the rules so we could flag them to know when to use a placeholder or not during match time.
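As a rough illustration of the parse-time detection described above, a minimal sketch might look like the following. The helper name `placeholderName` and its validation rules are assumptions for illustration only, not the code from the branch.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholderName is a hypothetical helper (not the engine's API). It reports
// whether a selection value looks like a Sigma placeholder such as
// "%Administrators%" and, if so, returns the name between the percent signs.
func placeholderName(value string) (string, bool) {
	if len(value) < 3 || !strings.HasPrefix(value, "%") || !strings.HasSuffix(value, "%") {
		return "", false
	}
	name := value[1 : len(value)-1]
	// Assumption: names contain no inner percent signs or whitespace,
	// so a value like "%a% %b%" is treated as a plain string.
	if strings.ContainsAny(name, "% \t") {
		return "", false
	}
	return name, true
}

func main() {
	for _, v := range []string{"%Administrators%", "admin1", "100%"} {
		name, ok := placeholderName(v)
		fmt.Printf("value=%q placeholder=%v name=%q\n", v, ok, name)
	}
}
```

The result of such a check could be stored as a flag on the compiled rule item, so that nothing needs to be re-parsed at match time.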

I didn't want to have to parse/recompile rules as placeholders changed and I wanted to avoid breaking an external interface, so where I landed was an extension to the Matcher interface (MatchEx) and some light refactoring to give a choice to the caller:

- They can call Match as they do now with no placeholder support (placeholders are treated as plain strings, as they are today)
  - This should be non-breaking for those currently using the engine
- They can call MatchEx instead and pass a placeholder lookup function for us to use to expand placeholders as we match on those particular rules

Note: I did also implement extended Eval and EvalAll functions as well (EvalEx and EvalAllEx, respectively).

The nice thing about this approach is it will allow the caller some more flexibility in the lookup function as it’s passed when they call match rather than as a configuration item and it won’t break their current code, which would have happened if we altered the current event interfaces.

The changes also refactor the Match/Eval calls a bit: the bulk of the code is now in the Ex versions of those functions, and the existing base functions simply call the Ex versions with a nil lookup.
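To make the Match/MatchEx split concrete, here is a simplified sketch of the idea. The types `PlaceholderLookup` and `contentPattern` and their signatures are assumptions made for this example; the real interfaces in the engine operate on events and differ in detail.

```go
package main

import "fmt"

// PlaceholderLookup is a hypothetical callback type: given a placeholder
// name it returns the current values, or false if the name is unknown.
type PlaceholderLookup func(name string) ([]string, bool)

// contentPattern is a simplified stand-in for a plain string matcher.
type contentPattern struct {
	raw  string // value exactly as written in the rule
	name string // placeholder name, empty if raw is not a placeholder
}

// MatchEx holds the bulk of the logic. With a nil lookup, or an unknown
// placeholder, it falls back to comparing against the raw string.
func (c contentPattern) MatchEx(field string, lookup PlaceholderLookup) bool {
	if c.name != "" && lookup != nil {
		if values, ok := lookup(c.name); ok {
			for _, v := range values {
				if v == field {
					return true
				}
			}
			return false
		}
	}
	return field == c.raw
}

// Match keeps the existing behaviour by delegating with a nil lookup.
func (c contentPattern) Match(field string) bool {
	return c.MatchEx(field, nil)
}

func main() {
	p := contentPattern{raw: "%Administrators%", name: "Administrators"}
	lookup := func(name string) ([]string, bool) {
		if name == "Administrators" {
			return []string{"alice", "bob"}, true
		}
		return nil, false
	}
	fmt.Println(p.Match("alice"))           // no expansion without a lookup
	fmt.Println(p.MatchEx("alice", lookup)) // expanded through the callback
}
```

Because the lookup is passed per call, the caller decides where the list lives and how fresh it is, and existing callers of Match see no change.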

I'm still kicking around this idea and I'm not sure I'm super fond of the loop logic I've added in ContentPattern to match the list, though I think it's probably OK given the limited frequency with which we'll run into it.

TL;DR: Attempting to add placeholder functionality that does not break existing code but gives a reasonable balance between implementation complexity and expansion.

markuskont commented 2 years ago

Writing next to morning coffee, so excuse me if I'm missing something.

How would this approach be better than just generating an OR Matcher at rule parse time whenever we find a placeholder? A placeholder for %Administrators% sounds like Username: admin1 OR Username: admin2 OR Username: admin3..., where each individual item functions as a regular string lookup / matcher. We simply expand the OR gate from a configurable list for %Administrators% while parsing the rules, where that list is defined as a config option to the constructor.
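A minimal sketch of that parse-time expansion, assuming simplified stand-in types (`stringMatcher`, `exact`, `orMatcher`) and a hypothetical `expand` function rather than the engine's actual constructors:

```go
package main

import "fmt"

// stringMatcher is a simplified stand-in for a string matching interface.
type stringMatcher interface {
	StringMatch(string) bool
}

// exact matches a single literal value.
type exact string

func (e exact) StringMatch(s string) bool { return string(e) == s }

// orMatcher matches if any of its children match.
type orMatcher []stringMatcher

func (o orMatcher) StringMatch(s string) bool {
	for _, m := range o {
		if m.StringMatch(s) {
			return true
		}
	}
	return false
}

// expand is a hypothetical parse-time step: a value of the form %name% that
// is present in the configured placeholders becomes an OR of exact matchers,
// anything else stays a single exact matcher.
func expand(value string, placeholders map[string][]string) stringMatcher {
	if len(value) > 2 && value[0] == '%' && value[len(value)-1] == '%' {
		if vals, ok := placeholders[value[1:len(value)-1]]; ok {
			or := make(orMatcher, 0, len(vals))
			for _, v := range vals {
				or = append(or, exact(v))
			}
			return or
		}
	}
	return exact(value)
}

func main() {
	cfg := map[string][]string{"Administrators": {"admin1", "admin2", "admin3"}}
	m := expand("%Administrators%", cfg)
	fmt.Println(m.StringMatch("admin2"), m.StringMatch("guest"))
}
```

The trade-off is that the list is frozen into the matcher when the rule is parsed, so a changed list means rebuilding that matcher.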

Sorry if I missed something important. I'm sure you have spent more time looking into the Sigma specification. But I feel like this is something that should fit into the existing logic quite neatly. That's why the Matcher was defined as an interface, so that we can abstract any kind of logic behind it.

newodahs commented 2 years ago

No worries; that's a fair approach, though we'd have to rebuild the OR matcher frequently, which is fine if we think building the matcher and passing through it will be just as fast or faster.

These may not be (likely aren't) static lists; for example, in the use case I'm working with we have changing lists of systems (adds and removes as BYOD and other systems come online and change profile-type as we learn more about them) over a period of time where we're not restarting the engine and though the changes aren't super frequent, it's enough where it matters that we have an updated list with each pass.

newodahs commented 2 years ago

Your comments on using the Or Matcher sparked an idea while I was on my run today: a hybrid approach to allow both the initial setting from file (useful upfront and where values don't change often) and an update function to allow re-building the placeholders when needed without rebuilding the entire engine.

We could create a placeholder implementation that sits at the RuleSet level, alongside Rules. This would basically be a wrapper around a map of k = placeholder name, v = Or Matcher (which is made up of a bunch of ContentMatchers). So we can read these placeholders in as configuration at startup or, by placing a lock around it (rwlock, probably), update it on the fly by just locking, rebuilding the corresponding Or Matcher, and moving on with our lives.

On the rule parsing side, as we set up our rules and run across placeholders, we can build out a new matcher that knows to look up the global placeholder for an Or Matcher at process time (using its placeholder value as the key) and passes through the Match call as normal; if the key is not found we could fall back to just matching on the raw string directly (which is what we do today).

This also gives us the benefit of sharing placeholders very easily across multiple rules in the engine (where multiple rules reference the same placeholder, they all point to the same Or matcher in memory rather than each having a distinct one).

I think this may be an intelligent reuse and maybe that's where you were going; it does eliminate all of the MatchEx and callback function stuff I originally designed (though TBH, wasn't super happy with parts of it) but I think this is a better solution as it balances the reuse of existing and stored compilation of the lists with the ability to reload/reset individual placeholders without having to tear down the entire engine or reload the entire list.

To reload, you would just pass a key and []string, lock the placeholder list for writing, and rebuild the Or Matcher for that key from the passed strings. Or, if you need to be more heavy handed, just tell it to re-read its file and rebuild all of them.
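The hybrid idea could be sketched roughly as follows. All names here (`placeholderStore`, `Update`, `placeholderMatcher`) are hypothetical, and a plain string slice stands in for an Or Matcher built from ContentMatchers.

```go
package main

import (
	"fmt"
	"sync"
)

// orMatcher is a simplified stand-in for an OR of exact string matchers.
type orMatcher []string

func (o orMatcher) Match(s string) bool {
	for _, v := range o {
		if v == s {
			return true
		}
	}
	return false
}

// placeholderStore is a hypothetical ruleset-level map of name -> OR matcher.
type placeholderStore struct {
	mu     sync.RWMutex
	byName map[string]orMatcher
}

func newStore(initial map[string][]string) *placeholderStore {
	s := &placeholderStore{byName: make(map[string]orMatcher, len(initial))}
	for name, vals := range initial {
		s.byName[name] = append(orMatcher(nil), vals...)
	}
	return s
}

// Update rebuilds the matcher for one key and swaps it in under the write lock.
func (s *placeholderStore) Update(name string, values []string) {
	rebuilt := append(orMatcher(nil), values...) // build outside the lock
	s.mu.Lock()
	s.byName[name] = rebuilt
	s.mu.Unlock()
}

func (s *placeholderStore) get(name string) (orMatcher, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	m, ok := s.byName[name]
	return m, ok
}

// placeholderMatcher is what a rule would hold: the key plus a shared store.
type placeholderMatcher struct {
	raw, name string
	store     *placeholderStore
}

// Match falls back to the raw string if the key is unknown.
func (p placeholderMatcher) Match(field string) bool {
	if m, ok := p.store.get(p.name); ok {
		return m.Match(field)
	}
	return field == p.raw
}

func main() {
	store := newStore(map[string][]string{"Administrators": {"alice"}})
	rule := placeholderMatcher{raw: "%Administrators%", name: "Administrators", store: store}
	fmt.Println(rule.Match("bob"))
	store.Update("Administrators", []string{"alice", "bob"})
	fmt.Println(rule.Match("bob"))
}
```

Since every rule holds only the key and a pointer to the shared store, one Update call is visible to all rules that reference the placeholder.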

I hope that makes sense, I'm still a bit dehydrated from my 97 degree run today...

markuskont commented 2 years ago

Sorry for the absence; last month was busy both on and offline. The heat did not help. I wanted to properly focus on the idea and also to play around with it myself (which I will try to do now).

I think that hybrid approach would be good. That's exactly what I meant, as I see placeholders as a ruleset-level construct. So the application should decide when to reload the values. Or perhaps we could plug in a reload goroutine down the line to handle this seamlessly.

From what I gather, here's what we need:

- a special OR object with a reload method and locking mechanism;
- a ruleset walk function that would traverse all rules in the tree, type switch over them, and call the reload if the type is placeholder.

I think reload should simply accept a loaded map of placeholders rather than do any disk IO itself. That should also pave the way for the reference idea. Not sure how to best handle locks though. I already added rwmutex to the ruleset level a while back. Not sure if it's better to just lock the whole thing while updating or to do it per rule object.
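A rough sketch of those two pieces, using a toy rule tree. The node types and the `walkReload` function are assumptions for illustration; here each placeholder object carries its own lock, which is only one of the locking options discussed.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a simplified stand-in for a rule tree element.
type node interface {
	Match(field string) bool
}

type literal string

func (l literal) Match(f string) bool { return string(l) == f }

type orNode []node

func (o orNode) Match(f string) bool {
	for _, n := range o {
		if n.Match(f) {
			return true
		}
	}
	return false
}

// placeholderNode is the hypothetical "special OR object" with its own lock.
type placeholderNode struct {
	name   string
	mu     sync.RWMutex
	values []string
}

func (p *placeholderNode) Match(f string) bool {
	p.mu.RLock()
	defer p.mu.RUnlock()
	for _, v := range p.values {
		if v == f {
			return true
		}
	}
	return false
}

// Reload takes an already loaded map and does no disk IO itself.
func (p *placeholderNode) Reload(loaded map[string][]string) {
	vals, ok := loaded[p.name]
	if !ok {
		return // keep the previous values if the key is absent
	}
	cp := append([]string(nil), vals...)
	p.mu.Lock()
	p.values = cp
	p.mu.Unlock()
}

// walkReload traverses the tree, type switches over each node, and reloads
// placeholders. It returns how many placeholder nodes were visited.
func walkReload(n node, loaded map[string][]string) int {
	switch t := n.(type) {
	case *placeholderNode:
		t.Reload(loaded)
		return 1
	case orNode:
		count := 0
		for _, child := range t {
			count += walkReload(child, loaded)
		}
		return count
	}
	return 0
}

func main() {
	tree := orNode{literal("root"), &placeholderNode{name: "Administrators"}}
	visited := walkReload(tree, map[string][]string{"Administrators": {"alice"}})
	fmt.Println(visited, tree.Match("alice"))
}
```

Locking per object keeps matching on unrelated rules unblocked during a reload, at the cost of one mutex per placeholder; a single ruleset-level lock would be simpler but coarser.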

markuskont commented 2 years ago

Upon closer inspection, I believe this is actually an extension of the Selection object. That's because it operates on a concrete type, rather than the more abstract Event. More specifically, it needs to be a new atomic String matcher in pattern.go. I guess that a pattern implementing StringMatcher could just hold a pointer to a placeholder object, which could be locked and updated by the reload routine instead.

newodahs commented 2 years ago

I'll give this some more thought and try to get back to it soon - similar to you I've gotten unexpectedly busy recently but I haven't forgotten about this.

markuskont commented 2 years ago

In the meantime, I did some (very preliminary) coding. It's not much and totally not tested, but at least it shows what direction my thinking took - https://github.com/markuskont/go-sigma-rule-engine/compare/master...next-placeholders-2022-07

I simply added a Placeholder flag to SelectionStringItem and set up a locked handler for data loading and string matcher construction. Live reload should simply be a matter of walking the rule tree, checking whether each object is a selection matcher, and overriding the value with a newly constructed matcher list if the placeholder flag is set.

This walk could be hooked into a goroutine tick that reloads the placeholders YAML.
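A minimal sketch of such a ticker-driven reload loop, assuming hypothetical `load` and `apply` callbacks. Here `load` stands in for reading and parsing the placeholders file (no YAML parsing is shown) and `apply` stands in for the rule tree walk.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// reloadLoop calls load on every tick and hands the result to apply.
// Both callbacks are hypothetical; a failed load keeps the old values.
func reloadLoop(ctx context.Context, interval time.Duration,
	load func() (map[string][]string, error),
	apply func(map[string][]string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			loaded, err := load()
			if err != nil {
				continue // skip this tick, try again on the next one
			}
			apply(loaded)
		}
	}
}

// runDemo starts the loop with a short interval, waits until n reloads have
// been applied, then stops the loop and returns the number observed.
func runDemo(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	applied := make(chan struct{})
	go reloadLoop(ctx, 5*time.Millisecond,
		func() (map[string][]string, error) {
			return map[string][]string{"Administrators": {"alice"}}, nil
		},
		func(map[string][]string) {
			select {
			case applied <- struct{}{}:
			case <-ctx.Done():
			}
		})
	count := 0
	for count < n {
		<-applied
		count++
	}
	return count
}

func main() {
	fmt.Println("reloads applied:", runDemo(3))
}
```

Cancelling the context stops the loop cleanly, so the application stays in control of when reloading starts and ends.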

markuskont / go-sigma-rule-engine

On Sigma Placeholders #22