Restrict duplicate qualifiers per expression

varnerac commented 6 years ago

STIX 2.0 spec: 4.1.1 Observation Expression Qualifiers

Each Observation Expression MAY have additional temporal or repetition restrictions using the respectiveWITHIN, START/STOP, and REPEATS keywords.

I propose adding clarifying normative text:

An Observation Expression MUST not have more than one Qualifier of a particular type.

This prevents goofy, ambiguous patterns like:

[foo:bar NOT LIKE '\''] WITHIN 1 SECONDS WITHIN 2 SECONDS WITHIN 60 SECONDS

treyka commented 6 years ago

@varnerac Good catch! This is a mistake that seems obvious in hindsight. We should clarify the spec.

JasonKeirstead commented 6 years ago

We also stumbled upon this while implementing.

jmgnc commented 6 years ago

This example doesn't make sense, as it needs to be '([ a ] AND [ b ]) WITHIN x SECONDS'.

I don't see an need to restrict it.. yes, it might result in a pattern doing more work, but it also makes more work in defining properly restrictive normative text that does not prevent value uses. We need to ensure that ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS is valid and will work, and will not be prevented by the normative text proposed.

JasonKeirstead commented 6 years ago

@jmgnc The example indeed doesn't make sense, but it is valid as per the spec... that is the problem. You seem to be making the assumption that an observation expression + a qualifier makes a new, qualified observation expression.. the spec doesn't actually state this anywhere. If we did state this, then yes, it would solve the problem, in a roundabout way... but IMO this is overcomplicated and not useful, since it doesn't ever make any logical sense to have multiple identical qualifiers like this applied against a single observation expression.

jmgnc commented 6 years ago

You completely miss my point. We can make the spec more strict to eliminate ALL possible non-sensical patterns, but then we'll not be spending time on enhancing and solving real problems. All languages allows you to write non-sensical statements. Most(all?) programming languages allow you to write a = a, which is non-sensical, but I know of no language has restrictions that prevent you from writing that statement.

Please propose actual normative text that we can discuss. With out any proposal, I don't think we can move this discussion forward.

I have given an example of where multiple qualifiers of a particular type (within) are applied to a single observation expression, so we cannot restrict it in a blanket manner.

JasonKeirstead commented 6 years ago

I agree that all languages allow you to write nonsensical things. However, the problem with ours is how it affects grammar based parsers.

I think that the suggested change Drew has - which is one sentence - is very straight forward and solves this problem.

We could also solve the problem by adding text that defines how the expression you have above should behave, by describing that an observation expression + a qualifier makes a new, qualified observation expression. This text to me though seems more complicated... however, if you want to go that way, can you make a suggestion for the text? So we can compare?

jordan2175 commented 5 years ago

@JasonKeirstead has suggested text for this and will add it to the document. We will schedule this to be reviewed during an upcoming working call.

varnerac commented 5 years ago

There’s also text suggested in the issue itself.

JasonKeirstead commented 5 years ago

@varnerac I used your suggestion (with 1 word change addition) in the doc.

jmgnc commented 5 years ago

I just gave an example of @varnerac 's text being overly restrictive.

We need to ensure that ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS is valid and will work, and will not be prevented by the normative text proposed.

So, I reject the proposed text.

JasonKeirstead commented 5 years ago

@jmgnc See my Aug 15th comment... Can you explain how that above pattern is supposed to be interpreted? The spec does not explain it at all today. It's therefore totally undefined.

If it is a use case we need to support, then we need normative text to explain it. As of right now, your above example results in undefined behaviour.

jordan2175 commented 5 years ago

I agree with Jason here.

Thanks, Bret PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050 "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

On Apr 4, 2019, at 1:33 PM, Jason Keirstead notifications@github.com wrote:

@jmgnc https://github.com/jmgnc See my Aug 15th comment... Can you explain how that above pattern is supposed to be interpreted? The spec does not explain it at all today. It's therefore totally undefined.

If it is a use case we need to support, then we need normative text to explain it. As of right now, your above example results in undefined behaviour.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/oasis-tcs/cti-stix2/issues/70#issuecomment-480033498, or mute the thread https://github.com/notifications/unsubscribe-auth/AJk2GNrFhNhEVWaf6igw1aux-r6UJVyjks5vdlOPgaJpZM4SVowl.

jmgnc commented 5 years ago

This might be related to the fact that others simplified the text that we had originally because of reasons. The above is suppose to have an observed data that matches a and b be within x seconds, and there needs to be 5 cases of that pair matching withing x seconds, where all the data must have been observed within the same y second time period.

No additional parenthesis is needed because there qualifier applies to the previous expression.

Also, we have the text that allows this IN the table:

a MUST be an Observation Expression or a preceding Qualifier.

Each qualifier has that text.

JasonKeirstead commented 5 years ago

@jmgnc I covered all of that in my Aug 15th comment.... we still don't have any normative text to make this work. I don't know how to phrase it myself.

"We could also solve the problem by adding text that defines how the expression you have above should behave, by describing that an observation expression + a qualifier makes a new, qualified observation expression. This text to me though seems more complicated... however, if you want to go that way, can you make a suggestion for the text? So we can compare?"

Is this REALLY a use case anyone needs to support for any reason? Are there any toolchains that will support it? How would I make this query in say, Splunk....

jmgnc commented 5 years ago

I don't see why we need to add additional text. The proposal was to restrict it. The normative text already clearly defines what should happen when there are multiple qualifiers, and if you don't think it does, please be specific in where you think the specification isn't clear enough.

Take my example: ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS The rules for WITHIN are a WITHIN x SECONDS, so after the above, a is ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES and x is y.

Per the spec for WITHIN, it says: All Observations matched by a MUST occur, or have been observed, within the specified time window.

So that means that all of the observed data matched by ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES, must be within the specified seconds.

I can further break it down for you if you'd like.

Also, per the original topic: [foo:bar NOT LIKE '\''] WITHIN 1 SECONDS WITHIN 2 SECONDS WITHIN 60 SECONDS once you match the first one, the remaining ones will automatically match, so there isn't a lot of work done by this. Now if the spec was strictly limited to repeating qualifiers, that WITHIN cannot follow WITHIN w/o another qualifier in between, then I don't see a problem with that, but that is not how I read the proposal.

JasonKeirstead commented 5 years ago

As I said previously already... you seem to be assuming that there is text something like "observation expression + a qualifier is left-associative and makes a new, qualified observation expression". There is no such text in the spec. Nowhere do we say how to treat this scenario. We don't describe associativity of observation operators. As such, these examples you are giving, are entirely subject to interpretation. I could interpret them all totally differently and assume they are right-associative if I want.

Either we make the effort to properly describe the associativity of the operators and all of the scenarios it creates, or we say you can only use one. I prefer the latter because I don't think anyone is actually ever going to implement anything else in the real world.

jmgnc commented 5 years ago

Umm, we do define associativity of observation operations, look at the table in §4.1.2, it has a column labeled associativity. There's even an example in the tail end that talks about this that I wrote because of similar (but I believe different) confusion.

JasonKeirstead commented 5 years ago

We have another issue that is not actually resolved with any of the proposed solutions.

This is currently a valid pattern

( [ expr1 ] START A STOP B AND [ expr2 ] START C STOP D ) START X STOP Y

The spec does not explain how one is supposed to interpret this, beyond the below text, which is insufficient. It's left to the implementor

"Observation Expressions, along with their Observation Operators and optional Qualifiers, MAY be surrounded with parenthesis to delineate which Observation Expressions the Qualifiers apply to."

jmgnc commented 5 years ago

We have another issue that is not actually resolved with any of the proposed solutions.

This is currently a valid pattern

( [ expr1 ] START A STOP B AND [ expr2 ] START C STOP D ) START X STOP Y

The spec does not explain how one is supposed to interpret this, beyond the below text, which is insufficient. It's left to the implementor

"Observation Expressions, along with their Observation Operators and optional Qualifiers, MAY be surrounded with parenthesis to delineate which Observation Expressions the Qualifiers apply to."

I think this is enough to cover that:

All Observations that match a MUST have an observation time >= x and < y.

So if A&B or C&D are disjoint from X&Y, then it's obvious that there can be nothing that matches indicator, but it A&B and C&D overlap with X&Y, then it's valid. So I don't see a problem w/ your example.

P.S. I was never a fan of START/STOP, because it's a fixed point in time, and we already (now?) have valid_from/valid_until on the Indicator object.

JasonKeirstead commented 5 years ago

"So if A&B or C&D are disjoint from X&Y, then it's obvious that there can be nothing that matches indicator, but it A&B and C&D overlap with X&Y, then it's valid. So I don't see a problem w/ your example."

How do you come to that conclusion though - we don't specify the rules of how START / STOP behave when applied to START / STOP... is it intersection, union? We don't say.

jmgnc commented 5 years ago

I quoted you the part of the spec which you ignored:

All Observations that match a MUST have an observation time >= x and < y.

So in your example:

( [ expr1 ] START A STOP B AND [ expr2 ] START C STOP D ) START X STOP Y

all of expr1 must be between A&B per above. all of expr2 must be between C&D per above. both expr1 and expr2 must be between X&Y per above statement from the spec.

Notice that the wording is all observations, not one, or some of the observations, but ALL the ones that match the parenthesized expression.

jordan2175 commented 5 years ago

We talked about this on 2019-06-05 and the consensus on the call was to add this as a MUST NOT and Drew will provide text in the document.

jmgnc commented 5 years ago

I gave multiple examples that are well defined, and MUST be valid. As long as the text allows the valid examples to be given I am fine, but I will object to any changes that prevent the valid examples from being invalid patterns.

jordan2175 commented 5 years ago

Text was added in section 9.5.1

varnerac commented 5 years ago

Would this make things better? The more I read the spec the more that I think the issue here is with the spec and the grammar. The current grammar is recursive for observationExpression

observationExpression
  : LBRACK comparisonExpression RBRACK        # observationExpressionSimple
  | LPAREN observationExpressions RPAREN      # observationExpressionCompound
  | observationExpression startStopQualifier  # observationExpressionStartStop
  | observationExpression withinQualifier     # observationExpressionWithin
  | observationExpression repeatedQualifier   # observationExpressionRepeated
  ;

I think if we use the following, and replace observationExpression with observationExpression in higher levels, it could work. I haven't compiled this into code yet.

observationExpression
  : LBRACK comparisonExpression RBRACK
  | LPAREN observationExpressions RPAREN
  ;

qualifiedObservationExpression
  : observationExpression repeatedQualifier? startStopQualifier withinQualifier?
  | observationExpression repeatedQualifier? withinQualifier startStopQualifier?
  | observationExpression repeatedQualifier?
  ;

This means changing the text in the Patterning spec to match the grammar.

We could still have weird stuff like:

[foo OR bar] WITHIN 10 SECONDS

where the WITHIN 10 SECONDS is an automatic noop. But, I think this is better than what we have and leads to the functionality people want without allowing as many bizarre qualifiers that make no sense.

jordan2175 commented 5 years ago

The text was added to 9.5.1, based on additional review of Drew's ANTLR grammar above from John-Mark and Jason, we may add additional text or clean it up further.

jordan2175 commented 5 years ago

This is the text that was added.

An Observation Expression MUST NOT have more than one Qualifier of a particular type.

jmgnc commented 5 years ago

Would this make things better? The more I read the spec the more that I think the issue here is with the spec and the grammar. The current grammar is recursive for observationExpression
observationExpression
  : LBRACK comparisonExpression RBRACK        # observationExpressionSimple
  | LPAREN observationExpressions RPAREN      # observationExpressionCompound
  | observationExpression startStopQualifier  # observationExpressionStartStop
  | observationExpression withinQualifier     # observationExpressionWithin
  | observationExpression repeatedQualifier   # observationExpressionRepeated
  ;
I think if we use the following, and replace observationExpression with observationExpression in higher levels, it could work. I haven't compiled this into code yet.

I think you mean with qualifiedObservationExpression

observationExpression
  : LBRACK comparisonExpression RBRACK
  | LPAREN observationExpressions RPAREN
  ;

qualifiedObservationExpression
  : observationExpression repeatedQualifier? startStopQualifier withinQualifier?
  | observationExpression repeatedQualifier? withinQualifier startStopQualifier?
  | observationExpression repeatedQualifier?
  ;

The above still does not allow for my example above w/ two WITHIN statements.

([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS

If this text remains, it will force me to vote NO for this going to CS.

As I have stated before, it is not the standard's job to prevent people from writing noop and other bad patterns. We should not proscribe the smallest/optimal pattern to get the job done. That is up to the pattern matcher engine to optimize it if it wants.

varnerac commented 5 years ago

How does the following solution work for ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS

(([ a ] AND [ b ]) WITHIN x SECONDS AND ([ a ] AND [ b ]) REPEATS 5 TIMES WITHIN y SECONDS)

jmgnc commented 5 years ago

How about only restricting duplicate WITHIN and STARTSTOP when not around repeats? As the standard says, REPEATS is short hand for ([a ] and [a]). So yes, you could manual expand repeats to get around your restriction, but that defeats the point of having REPEATS.

We really have two different types of qualifiers. We have the generic substitution qualifier, REPEATS that can easily be replaced in the AST w/ a bunch of ands. We then have temporal qualifiers that are WITHIN and STARTSTOP. So, I'd be fine w/ limiting repeating temporal qualifiers as long as it substitution qualifier don't count.

jmgnc commented 5 years ago

How does the following solution work for ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS
(([ a ] AND [ b ]) WITHIN x SECONDS AND ([ a ] AND [ b ]) REPEATS 5 TIMES WITHIN y SECONDS)

Those are not equivalent. The reason they are not is that the second set of a and b are not restricted to be within x.

In my original pattern, imagine I'm looking for a and b that happen close together, like two DNS lookups. But then I'm looking at a larger pattern where I see those part of dns lookups happen multiple times over a larger period of time.

varnerac commented 5 years ago

I agree that REPEATS is a different beast than the other two. I mentioned it in Slack.

Yeah, they're definitely different beasts. I came to the same conclusion in Slack.

So, the link above is me opining on the solution to qualifiers in STIX patterns. In my mind, REPEATED X TIMES is not really a qualifier like START/STOP and WITHIN. As such, REPEATED is really a modifier on observation expressions. It can be thought of as syntactic sugar for ANDs in the language. If jmg, jkeirstead or jordan want to look at it, that’d be great.

varnerac commented 5 years ago

The reason they are not is that the second set of a and b are not restricted to be within x.

But the first clause of the observationExpressionAnd handles that:

(
 ([ a ] AND [ b ]) WITHIN x SECONDS
  AND
 ([ a ] AND [ b ]) REPEATS 5 TIMES WITHIN y SECONDS
)

I feel like I am missing something here.

varnerac commented 5 years ago

Ok, I think I see what you are saying:

"Is there an instance of a and b that occur with x seconds of each other and repeat 5 times within y seconds during those x seconds?"

jmgnc commented 5 years ago

No, you need to expand out the repeats, I'll do it for 3 so it's not that long:

((([a] AND [b]) WITHIN x SECONDS) AND (([a] AND [b]) WITHIN x SECONDS) AND (([a] AND [b]) WITHIN x SECONDS)) WITHIN y SECONDS

That is applying the repeats rule in the standard, now to make things more clear, lets replace the a and b, w/ a, b, c, d, e and f, so they are unique.

Each pair of (a, b), (c, d), and (e, f), must be within x seconds of each other. And all of the a, b, c, d, e, f must be within y seconds of each other. This is simple when you expand out the repeats, and understand that each within clause is independent, and only applies to the observations that end up matching its respective expression.

So, as I said, I'm fine w/ disallowing duplicate within or start/stop qualifiers when apply to the same expression, the issue that is failed to see is that the REPEATS generates a NEW expression, and is not qualifying an existing expression.

Maybe we shouldn't call REPEATS a qualifier then? Be cause it doesn't qualify an expression, it really generates a new one.

varnerac commented 5 years ago

I think of REPEATS as a modifier.

JasonKeirstead commented 5 years ago

Can someone point to a legitimate real world use case of

((([a] AND [b]) WITHIN x SECONDS) AND (([a] AND [b]) WITHIN x SECONDS) AND (([a] AND [b]) WITHIN x SECONDS)) WITHIN y SECONDS

I have been doing this for a very long time, and have never seen anyone write anything like this.

If no one is ever going to do it why are we arguing about it.

jmgnc commented 5 years ago

@JasonKeirstead by that logic, why don't we just get ride of REPEATS all together, because someone can just repeat out the clause that many times? There are lots of operators we could get rid of using that reductionist logic.

I see that as allowing my example as a simplified way of writing patterns that can be understood.

JasonKeirstead commented 5 years ago

Because people use REPEATS-like syntax in SIEMs all the time, it is commonplace.

jordan2175 commented 5 years ago

In my opinion we need to target the 70% use case. If we need to relax things later on, we can. We can even issue "errata" to relax something if it becomes urgent. I really worry about trying to provide a solution for every conceivable corner case. This is the problem that STIX 1 had. They tried to cover every possible solution. When we started STIX 2, we specifically said we were not going to to do that. Personally I would love it if in 6 months post STIX 2.1 CS we had 10 vendors yelling at us that we needed to add something or we forgot something. That would be awesome.

JasonKeirstead commented 5 years ago

Agree with @jordan2175 .

The real problem we have with STIX pattern is - outside of extremely basic IOC matching, almost no one is using it. There are other cybersecurity matching languages that are being widely adopted, and SCO Pattern is not.

Whats more, we just made SCO Pattern optional in 2.1, so its not even mandatory for indicators anymore, and could easily lose traction even in STIX.

Our primary concern at this point should be consumability and implementability, to spread adoption beyond simple IOC matching.

jmgnc commented 5 years ago

I agree that usage is a problem. Yet you'd rather spend time arguing against my valid use case above, than trying to adopt it. Restricting what you can do with patterning is not a good way to drive adoption.

I thought we agreed that we would come to a compromised solution on the working call, and I'm attempting to compromise, and now that a compromise is close, you're saying you don't want to compromise? Is that the gist of it @JasonKeirstead ?

This does not impact implementability, as it's already been implemented in the MITRE 2.0 patterning library. So I call BS on this argument.

varnerac commented 5 years ago

So, the REPEATS expansions to ANDs you show above is actually not allowed in the STIX 2.0 spec.

In 2.0, REPEATS is an Observation Expression Qualifier. It may modify an Observation Expression. Observation Expressions comprise one or more Comparison Expressions, joined via Boolean Operators. Observation Expressions do not include Observation Qualifiers. Patterns are the unit that groups Observations Expressions, Observation Operators, and Observation Qualifiers.

oasis-tcs / cti-stix2

Restrict duplicate qualifiers per expression #70