Open varnerac opened 6 years ago
@varnerac Good catch! This is a mistake that seems obvious in hindsight. We should clarify the spec.
We also stumbled upon this while implementing.
This example doesn't make sense, as it needs to be '([ a ] AND [ b ]) WITHIN x SECONDS'.
I don't see a need to restrict it. Yes, it might result in a pattern doing more work, but it also makes more work in defining properly restrictive normative text that does not prevent valid uses. We need to ensure that ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS is valid and will work, and will not be prevented by the normative text proposed.
@jmgnc The example indeed doesn't make sense, but it is valid as per the spec... that is the problem. You seem to be making the assumption that an observation expression + a qualifier makes a new, qualified observation expression.. the spec doesn't actually state this anywhere. If we did state this, then yes, it would solve the problem, in a roundabout way... but IMO this is overcomplicated and not useful, since it doesn't ever make any logical sense to have multiple identical qualifiers like this applied against a single observation expression.
You completely miss my point. We can make the spec more strict to eliminate ALL possible nonsensical patterns, but then we'll not be spending time on enhancing and solving real problems. All languages allow you to write nonsensical statements. Most (all?) programming languages allow you to write a = a, which is nonsensical, but I know of no language that has restrictions that prevent you from writing that statement.
Please propose actual normative text that we can discuss. Without any proposal, I don't think we can move this discussion forward.
I have given an example of where multiple qualifiers of a particular type (within) are applied to a single observation expression, so we cannot restrict it in a blanket manner.
I agree that all languages allow you to write nonsensical things. However, the problem with ours is how it affects grammar based parsers.
I think that the change Drew suggested, which is one sentence, is very straightforward and solves this problem.
We could also solve the problem by adding text that defines how the expression you have above should behave, by describing that an observation expression + a qualifier makes a new, qualified observation expression. This text to me though seems more complicated... however, if you want to go that way, can you make a suggestion for the text? So we can compare?
@JasonKeirstead has suggested text for this and will add it to the document. We will schedule this to be reviewed during an upcoming working call.
There’s also text suggested in the issue itself.
@varnerac I used your suggestion (with 1 word change addition) in the doc.
I just gave an example of @varnerac 's text being overly restrictive.
We need to ensure that ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS is valid and will work, and will not be prevented by the normative text proposed.
So, I reject the proposed text.
@jmgnc See my Aug 15th comment... Can you explain how that above pattern is supposed to be interpreted? The spec does not explain it at all today. It's therefore totally undefined.
If it is a use case we need to support, then we need normative text to explain it. As of right now, your above example results in undefined behaviour.
I agree with Jason here.
Thanks, Bret
This might be related to the fact that others simplified the text that we had originally because of reasons. The above is supposed to have observed data that matches a and b within x seconds, and there need to be 5 cases of that pair matching within x seconds, where all the data must have been observed within the same y-second time period.
No additional parentheses are needed because the qualifier applies to the previous expression.
Also, we have the text that allows this IN the table:
a MUST be an Observation Expression or a preceding Qualifier.
Each qualifier has that text.
@jmgnc I covered all of that in my Aug 15th comment.... we still don't have any normative text to make this work. I don't know how to phrase it myself.
"We could also solve the problem by adding text that defines how the expression you have above should behave, by describing that an observation expression + a qualifier makes a new, qualified observation expression. This text to me though seems more complicated... however, if you want to go that way, can you make a suggestion for the text? So we can compare?"
Is this REALLY a use case anyone needs to support for any reason? Are there any toolchains that will support it? How would I make this query in say, Splunk....
I don't see why we need to add additional text. The proposal was to restrict it. The normative text already clearly defines what should happen when there are multiple qualifiers, and if you don't think it does, please be specific in where you think the specification isn't clear enough.
Take my example:
([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS
The rules for WITHIN are a WITHIN x SECONDS, so after the above, a is ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES and x is y.
Per the spec for WITHIN, it says: All Observations matched by a MUST occur, or have been observed, within the specified time window.
So that means that all of the observed data matched by ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES must be within the specified seconds.
I can further break it down for you if you'd like.
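The left-to-right reading described in this comment, where each qualifier wraps everything to its left, can be sketched as follows. This is only an illustration of that one interpretation, which the thread disputes; the class and function names (Obs, Qualified, apply_qualifiers, render) are hypothetical and come from neither the spec nor any STIX library.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Obs:
    """An unqualified observation expression, kept as raw text."""
    text: str


@dataclass(frozen=True)
class Qualified:
    """An expression with one qualifier applied to it."""
    inner: "Node"
    qualifier: str


Node = Union[Obs, Qualified]


def apply_qualifiers(base: Node, qualifiers: List[str]) -> Node:
    """Fold qualifiers left to right: each one wraps the result so far."""
    node = base
    for q in qualifiers:
        node = Qualified(node, q)
    return node


def render(node: Node) -> str:
    """Show the implied grouping with explicit parentheses."""
    if isinstance(node, Obs):
        return node.text
    return f"({render(node.inner)} {node.qualifier})"


tree = apply_qualifiers(
    Obs("([ a ] AND [ b ])"),
    ["WITHIN x SECONDS", "REPEATS 5 TIMES", "WITHIN y SECONDS"],
)
print(render(tree))
```

Under this reading the outer WITHIN y SECONDS qualifies the whole repeated expression, while the inner WITHIN x SECONDS qualifies only the ([ a ] AND [ b ]) pair.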
Also, per the original topic:
[foo:bar NOT LIKE '\''] WITHIN 1 SECONDS WITHIN 2 SECONDS WITHIN 60 SECONDS
once you match the first one, the remaining ones will automatically match, so there isn't a lot of work done by this. Now if the spec was strictly limited to repeating qualifiers, that WITHIN cannot follow WITHIN w/o another qualifier in between, then I don't see a problem with that, but that is not how I read the proposal.
As I said previously already... you seem to be assuming that there is text something like "observation expression + a qualifier is left-associative and makes a new, qualified observation expression". There is no such text in the spec. Nowhere do we say how to treat this scenario. We don't describe associativity of observation operators. As such, these examples you are giving, are entirely subject to interpretation. I could interpret them all totally differently and assume they are right-associative if I want.
Either we make the effort to properly describe the associativity of the operators and all of the scenarios it creates, or we say you can only use one. I prefer the latter because I don't think anyone is actually ever going to implement anything else in the real world.
Umm, we do define associativity of observation operations, look at the table in §4.1.2, it has a column labeled associativity. There's even an example in the tail end that talks about this that I wrote because of similar (but I believe different) confusion.
We have another issue that is not actually resolved with any of the proposed solutions.
This is currently a valid pattern
( [ expr1 ] START A STOP B AND [ expr2 ] START C STOP D ) START X STOP Y
The spec does not explain how one is supposed to interpret this, beyond the below text, which is insufficient. It's left to the implementor
"Observation Expressions, along with their Observation Operators and optional Qualifiers, MAY be surrounded with parenthesis to delineate which Observation Expressions the Qualifiers apply to."
I think this is enough to cover that:
All Observations that match a MUST have an observation time >= x and < y.
So if A&B or C&D are disjoint from X&Y, then it's obvious that nothing can match the indicator, but if A&B and C&D overlap with X&Y, then it's valid. So I don't see a problem w/ your example.
P.S. I was never a fan of START/STOP, because it's a fixed point in time, and we already (now?) have valid_from/valid_until on the Indicator object.
"So if A&B or C&D are disjoint from X&Y, then it's obvious that there can be nothing that matches indicator, but it A&B and C&D overlap with X&Y, then it's valid. So I don't see a problem w/ your example."
How do you come to that conclusion though - we don't specify the rules of how START / STOP behave when applied to START / STOP... is it intersection, union? We don't say.
I quoted you the part of the spec which you ignored:
All Observations that match a MUST have an observation time >= x and < y.
So in your example:
( [ expr1 ] START A STOP B AND [ expr2 ] START C STOP D ) START X STOP Y
All of expr1 must be between A&B per above. All of expr2 must be between C&D per above. Both expr1 and expr2 must be between X&Y per the above statement from the spec.
Notice that the wording is all observations, not one, or some of the observations, but ALL the ones that match the parenthesized expression.
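As a rough sketch of the reading given in this comment, the nested START/STOP check can be written as three applications of the same ">= x and < y" rule. The function names and the use of plain numbers for timestamps are assumptions made for illustration; the thread itself disagrees on whether the spec actually mandates this behaviour.

```python
from typing import Sequence


def in_window(times: Sequence[float], start: float, stop: float) -> bool:
    """True when every observation time t satisfies start <= t < stop."""
    return all(start <= t < stop for t in times)


def nested_start_stop_matches(
    expr1_times: Sequence[float],
    expr2_times: Sequence[float],
    a: float, b: float,   # START A STOP B on expr1
    c: float, d: float,   # START C STOP D on expr2
    x: float, y: float,   # START X STOP Y on the parenthesized expression
) -> bool:
    """Apply each START/STOP to the observations matched by its own expression."""
    inner_ok = in_window(expr1_times, a, b) and in_window(expr2_times, c, d)
    outer_ok = in_window(list(expr1_times) + list(expr2_times), x, y)
    return inner_ok and outer_ok
```

Read this way, the effective window for each inner expression is the intersection of its own window with the outer one, so disjoint windows can never match.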
We talked about this on 2019-06-05 and the consensus on the call was to add this as a MUST NOT and Drew will provide text in the document.
I gave multiple examples that are well defined, and MUST be valid. As long as the text allows the valid examples to be given I am fine, but I will object to any changes that turn the valid examples into invalid patterns.
Text was added in section 9.5.1
Would this make things better? The more I read the spec the more that I think the issue here is with the spec and the grammar. The current grammar is recursive for observationExpression
observationExpression
: LBRACK comparisonExpression RBRACK # observationExpressionSimple
| LPAREN observationExpressions RPAREN # observationExpressionCompound
| observationExpression startStopQualifier # observationExpressionStartStop
| observationExpression withinQualifier # observationExpressionWithin
| observationExpression repeatedQualifier # observationExpressionRepeated
;
I think if we use the following, and replace observationExpression with observationExpression in higher levels, it could work. I haven't compiled this into code yet.
observationExpression
: LBRACK comparisonExpression RBRACK
| LPAREN observationExpressions RPAREN
;
qualifiedObservationExpression
: observationExpression repeatedQualifier? startStopQualifier withinQualifier?
| observationExpression repeatedQualifier? withinQualifier startStopQualifier?
| observationExpression repeatedQualifier?
;
This means changing the text in the Patterning spec to match the grammar.
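To see which qualifier sequences the proposed qualifiedObservationExpression rule would accept on a single expression, here is a minimal sketch that encodes the three alternatives as a regular expression over one-letter codes. The helper name conforms and the labels "REPEATS", "STARTSTOP" and "WITHIN" are assumptions for illustration; this is not the ANTLR grammar itself and it ignores nesting through parentheses.

```python
import re
from typing import Sequence

# R = repeatedQualifier, S = startStopQualifier, W = withinQualifier.
# Mirrors the three alternatives of the proposed rule: R? S W? | R? W S? | R?
_ALLOWED = re.compile(r"R?(?:SW?|WS?)?")

_CODES = {"REPEATS": "R", "STARTSTOP": "S", "WITHIN": "W"}


def conforms(qualifiers: Sequence[str]) -> bool:
    """Check the qualifiers that trail one observation expression, in order."""
    encoded = "".join(_CODES[q] for q in qualifiers)
    return _ALLOWED.fullmatch(encoded) is not None


print(conforms(["REPEATS", "WITHIN", "STARTSTOP"]))  # accepted by the rule
print(conforms(["WITHIN", "REPEATS", "WITHIN"]))     # rejected by the rule
```

Note that the sequence WITHIN, REPEATS, WITHIN from the earlier example is rejected by this rule unless the inner part is wrapped in parentheses as its own expression.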
We could still have weird stuff like:
[foo OR bar] WITHIN 10 SECONDS
where the WITHIN 10 SECONDS is an automatic noop. But, I think this is better than what we have and leads to the functionality people want without allowing as many bizarre qualifiers that make no sense.
The text was added to 9.5.1. Based on additional review of Drew's ANTLR grammar above by John-Mark and Jason, we may add additional text or clean it up further.
This is the text that was added.
An Observation Expression MUST NOT have more than one Qualifier of a particular type.
"I think if we use the following, and replace observationExpression with observationExpression in higher levels, it could work. I haven't compiled this into code yet."
I think you mean with qualifiedObservationExpression
The above still does not allow for my example above w/ two WITHIN statements.
([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS
If this text remains, it will force me to vote NO for this going to CS.
As I have stated before, it is not the standard's job to prevent people from writing noop and other bad patterns. We should not prescribe the smallest/optimal pattern to get the job done. That is up to the pattern matcher engine to optimize if it wants.
How does the following solution work for ([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS
(([ a ] AND [ b ]) WITHIN x SECONDS AND ([ a ] AND [ b ]) REPEATS 5 TIMES WITHIN y SECONDS)
How about only restricting duplicate WITHIN and STARTSTOP when not around REPEATS? As the standard says, REPEATS is shorthand for ([ a ] AND [ a ]). So yes, you could manually expand REPEATS to get around your restriction, but that defeats the point of having REPEATS.
We really have two different types of qualifiers. We have the generic substitution qualifier, REPEATS, that can easily be replaced in the AST w/ a bunch of ANDs. We then have temporal qualifiers, which are WITHIN and STARTSTOP. So, I'd be fine w/ limiting repeated temporal qualifiers as long as substitution qualifiers don't count.
How does the following solution work for
([ a ] AND [ b ]) WITHIN x SECONDS REPEATS 5 TIMES WITHIN y SECONDS
(([ a ] AND [ b ]) WITHIN x SECONDS AND ([ a ] AND [ b ]) REPEATS 5 TIMES WITHIN y SECONDS)
Those are not equivalent. The reason they are not is that the second set of a and b are not restricted to be within x.
In my original pattern, imagine I'm looking for a and b that happen close together, like two DNS lookups. But then I'm looking at a larger pattern where I see those pairs of DNS lookups happen multiple times over a larger period of time.
I agree that REPEATS is a different beast than the other two. I mentioned it in Slack.
Yeah, they're definitely different beasts. I came to the same conclusion in Slack.
So, the link above is me opining on the solution to qualifiers in STIX patterns. In my mind, REPEATED X TIMES is not really a qualifier like START/STOP and WITHIN. As such, REPEATED is really a modifier on observation expressions. It can be thought of as syntactic sugar for ANDs in the language. If jmg, jkeirstead or jordan want to look at it, that’d be great.
The reason they are not is that the second set of a and b are not restricted to be within x.
But the first clause of the observationExpressionAnd handles that:
(
([ a ] AND [ b ]) WITHIN x SECONDS
AND
([ a ] AND [ b ]) REPEATS 5 TIMES WITHIN y SECONDS
)
I feel like I am missing something here.
Ok, I think I see what you are saying:
"Is there an instance of a and b that occur within x seconds of each other and repeat 5 times within y seconds during those x seconds?"
No, you need to expand out the repeats, I'll do it for 3 so it's not that long:
((([a] AND [b]) WITHIN x SECONDS) AND (([a] AND [b]) WITHIN x SECONDS) AND (([a] AND [b]) WITHIN x SECONDS)) WITHIN y SECONDS
That is applying the repeats rule in the standard. Now, to make things more clear, let's replace the a and b w/ a, b, c, d, e and f, so they are unique.
Each pair of (a, b), (c, d), and (e, f), must be within x seconds of each other. And all of the a, b, c, d, e, f must be within y seconds of each other. This is simple when you expand out the repeats, and understand that each within clause is independent, and only applies to the observations that end up matching its respective expression.
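The expansion and the resulting timing constraints described in this comment can be sketched as below. The helper names (expand_repeats, within, matches) are hypothetical, timestamps are plain numbers of seconds, and treating WITHIN as an inclusive bound on the spread between earliest and latest observation is an assumption. Whether the spec permits this expansion at all is contested later in the thread.

```python
from typing import List, Sequence, Tuple


def expand_repeats(expr: str, n: int) -> str:
    """Rewrite 'expr REPEATS n TIMES' as n parenthesized copies joined by AND."""
    return "(" + " AND ".join(f"({expr})" for _ in range(n)) + ")"


def within(times: Sequence[float], seconds: float) -> bool:
    """Assumed WITHIN semantics: earliest and latest differ by at most `seconds`."""
    return max(times) - min(times) <= seconds


def matches(pairs: List[Tuple[float, float]], x: float, y: float) -> bool:
    """Each pair must satisfy WITHIN x; all observations together must satisfy WITHIN y."""
    each_pair_close = all(within(pair, x) for pair in pairs)
    all_times = [t for pair in pairs for t in pair]
    return each_pair_close and within(all_times, y)


pattern = expand_repeats("([a] AND [b]) WITHIN x SECONDS", 3) + " WITHIN y SECONDS"
print(pattern)
```

With x = 2 and y = 30, three pairs observed at (0, 1), (10, 11) and (20, 21) would match under these assumptions, while moving the last pair to (100, 101) would fail the outer window.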
So, as I said, I'm fine w/ disallowing duplicate within or start/stop qualifiers when applied to the same expression. The issue that people fail to see is that REPEATS generates a NEW expression, and is not qualifying an existing expression.
Maybe we shouldn't call REPEATS a qualifier then? Because it doesn't qualify an expression, it really generates a new one.
I think of REPEATS as a modifier.
Can someone point to a legitimate real world use case of
((([a] AND [b]) WITHIN x SECONDS) AND (([a] AND [b]) WITHIN x SECONDS) AND (([a] AND [b]) WITHIN x SECONDS)) WITHIN y SECONDS
I have been doing this for a very long time, and have never seen anyone write anything like this.
If no one is ever going to do it, why are we arguing about it?
@JasonKeirstead by that logic, why don't we just get rid of REPEATS altogether, because someone can just repeat out the clause that many times? There are lots of operators we could get rid of using that reductionist logic.
I see that as allowing my example as a simplified way of writing patterns that can be understood.
Because people use REPEATS-like syntax in SIEMs all the time, it is commonplace.
In my opinion we need to target the 70% use case. If we need to relax things later on, we can. We can even issue "errata" to relax something if it becomes urgent. I really worry about trying to provide a solution for every conceivable corner case. This is the problem that STIX 1 had. They tried to cover every possible solution. When we started STIX 2, we specifically said we were not going to do that. Personally I would love it if in 6 months post STIX 2.1 CS we had 10 vendors yelling at us that we needed to add something or we forgot something. That would be awesome.
Agree with @jordan2175 .
The real problem we have with STIX pattern is - outside of extremely basic IOC matching, almost no one is using it. There are other cybersecurity matching languages that are being widely adopted, and SCO Pattern is not.
What's more, we just made SCO Pattern optional in 2.1, so it's not even mandatory for indicators anymore, and could easily lose traction even in STIX.
Our primary concern at this point should be consumability and implementability, to spread adoption beyond simple IOC matching.
I agree that usage is a problem. Yet you'd rather spend time arguing against my valid use case above, than trying to adopt it. Restricting what you can do with patterning is not a good way to drive adoption.
I thought we agreed that we would come to a compromise solution on the working call, and I'm attempting to compromise, and now that a compromise is close, you're saying you don't want to compromise? Is that the gist of it @JasonKeirstead ?
This does not impact implementability, as it's already been implemented in the MITRE 2.0 patterning library. So I call BS on this argument.
So, the REPEATS expansion to ANDs you show above is actually not allowed in the STIX 2.0 spec.
In 2.0, REPEATS is an Observation Expression Qualifier. It may modify an Observation Expression. Observation Expressions comprise one or more Comparison Expressions, joined via Boolean Operators. Observation Expressions do not include Observation Qualifiers. Patterns are the unit that groups Observation Expressions, Observation Operators, and Observation Qualifiers.
STIX 2.0 spec: 4.1.1 Observation Expression Qualifiers
I propose adding clarifying normative text:
This prevents goofy, ambiguous patterns like: