oasis-open / cti-python-stix2

OASIS TC Open Repository: Python APIs for STIX 2
https://stix2.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

The pattern argument when creating an Indicator object tries to incorrectly interpret a literal string. #536

Closed: nirmalneupane closed this issue 2 years ago

nirmalneupane commented 2 years ago

Example: Indicator(pattern_type="stix", pattern="[url:value = 'http://example.com\'\'\'\'']")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nirmal.neupane/ti-export/ti-export/lib/python3.9/site-packages/stix2/v21/sdo.py", line 250, in __init__
    super(Indicator, self).__init__(*args, **kwargs)
  File "/Users/nirmal.neupane/ti-export/ti-export/lib/python3.9/site-packages/stix2/base.py", line 232, in __init__
    self._check_object_constraints()
  File "/Users/nirmal.neupane/ti-export/ti-export/lib/python3.9/site-packages/stix2/v21/sdo.py", line 270, in _check_object_constraints
    raise InvalidValueError(self.__class__, 'pattern', str(errors[0]))
stix2.exceptions.InvalidValueError: Invalid value for Indicator 'pattern': FAIL: Error found at line 1:33. mismatched input '''' expecting ']'

However, the following workaround works, and in my opinion the example above should have the same effect:

Indicator(pattern_type="stix", pattern="["+str(stix2.EqualityComparisonExpression(stix2.ObjectPath('url',['value']),'http://example.com\'\'\'\''))+"]")

Indicator(type='indicator', spec_version='2.1', id='indicator--e5bbda48-26fa-4225-b648-ff7398a33b8d', created='2022-01-13T22:38:42.933043Z', modified='2022-01-13T22:38:42.933043Z', pattern="[url:value = 'http://example.com\\'\\'\\'\\'']", pattern_type='stix', pattern_version='2.1', valid_from='2022-01-13T22:38:42.933043Z', revoked=False)

clenk commented 2 years ago

Hi @nirmalneupane, \' escapes the quote for Python, but not for the pattern. So your first example creates the pattern [url:value = 'http://example.com'''''].

You'll need to escape the backslashes as well, for Python, so that they appear in the pattern.

>>> x = stix2.Indicator(pattern_type="stix", pattern="[url:value = 'http://example.com\\'\\'\\'\\'']")
>>> print(x.pattern)
[url:value = 'http://example.com\'\'\'\'']
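The two levels of escaping can be checked in plain Python without stix2. The snippet below compares what each string literal actually contains by the time it would reach Indicator():

```python
# Python processes escape sequences in a normal string literal before
# stix2 ever sees the value.
single = "\'"   # \' becomes a bare quote: 1 character
double = "\\'"  # \\ becomes one backslash, followed by a quote: 2 characters

print(len(single), repr(single))
print(len(double), repr(double))

# The first literal from the issue, as Python hands it to Indicator():
pattern = "[url:value = 'http://example.com\'\'\'\'']"
print(pattern)  # the backslashes are gone, leaving five bare quotes
```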
nirmalneupane commented 2 years ago

This seems inconsistent and will probably cause problems for projects that use this library downstream. Why does (stix2.ObjectPath('url',['value']),'http://example.com\'\'\'\'') not require double escape characters, then?

Most of the time we use the library programmatically and don't add double escape characters by hand. Because of this limitation, even str.encode('unicode_escape') doesn't work. Can you suggest a programmatic way to escape quotes and other characters that are likely to appear in URL indicators aimed at injection attacks?
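A likely reason encode('unicode_escape') does not help: that codec doubles backslashes and escapes non-printable or non-ASCII characters, but it leaves single quotes untouched, so the quotes that break the pattern pass straight through. A quick plain-Python check:

```python
url = "http://example.com''''"

# unicode_escape does not escape single quotes, so the value is unchanged.
escaped = url.encode("unicode_escape").decode("ascii")
print(escaped)
print(escaped == url)

# Backslashes, by contrast, are doubled by the codec.
print("a\\b".encode("unicode_escape").decode("ascii"))
```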

chisholm commented 2 years ago

Try pasting your code snip into a Python prompt:

>>> "["+str(stix2.EqualityComparisonExpression(stix2.ObjectPath('url',['value']),'http://example.com\'\'\'\''))+"]"
"[url:value = 'http://example.com\\'\\'\\'\\'']"

Note that you get the same string Chris showed. Your second snippet works because it creates a string containing a STIX pattern with correct syntax. Your first snippet doesn't, so it fails. When you create a pattern AST, you provide the individual pieces of the pattern and leave it up to the library code to ensure proper escaping:

https://github.com/oasis-open/cti-python-stix2/blob/17445a085cb84734900603eb8009bcc856892762/stix2/patterns.py#L11-L12

The AST code is written to produce a correct STIX pattern. When you provide a pattern string yourself, it's your responsibility to format it correctly. Either way, the library attempts to parse the pattern string to ensure the pattern is valid. If parsing fails, you get that error.

All you need to do is escape single quotes and backslashes. Your first pattern string is wrong because you didn't escape the single quotes:

>>> print("[url:value = 'http://example.com\'\'\'\'']")
[url:value = 'http://example.com''''']

The embedded single quotes require escaping, as follows:

>>> print("[url:value = 'http://example.com\\'\\'\\'\\'']")
[url:value = 'http://example.com\'\'\'\'']

You could use the AST classes if you wanted to, or just insert the necessary escape characters yourself. The code linked above does the latter for string constants (you couldn't apply it to the whole pattern, because it would also escape the quotes that delimit the string).
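A minimal sketch of the "insert the escape characters yourself" route, assuming the rule described in this thread: escape backslashes first, then single quotes. The names escape_pattern_string and url_equality_pattern are hypothetical helpers, not part of stix2; the library's own patterns.py code linked above plays this role for string constants. The order matters: escaping quotes first would then double the backslashes just added for them.

```python
def escape_pattern_string(value: str) -> str:
    """Escape a raw value for use inside a single-quoted STIX pattern string.

    Hypothetical helper: backslashes are escaped first, so the backslashes
    added in front of quotes are not doubled again afterwards.
    """
    return value.replace("\\", "\\\\").replace("'", "\\'")


def url_equality_pattern(url: str) -> str:
    """Build a single-comparison pattern like [url:value = '...'] (hypothetical)."""
    return "[url:value = '{}']".format(escape_pattern_string(url))


if __name__ == "__main__":
    # A value as it might arrive programmatically, e.g. from a feed.
    raw_url = "http://example.com''''"
    print(url_equality_pattern(raw_url))
```

The resulting string would then be passed as pattern= to Indicator with pattern_type="stix"; as noted above, the library still parses it, so a malformed result is reported as an error rather than accepted.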

chisholm commented 2 years ago

By the way, here's a variant of your first snippet that works:

Indicator(pattern_type="stix", pattern=r"[url:value = 'http://example.com\'\'\'\'']")

Note the use of a raw string, in which backslashes are not interpreted as escape characters.
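To confirm that the raw-string form and the double-backslash form shown earlier produce the identical pattern string, a plain-Python comparison is enough:

```python
raw_form = r"[url:value = 'http://example.com\'\'\'\'']"
escaped_form = "[url:value = 'http://example.com\\'\\'\\'\\'']"

# In the raw string each \' stays as two characters (a backslash and a quote),
# which is exactly what \\' produces in a normal string.
print(raw_form == escaped_form)
print(raw_form)
```

Raw strings only affect literals typed into source code; a value built at runtime still needs its quotes and backslashes escaped before it goes into a pattern.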

nirmalneupane commented 2 years ago


I think I get it now. From the implementation, it looks like if I create a StringConstant object from the patterns module and use it to build the pattern I pass to the Indicator constructor, the library will escape the quotes and I get the desired result. Example:

url_indicator = r"http://google.com''''''''"
url_indicator = stix2.StringConstant(url_indicator)
str(url_indicator)

"'http://google.com\'\'\'\'\'\'\'\''"

Anyway, a few lines of documentation about this behavior might benefit people who are just getting introduced to the library.