opencybersecurityalliance / stix-shifter

This project consists of an open source library allowing software to connect to data repositories using STIX Patterning, and return results as STIX Observations.
https://stix-shifter.readthedocs.io

multiple invalid escape sequences in regular expressions #1269

Closed pcoccoli closed 1 year ago

pcoccoli commented 1 year ago

Describe the bug There are multiple regular expressions in regular (not raw) strings that use a single backslash for the RE escape, but Python interprets it as a string escape.

To Reproduce Steps to reproduce the behavior:

  1. pytest
  2. See warnings summary (copied below)

Expected behavior Should use raw strings for regular expressions
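A minimal sketch of the difference, assuming only the standard library. The snippet compiles a line of source text that contains a non-raw `'\d'` literal and records the warning it triggers, then shows the raw-string form. The names `bad_source` and `date_prefix` are illustrative and do not come from stix-shifter.

```python
import re
import warnings

# Source text of an assignment that uses a non-raw string containing \d.
# It is written as a raw string here so this example itself stays clean.
bad_source = r"pattern = '\d{4}-\d{2}'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The invalid escape sequence is reported when the source is compiled.
    compile(bad_source, "<example>", "exec")

# DeprecationWarning on older Python 3 releases, SyntaxWarning on 3.12+.
print([w.category.__name__ for w in caught])

# Raw-string form: the backslashes reach the regex engine untouched.
date_prefix = re.compile(r"\d{4}-\d{2}")
print(bool(date_prefix.match("2021-10")))
```

The raw string leaves backslash handling entirely to `re`, so the pattern text in the source is exactly what the regex engine sees.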

Screenshots

  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:2: DeprecationWarning: invalid escape sequence \d
    'date': '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(.\d+)?Z',

stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:3
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:3: DeprecationWarning: invalid escape sequence \.
    'ipv4': ('^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'),  # noqa: E501

stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:4
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:4: DeprecationWarning: invalid escape sequence \.
    'ipv6': ('^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))$'),

stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:5
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:5: DeprecationWarning: invalid escape sequence \.
    'mac': ('^(([0-9a-fA-F]{2}[:-]){5}([0-9a-fA-F]{2})|([0-9a-fA-F]{3}[\.]){3}([0-9a-fA-F]{3}))$'),

stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:6
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:6: DeprecationWarning: invalid escape sequence \.
    'ipv4_cidr': ('^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\/(([1-2][0-9])|(3[0-2])|[0-9])$'),  # noqa: E501

stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:7
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:7: DeprecationWarning: invalid escape sequence \.
    'domain_name': ('^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?(\.)?)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$'),

stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:8
stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:8
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_utils/stix_translation/src/json_to_stix/observable.py:8: DeprecationWarning: invalid escape sequence \.
    'ipv6_cidr': ('^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:' '[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|' '([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|' '[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%' '[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}' '[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))\/((1[0-2][0-8])|([1-9][0-9])|[0-9])$')
stix_shifter_utils/stix_translation/src/patterns/pattern_objects.py:184
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_utils/stix_translation/src/patterns/pattern_objects.py:184: DeprecationWarning: invalid escape sequence \d
    pattern = "^t'\d{4}(-\d{2}){2}T\d{2}(:\d{2}){2}(\.\d+)?Z'$"

stix_shifter_utils/stix_translation/src/utils/stix_pattern_parser.py:86
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_utils/stix_translation/src/utils/stix_pattern_parser.py:86: DeprecationWarning: invalid escape sequence \.
    pattern = "\.\d+Z$"

stix_shifter_utils/stix_translation/src/utils/transformers.py:100
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_utils/stix_translation/src/utils/transformers.py:100: DeprecationWarning: invalid escape sequence \.
    if type(obj) is str and re.search('\.', obj):

stix_shifter_modules/aws_athena/tests/stix_translation/test_aws_athena_stix_to_query.py:117
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/aws_athena/tests/stix_translation/test_aws_athena_stix_to_query.py:117: DeprecationWarning: invalid escape sequence \d
    "ocsf": "(REGEXP_LIKE(CAST(src_endpoint.ip as varchar), '\d+') AND time BETWEEN 1601541790000 AND 1604054590000)"

stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:12
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:12: DeprecationWarning: invalid escape sequence \s
    last_time_criteria = "\s?LAST\s?(\d*)\s?(MINUTES|HOURS|DAYS)"

stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:14
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:14: DeprecationWarning: invalid escape sequence \d
    "'(\d{4}(-\d{2}){2}\s?(\d{2}:\d{2}))'": "%Y-%m-%d %H:%M",

stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:15
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:15: DeprecationWarning: invalid escape sequence \d
    "'(\d{4}(-\d{2}){2}\s?\d{2}(:\d{2}){2})'": "%Y-%m-%d %H:%M:%S",

stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:16
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:16: DeprecationWarning: invalid escape sequence \d
    "'(\d{4}(/\d{2}){2}\s?\d{2}(:\d{2}){2})'": "%Y/%m/%d %H:%M:%S",

stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:17
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:17: DeprecationWarning: invalid escape sequence \d
    "'(\d{4}(/\d{2}){2}\s?\d{2}(:\d{2}){2})'": "%Y/%m/%d-%H:%M:%S",

stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:18
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar/stix_translation/aql_query_translator.py:18: DeprecationWarning: invalid escape sequence \d
    "'(\d{4}(:\d{2}){2}-\d{2}(:\d{2}){2})'": "%Y:%m:%d-%H:%M:%S",

stix_shifter_modules/qradar/stix_translation/query_constructor.py:21
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar/stix_translation/query_constructor.py:21: DeprecationWarning: invalid escape sequence \d
    START_STOP_STIX_QUALIFIER = "START((t'\d{4}(-\d{2}){2}T\d{2}(:\d{2}){2}(\.\d+)?Z')|(\s\d{13}\s))STOP"

stix_shifter_modules/qradar/stix_translation/query_constructor.py:22
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar/stix_translation/query_constructor.py:22: DeprecationWarning: invalid escape sequence \d
    TIMESTAMP = "^'\d{4}(-\d{2}){2}T\d{2}(:\d{2}){2}(\.\d+)?Z'$"

stix_shifter_modules/qradar/stix_translation/query_constructor.py:23
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar/stix_translation/query_constructor.py:23: DeprecationWarning: invalid escape sequence \.
    TIMESTAMP_MILLISECONDS = "\.\d+Z$"

stix_shifter_modules/qradar/tests/stix_translation/qradar_stix_to_aql/test_qradar_events_stix_to_query.py:337
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar/tests/stix_translation/qradar_stix_to_aql/test_qradar_events_stix_to_query.py:337: DeprecationWarning: invalid escape sequence \.
    search_string = '^.*http://graphics8\\\.nytimes\\\.com/bcvideo.*$'

stix_shifter_modules/qradar_perf_test/stix_translation/query_constructor.py:21
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar_perf_test/stix_translation/query_constructor.py:21: DeprecationWarning: invalid escape sequence \d
    START_STOP_STIX_QUALIFIER = "START((t'\d{4}(-\d{2}){2}T\d{2}(:\d{2}){2}(\.\d+)?Z')|(\s\d{13}\s))STOP"

stix_shifter_modules/qradar_perf_test/stix_translation/query_constructor.py:22
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar_perf_test/stix_translation/query_constructor.py:22: DeprecationWarning: invalid escape sequence \d
    TIMESTAMP = "^'\d{4}(-\d{2}){2}T\d{2}(:\d{2}){2}(\.\d+)?Z'$"

stix_shifter_modules/qradar_perf_test/stix_translation/query_constructor.py:23
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar_perf_test/stix_translation/query_constructor.py:23: DeprecationWarning: invalid escape sequence \.
    TIMESTAMP_MILLISECONDS = "\.\d+Z$"

stix_shifter_modules/qradar_perf_test/tests/stix_translation/qradar_perf_test_stix_to_aql/test_qradar_perf_test_events_stix_to_query.py:336
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/qradar_perf_test/tests/stix_translation/qradar_perf_test_stix_to_aql/test_qradar_perf_test_events_stix_to_query.py:336: DeprecationWarning: invalid escape sequence \.
    search_string = '^.*http://graphics8\\\.nytimes\\\.com/bcvideo.*$'

stix_shifter_modules/reaqta/stix_translation/query_constructor.py:12
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/reaqta/stix_translation/query_constructor.py:12: DeprecationWarning: invalid escape sequence \.
    TIMESTAMP_MILLISECONDS = "\.\d+Z$"

stix_shifter_modules/secretserver/stix_transmission/api_client.py:132
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/secretserver/stix_transmission/api_client.py:132: DeprecationWarning: invalid escape sequence \d
    pattern = re.compile("\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}Z")

stix_shifter_modules/security_advisor/stix_translation/results_translator.py:83
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/security_advisor/stix_translation/results_translator.py:83: DeprecationWarning: invalid escape sequence \-
    regex = "[/~!@#$%^&*()\-_+={}\[\]|\\:;\"`\'<>.\?\w]+"

stix_shifter_modules/security_advisor/stix_translation/results_translator.py:94
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/security_advisor/stix_translation/results_translator.py:94: DeprecationWarning: invalid escape sequence \w
    path = re.search("[/[\w]*/+", value).group()

tests/stix_translation/test_dialects.py::TestTranslationDialecs::test_supported_dialects
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/error_test/stix_translation/query_translator.py:5: DeprecationWarning: invalid escape sequence \s
    START_STOP_PATTERN = "\s?START\s?t'\d{4}(-\d{2}){2}T\d{2}(:\d{2}){2}(\.\d+)?Z'\sSTOP\s?t'\d{4}(-\d{2}){2}T(\d{2}:){2}\d{2}.\d{1,3}Z'\s?"

tests/stix_translation/test_dialects.py::TestTranslationDialecs::test_supported_dialects
  /home/runner/work/stix-shifter/stix-shifter/stix_shifter_modules/stix_bundle/stix_translation/query_translator.py:4: DeprecationWarning: invalid escape sequence \s
    START_STOP_PATTERN = "\s?START\s?t'\d{4}(-\d{2}){2}T\d{2}(:\d{2}){2}(\.\d+)?Z'\sSTOP\s?t'\d{4}(-\d{2}){2}T(\d{2}:){2}\d{2}.\d{1,3}Z'\s?"
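Some of the literals above mix valid escapes (`\\`) with invalid ones (`\.`), so adding an `r` prefix alone would change the string's value. The sketch below illustrates this with a shortened form of the `search_string` test literal. The variable names are hypothetical, and the original value is written with valid escapes only so that the example raises no warning of its own.

```python
# In the original non-raw literal '...graphics8\\\.nytimes...', '\\' is a
# valid escape (one backslash) and '\.' is kept as backslash + dot, so the
# resulting string holds two backslashes before the dot.
original_value = "graphics8\\\\.nytimes"

# Only prefixing r keeps all three backslashes: a different string.
naive_raw = r"graphics8\\\.nytimes"

# Dropping one backslash while adding the prefix preserves the value.
faithful_raw = r"graphics8\\.nytimes"

print(faithful_raw == original_value)
print(naive_raw == original_value)
```

Whether the two-backslash or three-backslash form is the one the test actually intends is a separate question. The sketch only shows that the conversion to raw strings is not purely mechanical for such literals.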
subbyte commented 1 year ago

Looks like the escaping still has an issue.

target data in elastic

The data in the store is C:\Windows\system32\svchost.exe ...

stix-shifter command line utility

Query to our elasticsearch instance using the stix-shifter command line utility (bash escaped):

The printed return (segment of it) from the first case using the stix-shifter command line utility is:

                    "working_directory": "C:\\Windows\\system32\\"
                    "command_line": "C:\\Windows\\system32\\services.exe"
                    "command_line": "C:\\Windows\\system32\\svchost.exe -k netsvcs -p -s wuauserv",
                        "C:\\Windows\\system32\\svchost.exe",
                        "C:\\Windows\\system32\\services.exe"
                    "working_directory": "C:\\Windows\\system32\\"
                    "command_line": "C:\\Windows\\system32\\services.exe"
                    "command_line": "C:\\Windows\\system32\\svchost.exe -k netsvcs -p -s wlidsvc",

stix-shifter called by Kestrel

Only 4 backslashes work correctly, not 2 or 8.

svchost2 = GET process FROM stixshifter://host101
           WHERE command_line MATCHES '.*system32\\\\svchost.exe.*'
           AND x-oca-event:action = 'Process Create (rule: ProcessCreate)'
           START 2021-10-20T00:00:00.000Z STOP 2021-10-21T00:00:00.000Z

what is supposed to work

subbyte commented 1 year ago

Issue confirmed.

Upper-layer Python code currently needs 4 backslash characters to match 1 backslash character in the data, which should not be required. The expected behavior is that matching 1 backslash character in the data takes 2 backslash characters in a raw string in the upper-layer Python code.
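At the level of Python's `re` module alone, the two-backslash expectation can be checked with a minimal sketch. The sample command line is adapted from the output earlier in this thread and the variable names are illustrative. No stix-shifter or Kestrel code is involved, so this says nothing about which layer introduces the extra unescaping.

```python
import re

# A command line as stored in the data: single backslashes between parts.
data = r"C:\Windows\system32\svchost.exe -k netsvcs"

# In regex syntax, two characters (\\) match one literal backslash.
raw_pattern = r".*system32\\svchost.exe.*"

# Non-raw equivalent: Python string escaping doubles each backslash again.
plain_pattern = ".*system32\\\\svchost.exe.*"

print(raw_pattern == plain_pattern)
print(re.match(raw_pattern, data) is not None)
print(raw_pattern.count("\\"))  # number of backslash characters in the pattern
```

Each additional layer that unescapes the string before it reaches the regex engine would double the count again, which is one plausible reading of why 4 backslashes are needed today.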

stix-shifter behaves consistently with stix2matcher (used in the Kestrel stix-bundle interface): both need 4 backslashes in a raw string to match 1 backslash in the data. Both need correction.

  1. Unit test case for Kestrel parser (STIX pattern generator): https://github.com/opencybersecurityalliance/kestrel-lang/blob/develop/tests/test_parser.py#L123
  2. Full test case against Kestrel stix-bundle interface: https://github.com/opencybersecurityalliance/kestrel-lang/blob/develop/tests/test_command_get.py#L324
  3. Full test case against stix-shifter elastic_ecs connector shows the same result.