rapidsai / clx

A collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases.
Apache License 2.0

[BUG] Invalid Regex Pattern #501

bsuryadevara closed this issue 2 years ago

bsuryadevara commented 2 years ago

Describe the bug

A regex pattern is missing a closing parenthesis, which causes the test to fail with:

RuntimeError: cuDF failure at: /workspace/.conda-bld/work/cpp/src/strings/regex/regcomp.cpp:856: unmatched left parenthesis
Stack trace:
def test_windows_event_parser():
        wep = WindowsEventParser()
        test_input_df = cudf.DataFrame()
        raw_colname = "_raw"
        test_input_df[raw_colname] = TEST_DATA
>       test_output_df = wep.parse(test_input_df, raw_colname)

clx/tests/test_windows_event_parser.py:693: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/conda/envs/rapids/lib/python3.9/site-packages/clx/parsers/windows_event_parser.py:57: in parse
    temp = self.parse_raw_event(
/opt/conda/envs/rapids/lib/python3.9/site-packages/clx/parsers/event_parser.py:83: in parse_raw_event
    extracted_gdf = dataframe[raw_column].str.extract(regex_pattern)
/opt/conda/envs/rapids/lib/python3.9/site-packages/cudf/core/column/string.py:633: in extract
    data, _ = libstrings.extract(self._column, pat, flags)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   RuntimeError: cuDF failure at: /workspace/.conda-bld/work/cpp/src/strings/regex/regcomp.cpp:856: unmatched left parenthesis

extract.pyx:32: RuntimeError
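The trace shows that `parse_raw_event` passes the pattern directly to cuDF's `str.extract`, so the malformed regex is only detected when cuDF compiles it on the GPU. One way to catch this class of mistake earlier is to compile each pattern with Python's standard `re` module first. The sketch below is an illustration under stated assumptions, not the fix applied in clx. The helper `find_invalid_patterns` and the `example_patterns` dictionary are hypothetical, and cuDF's regex engine does not accept exactly the same syntax as Python's `re`, so passing this check does not guarantee that cuDF will accept a pattern.

```python
import re


def find_invalid_patterns(patterns):
    """Return (name, error message) pairs for patterns that fail to compile.

    Hypothetical helper. Python's `re` syntax is not identical to cuDF's
    regex engine, so a pattern that compiles here may still be rejected
    by cuDF. It does catch structural mistakes such as an unmatched left
    parenthesis before the pattern is handed to `str.extract`.
    """
    invalid = []
    for name, pattern in patterns.items():
        try:
            re.compile(pattern)
        except re.error as exc:
            # re.error describes the problem, e.g. an unterminated group.
            invalid.append((name, str(exc)))
    return invalid


# Hypothetical patterns for illustration only; these are NOT the actual
# patterns used by clx's WindowsEventParser.
example_patterns = {
    "balanced": r"Account Name:\s+(?P<account_name>\S+)",
    "unbalanced": r"Account Name:\s+(?P<account_name>\S+",
}

for name, message in find_invalid_patterns(example_patterns):
    print(f"{name}: {message}")
```

Running a check like this in a unit test over every pattern a parser loads would surface the missing parenthesis as a readable `re.error` message rather than a cuDF `RuntimeError` from inside `regcomp.cpp`.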