mandiant / capa

The FLARE team's open-source tool to identify capabilities in executable files.
https://mandiant.github.io/capa/
Apache License 2.0

Change to get_value_str() to escape regexes broke capa2yara.py #1909

Open ruppde opened 10 months ago

ruppde commented 10 months ago

Description

With https://github.com/mandiant/capa/commit/58e94a35cbaa384307410ef846b5965868b051e2, the regexes returned by get_value_str() are escaped, which breaks, for example, https://github.com/mandiant/capa/blob/3f449f3c0f1e2544ca7bad83c90e2d162ec0b916/scripts/capa2yara.py#L262

Steps to Reproduce

Run

python ./scripts/capa2yara.py  rules/host-interaction/file-system/reference-absolute-stream-path-on-windows.yml 2>&1 |grep x5D

The second line of the output shows the regex with extra escaping, which makes it unusable in YARA:

...
INFO:capa2yara:doing kids: [regex(string =~ /^(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/)] - len: 1
INFO:capa2yara:doing regex: '/^(\\\\\\\\\\\\\\\\\\\\?\\\\\\\\)?([\\\\w]\\\\:|\\\\\\\\)(\\\\\\\\((?![\\\\<\\\\>\\\\\\"\\\\/\\\\|\\\\*\\\\?\\\\:\\\\\\\\])[\\\\x20-\\\\x5B\\\\x5D-\\\\x7E])+)+\\\\:\\\\$?[a-zA-Z0-9_]+/'
...

Expected behavior:

No escaping

Actual behavior:

See above

Versions

Most recent github version

Additional Information

How should we fix this? Introduce another function which returns the regex unescaped?

(capa2yara.py is the only script in scripts/ that uses the function, so nothing else should have broken.)

williballenthin commented 10 months ago

here's the regex in the raw rule yaml:

- string: /^(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/

https://github.com/mandiant/capa-rules/blob/57b3911a72462e0597ca0d6685f8b02b38857765/host-interaction/file-system/reference-absolute-stream-path-on-windows.yml#L17C1-L17C112

and as logged above, from get_value_str:

[regex(string =~ /^(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/)]

and they look the same to me. So get_value_str returns the regex as provided in the capa rule. This can (almost directly) be passed to re.compile to create a regular expression instance. I assume it can be converted to a yara-compatible regex trivially, too.
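To make "almost directly" concrete, here is a minimal sketch of turning the slash-delimited value into a Python regular expression. The helper name compile_capa_regex is hypothetical and not part of capa; the handling of a trailing /i for case-insensitivity is an assumption about the rule syntax.

```python
import re

# The regex exactly as it appears in the rule YAML (raw literal, backslashes kept as-is).
CAPA_VALUE = r'/^(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/'


def compile_capa_regex(value: str) -> "re.Pattern[str]":
    """Hypothetical helper: strip the /.../ delimiters and compile with Python's re."""
    flags = 0
    if value.endswith("/i"):
        # Assumption: a trailing "i" after the closing slash means case-insensitive.
        value = value[:-1]
        flags |= re.IGNORECASE
    if len(value) < 2 or not (value.startswith("/") and value.endswith("/")):
        raise ValueError("expected a /.../ delimited regex")
    return re.compile(value[1:-1], flags)


compiled = compile_capa_regex(CAPA_VALUE)
# A path with an alternate data stream should match; plain text should not.
print(bool(compiled.search(r"C:\Windows\file.txt:hidden")))
print(bool(compiled.search("hello world")))
```

This only shows that the value is usable by Python's re once the delimiters are removed; it says nothing about whether YARA accepts the same syntax.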

Can you explain the problem in a little more detail?

ruppde commented 10 months ago

The line after that is the problem. That's what ends up in the yara rule, and it's escaped when it shouldn't be:

...
INFO:capa2yara:doing regex: '/^(\\\\\\\\\\?\\\\)?([\\w]\\:|\\\\)(\\\\((?![\\<\\>\\\"\\/\\|\\*\\?\\:\\\\])[\\x20-\\x5B\\x5D-\\x7E])+)+\\:\\$?[a-zA-Z0-9_]+/'
...

williballenthin commented 10 months ago

is this possibly because the logging statement uses %r instead of %s? i don't think there's any extra escaping being done by capa.
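A quick way to check the %r hypothesis in isolation: the sketch below formats the same string with %s and with %r, which is the same printf-style formatting that logging applies to its arguments. The shortened pattern is an illustrative stand-in, not the full rule regex.

```python
# Illustrative stand-in for the regex value; raw literal, so each backslash is literal.
pattern = r"/^([\w]\:|\\)\:\$?[a-zA-Z0-9_]+/"

# %s inserts the string unchanged.
shown_with_s = "doing regex: %s" % pattern

# %r inserts repr(pattern): quotes are added and every backslash is doubled,
# but only in the displayed text; the underlying value is not modified.
shown_with_r = "doing regex: %r" % pattern

print(shown_with_s)
print(shown_with_r)
```

If the doubling only shows up in the log line and not in the generated .yar file, the logging format is the cause; if it also appears in the rule file, the value itself is being escaped somewhere.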

ruppde commented 10 months ago

Should be this line from the commit above: [screenshot of the line; image not preserved]

acelynnzhang commented 6 months ago

I took a look at this a few days ago, but I couldn't figure out what was going on. Here's my output after reverting 58e94a35cbaa384307410ef846b5965868b051e2:

❯ python scripts/capa2yara.py -q rules/host-interaction/file-system/reference-absolute-stream-path-on-windows.yml > test.yar
...

import "pe"

private rule capa_pe_file : CAPA {
    meta:
        description = "Match in PE files. Used by other CAPA rules"
    condition:
        uint16be(0) == 0x4d5a
        or uint16be(0) == 0x558b
        or uint16be(0) == 0x5649
}

rule capa_reference_absolute_stream_path_on_Windows : CAPA  {
  meta:
    description = "reference absolute stream path on Windows (converted from capa rule)"
    namespace = "host-interaction/file-system"
    static scope = "basic block"
    dynamic scope = "call"
    hash = "51828683DC26BFABD3994494099AE97D"
    reference = "This YARA rule converted from capa rule: https://github.com/mandiant/capa-rules/blob/master/rules/host-interaction/file-system/reference-absolute-stream-path-on-windows.yml"
    capa_nursery = "False"
    date = "2024-04-05"
    minimum_yara = "3.8"
    license = "Apache-2.0 License"

  strings:
    $re_aaa = /\x00(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/ ascii wide

  condition:
    capa_pe_file and
 (
            $re_aaa
    )
}

It seems like YARA doesn't like spaces in meta field names, so I had to add underscores to static scope and dynamic scope. Even after that, it still doesn't work with the converted rule. I'm also not sure where the null byte came from. I tried to play around with the original capa regex, but couldn't get that to work either.
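For the meta field names, a minimal sketch of the kind of sanitizing that would avoid the manual fix. The function yara_meta_key is hypothetical (it is not in capa2yara.py), and it assumes YARA identifiers are limited to letters, digits and underscores and may not start with a digit.

```python
import re


def yara_meta_key(name: str) -> str:
    """Hypothetical helper: turn an arbitrary label into a YARA-safe meta identifier."""
    # Replace anything that is not a letter, digit or underscore (e.g. spaces).
    key = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Assumption: identifiers must not begin with a digit, so prefix those.
    if key and key[0].isdigit():
        key = "_" + key
    return key


for label in ("static scope", "dynamic scope", "hash"):
    print(label, "->", yara_meta_key(label))
```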

❯ yara test.yar 51828683dc26bfabd3994494099ae97d.elf_
error: rule "capa_reference_absolute_stream_path_on_Windows" in test.yar(43): invalid regular expression "$re_aaa": syntax error
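One thing that may be worth ruling out for the syntax error: as far as I can tell from the YARA documentation, its regex engine does not list lookahead or lookbehind among the supported constructs, and this regex contains (?!...). The sketch below is a rough check for such constructs; find_lookarounds is a hypothetical helper, and it is a plain text scan that does not account for escaped parentheses.

```python
import re

# Matches the openers (?=  (?!  (?<=  (?<!
LOOKAROUND = re.compile(r"\(\?(?:<?[=!])")

# The regex from the generated rule, without the trailing "ascii wide" modifiers.
YARA_REGEX = r'/\x00(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/'


def find_lookarounds(regex: str) -> list:
    """Hypothetical helper: return every lookaround opener found in the regex text."""
    return LOOKAROUND.findall(regex)


print(find_lookarounds(YARA_REGEX))
```

If that is the cause, the lookahead would need to be rewritten, for example by folding the excluded characters into the character class that follows it.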

What did the regex look like when the script was working?