Open ruppde opened 10 months ago
here's the regex in the raw rule yaml:
- string: /^(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/
and as logged above, from get_value_str:
[regex(string =~ /^(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/)]
and they look the same to me. So get_value_str returns the regex as provided in the capa rule. This can (almost directly) be passed to re.compile to create a regular expression instance. I assume it can be converted to a yara-compatible regex trivially, too.
Can you explain the problem in a little more detail?
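To illustrate the "almost directly" part, here is a minimal sketch of passing the rule's regex to re.compile. It assumes the value arrives as a /pattern/ literal with no trailing flags; strip_delimiters is a hypothetical helper written for this example, not a capa function, and the sample path is made up.

```python
import re

# Regex literal as it appears in the capa rule YAML, including the '/' delimiters.
rule_value = r'/^(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/'


def strip_delimiters(value: str) -> str:
    """Hypothetical helper: drop the leading and trailing '/' of a /pattern/ literal.

    Assumption: no trailing flags such as /i are present.
    """
    if len(value) >= 2 and value.startswith("/") and value.endswith("/"):
        return value[1:-1]
    return value


# Python's re module accepts the pattern unchanged once the delimiters are removed.
pattern = re.compile(strip_delimiters(rule_value))

# Made-up absolute path with an alternate data stream suffix.
sample = r"C:\Users\test\file.txt:stream"
match = pattern.search(sample)
print(match is not None)
```

This only shows that the pattern is valid for Python's re engine. It says nothing about whether YARA's regex engine accepts the same syntax.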
The line after that is the problem. That's what ends up in the yara rule, and it's escaped when it shouldn't be: ... INFO:capa2yara:doing regex: '/^(\\\\\\\\\\?\\\\)?([\\w]\\:|\\\\)(\\\\((?![\\<\\>\\\"\\/\\|\\*\\?\\:\\\\])[\\x20-\\x5B\\x5D-\\x7E])+)+\\:\\$?[a-zA-Z0-9_]+/' ...
Is this possibly because the logging statement uses %r instead of %s? I don't think there's any extra escaping being done by capa.
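A small sketch of the difference between the two format specifiers, using a shortened stand-in pattern rather than the full rule regex. It only demonstrates how %r changes what gets printed; whether this is what actually happens in capa2yara is not confirmed here.

```python
# Shortened stand-in for the rule's regex (assumption: any string with
# backslashes shows the same effect).
value = r"/^(\\\\\?\\)?[\w]\:/"

# %r formats with repr(): the string is quoted and every backslash is doubled.
with_r = "doing regex: %r" % value

# %s formats with str(): the string is emitted unchanged.
with_s = "doing regex: %s" % value

print(with_r)
print(with_s)
```

If only the log line used %r, the doubled backslashes would be a display artifact and the string written to the YARA rule would be unaffected.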
Should be this line from the commit above:
I took a look at this a few days ago, but I couldn't figure out what was going on. Here's my output after reverting 58e94a35cbaa384307410ef846b5965868b051e2:
❯ python scripts/capa2yara.py -q rules/host-interaction/file-system/reference-absolute-stream-path-on-windows.yml > test.yar
...
import "pe"
private rule capa_pe_file : CAPA {
meta:
description = "Match in PE files. Used by other CAPA rules"
condition:
uint16be(0) == 0x4d5a
or uint16be(0) == 0x558b
or uint16be(0) == 0x5649
}
rule capa_reference_absolute_stream_path_on_Windows : CAPA {
meta:
description = "reference absolute stream path on Windows (converted from capa rule)"
namespace = "host-interaction/file-system"
static scope = "basic block"
dynamic scope = "call"
hash = "51828683DC26BFABD3994494099AE97D"
reference = "This YARA rule converted from capa rule: https://github.com/mandiant/capa-rules/blob/master/rules/host-interaction/file-system/reference-absolute-stream-path-on-windows.yml"
capa_nursery = "False"
date = "2024-04-05"
minimum_yara = "3.8"
license = "Apache-2.0 License"
strings:
$re_aaa = /\x00(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/ ascii wide
condition:
capa_pe_file and
(
$re_aaa
)
}
It seems like YARA doesn't like spaces in meta field names, so I had to add underscores to static scope and dynamic scope. Even after that, it still doesn't work with the converted rule. I'm also not sure where the null byte came from. I tried to play around with the original capa regex, but couldn't get that to work either.
❯ yara test.yar 51828683dc26bfabd3994494099ae97d.elf_
error: rule "capa_reference_absolute_stream_path_on_Windows" in test.yar(43): invalid regular expression "$re_aaa": syntax error
What did the regex look like when the script was working?
Description
With https://github.com/mandiant/capa/commit/58e94a35cbaa384307410ef846b5965868b051e2 the regexes returned by get_value_str() are escaped, which breaks e.g. https://github.com/mandiant/capa/blob/3f449f3c0f1e2544ca7bad83c90e2d162ec0b916/scripts/capa2yara.py#L262
Steps to Reproduce
Run
The 2nd line shows the regex escaped, which is of no use in yara:
Expected behavior:
No escaping
Actual behavior:
See above
Versions
Most recent github version
Additional Information
How should we fix this? Introduce another function which returns the regex unescaped?
(capa2yara.py is the only script in scripts/ which uses the function, so this shouldn't have broken anything else)