[Disclaimer: I'm not actually involved with the Yara project at all; I'm just a
user at present, who has been writing a lot of rules.]
I have a rather long wishlist of future features I would like for Yara to have.
(Some proposals I would like to present soon...) One of the big ones is macros
or function definitions. But....
One of the nice things about Yara is that it is *not* too powerful. Its syntax
expresses a regular language, which can be evaluated in polynomial time.
Introducing loops or recursion would make it possible to write rules which take
an almost infinite amount of time to evaluate. Several of the big-name
organizations using Yara (VirusTotal, FireEye, etc.) allow users to upload
their own Yara rules, to be evaluated on these detection platforms. Being able
to upload a Yara rule which pegs the CPU at 100% and uses Gigs and Gigs of
memory would be bad.
At the same time, having a non-recursive substitution syntax for use in the
conditions would greatly aid in maintainability. The C pre-processor is
probably a good example of this; it can be evaluated in polynomial time. (I
think NASM's macro syntax is another good example, but it's been a while since
I thought about it.)
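Below is a minimal Python sketch of the kind of non-recursive substitution meant here. It assumes a hypothetical `%name` / `%name(arg)` call syntax, with `$$` as the parameter placeholder, borrowed from the `macros:` example in idea 2. Yara has no such feature. The replacement text is never re-scanned, so a macro that mentions itself cannot loop, and expansion always finishes in a single pass.

```python
import re

# Hypothetical macro table, modelled on the "macros:" section of idea 2.
# "$$" in a body is a placeholder for the single argument, if any.
MACROS = {
    "a": "uint32($$)|uint32(0)",
    "b": "$c at 1234",
    "number_of_sections": "uint16(uint32(0x3c)+6)",
}

# Matches %name or %name(arg); for simplicity the argument may not nest parentheses.
MACRO_CALL = re.compile(r"%(\w+)(?:\(([^()]*)\))?")


def expand(condition: str, macros: dict[str, str]) -> str:
    """Expand every macro call in one left-to-right pass.

    re.sub never re-scans the replacement text, so a body that mentions a
    macro (even itself) is copied through unchanged and expansion terminates.
    """
    def substitute(match: re.Match) -> str:
        name, arg = match.group(1), match.group(2)
        if name not in macros:
            raise KeyError(f"undefined macro %{name}")
        body = macros[name]
        if arg is not None:
            body = body.replace("$$", arg.strip())
        # Parenthesize, as careful C macros do, so precedence survives expansion.
        return "(" + body + ")"

    return MACRO_CALL.sub(substitute, condition)


if __name__ == "__main__":
    print(expand("(%a(42) or %a(56234) or %b) and (%number_of_sections == 6)", MACROS))
    # ((uint32(42)|uint32(0)) or (uint32(56234)|uint32(0)) or ($c at 1234)) and ((uint16(uint32(0x3c)+6)) == 6)
```

Wrapping each expansion in parentheses is a deliberate choice. Without it, `%a(42) == 0` would expand to `uint32(42)|uint32(0) == 0`, and its meaning would depend on how `|` and `==` bind relative to each other.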
So, the ideas I had were either:
1. Allow (earlier) rule definitions to take an argument, similar to the external
variables you can pass on the command line, only local. The rule would, of course,
need to be (re-)evaluated at the point where the later rule uses it.
2. Each rule may have an additional section for constants/macro statements.
Something like:
rule example {
    macros:
        a($$) = "uint32($$)|uint32(0)"
        b = "$c at 1234"
        number_of_sections = "uint16(uint32(0x3c)+6)"
    strings:
        $c = "something"
    condition:
        (%a(42) or %a(56234) or %b)
        and
        (number_of_sections == 6)
}
... or whatever syntax would be good, use your imagination.
3. A top-level 'function' keyword, like 'rule' (as proposed). The thing that
*must* be avoided is constructs like these:
function dontdothis {
    return:
        dontdothis()
}
...or...
function foo { return: bar() }
function bar { return: foo() }
4. Or something else even simpler yet clever... but without turning into Common
Lisp. (I'm thinking of lazily evaluated functions without side effects, but I
need to do some more research on this.)
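The recursion that idea 3 must rule out could be detected once, at compile time, with an ordinary depth-first search over the call graph. This Python sketch assumes a hypothetical representation in which each function name maps to the names it calls. It returns one offending cycle, or `None` if there is no recursion.

```python
from typing import Optional

# Hypothetical representation: function name -> names of functions it calls.
CallGraph = dict[str, list[str]]


def find_cycle(graph: CallGraph) -> Optional[list[str]]:
    """Return one call cycle (e.g. ['foo', 'bar', 'foo']) or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {name: WHITE for name in graph}
    path: list[str] = []

    def visit(name: str) -> Optional[list[str]]:
        color[name] = GREY
        path.append(name)
        for callee in graph[name]:
            if callee not in color:
                raise KeyError(f"call to undefined function {callee}")
            if color[callee] == GREY:
                # callee is already on the current path: this is a back edge
                return path[path.index(callee):] + [callee]
            if color[callee] == WHITE:
                cycle = visit(callee)
                if cycle:
                    return cycle
        path.pop()
        color[name] = BLACK
        return None

    for name in graph:
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

Both bad examples are caught: `find_cycle({"dontdothis": ["dontdothis"]})` and `find_cycle({"foo": ["bar"], "bar": ["foo"]})` each return a cycle. An even simpler alternative, in the spirit of idea 1, is to allow calls only to functions defined earlier in the file, which makes cycles impossible by construction.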
Original comment by juliavi...@gmail.com
on 1 Jun 2012 at 12:45
Oh yeah, I was going to say...
5. An optimization I've been thinking about, which can also be used like a function.
Currently, as far as I know, all rules are always evaluated. It would be nice
to skip some rules if other rules evaluate to false (don't alert). Then you
could write a set of fast-path rules and slow-path rules, where the fast rules
are quick and easy and must be true for some other rule to be true, but on
their own would alert way too often (false positives).
EXE scanning is a good example. You can write a rule which tests for a "MZ" and
"PE\0\0" at some offset, very quickly. But if you then had several complicated
regular expressions, those would execute slowly, and you wouldn't need to
evaluate them at all, if there wasn't already a "MZ" and "PE\0\0".
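The fast-path test described here amounts to a couple of fixed-offset reads. The Python sketch below (function names are hypothetical) checks for "MZ" at offset 0 and "PE\0\0" at the offset stored at 0x3C. It also reads the section count that the `number_of_sections` macro in idea 2 computes.

```python
import struct


def is_pe(data: bytes) -> bool:
    """Fast-path check: 'MZ' at offset 0 and 'PE\\0\\0' at the offset stored at 0x3C."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # little-endian uint32
    return data[e_lfanew:e_lfanew + 4] == b"PE\0\0"


def number_of_sections(data: bytes) -> int:
    """Same arithmetic as uint16(uint32(0x3c)+6); raises struct.error if truncated."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    (count,) = struct.unpack_from("<H", data, e_lfanew + 6)  # little-endian uint16
    return count


# Usage: a fabricated 256-byte buffer with just enough header to pass the check.
sample = bytearray(0x100)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)  # e_lfanew -> 0x80
sample[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", sample, 0x86, 6)     # NumberOfSections = 6
```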
As I currently understand it, Yara always evaluates every statement. What
I'd like is to be able to short-circuit some of the tests. I don't think just
putting "rulename and $re" in the condition section does this.
A side effect of this is that you could use earlier rules like functions that
return true or false. Well, actually you can do that now, but I'd rather not
have Yara evaluate everything all the time.
private rule example {
    strings:
        $a = "MZ"
    condition:
        $a at 0
}
rule slowness {
    strings:
        $a = /aa*b*a*/
    condition:
        example and $a
}
// I don't want this rule to run, if "example" doesn't alert
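The requested behaviour can be modelled in a few lines of Python. This is a sketch of the *proposal*, not of how Yara works. The `requires` field is hypothetical, and a rule whose prerequisite failed has its own strings skipped entirely. The returned counter shows how many pattern searches actually ran.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

Offsets = dict[str, list[int]]  # string identifier -> match offsets


@dataclass
class Rule:
    name: str
    patterns: dict[str, bytes]            # identifier -> regular expression
    condition: Callable[[Offsets], bool]  # evaluated on the rule's own matches
    requires: Optional[str] = None        # hypothetical gating rule


def scan_gated(rules: list[Rule], data: bytes) -> tuple[dict[str, bool], int]:
    """Evaluate rules in order, skipping any rule whose prerequisite is false."""
    results: dict[str, bool] = {}
    searches = 0
    for rule in rules:  # prerequisites must appear earlier in the list
        if rule.requires is not None and not results[rule.requires]:
            results[rule.name] = False  # its (possibly slow) patterns never run
            continue
        offsets: Offsets = {}
        for ident, regex in rule.patterns.items():
            searches += 1
            offsets[ident] = [m.start() for m in re.finditer(regex, data)]
        results[rule.name] = rule.condition(offsets)
    return results, searches


# The two rules above; "example and" is expressed through requires=.
example = Rule("example", {"$a": b"MZ"}, lambda o: 0 in o["$a"])
slowness = Rule("slowness", {"$a": rb"aa*b*a*"}, lambda o: bool(o["$a"]),
                requires="example")
```

On input without "MZ" at offset 0, `scan_gated([example, slowness], data)` performs only one search, because the slow regular expression is never run.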
Original comment by juliavi...@gmail.com
on 1 Jun 2012 at 1:07
It's great to see in-depth discussions about Yara :-)
I've been thinking for a while about implementing a preprocessor in Yara, just
like the C preprocessor. That would give a lot of flexibility while keeping the
syntax clean. I don't really like the idea of including functions or
any complex construct in Yara; I prefer to keep it simple.
Regarding short-circuit evaluation: Yara does short-circuit evaluation of
expressions. If the condition is $a and (...whatever...), the "whatever"
expression is not evaluated if $a is false, BUT... that doesn't mean that if
the expression is $a and $b, and $b is a slow regular expression, the scanning
process will be faster when $a is not present. Keep in mind that Yara scans
through the file searching for ALL the strings, and then, once it knows which
strings are in the file and which are not, it evaluates all rule conditions. It
would be pretty slow to search for every string separately as part of
condition evaluation.
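Here is a small Python model of the two-phase behaviour described above. Every string is searched first, and then conditions are evaluated against the precomputed results with short-circuit `and`. For simplicity it uses a naive per-pattern loop and a single global string namespace. Yara's real scanner searches for all strings together much more efficiently.

```python
import re
from typing import Callable

Lookup = Callable[[str], bool]  # lets a condition ask "was this string found?"


def scan_two_phase(patterns: dict[str, bytes],
                   conditions: dict[str, Callable[[Lookup], bool]],
                   data: bytes) -> tuple[dict[str, bool], list[str], list[str]]:
    # Phase 1: search for every string before any condition is looked at.
    found: dict[str, bool] = {}
    searched: list[str] = []
    for ident, regex in patterns.items():
        searched.append(ident)
        found[ident] = re.search(regex, data) is not None

    # Phase 2: conditions only read precomputed results. Python's `and`
    # short-circuits, so once an operand is False the rest are not consulted.
    consulted: list[str] = []

    def lookup(ident: str) -> bool:
        consulted.append(ident)
        return found[ident]

    results = {name: cond(lookup) for name, cond in conditions.items()}
    return results, searched, consulted


patterns = {"$a": b"MZ", "$b": rb"aa*b*a*"}            # $b stands in for a slow regex
conditions = {"rule1": lambda s: s("$a") and s("$b")}  # condition: $a and $b
```

On input without "MZ", `$b` is still searched in phase 1. Short-circuiting only saves the lookup of `$b` in phase 2, which is exactly the distinction drawn above.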
Original comment by plus...@gmail.com
on 2 Jun 2012 at 5:27
This will be solved with macros. See issue 61.
Original comment by plus...@gmail.com
on 15 Aug 2012 at 4:10
Original issue reported on code.google.com by
golgotr...@gmail.com
on 30 May 2012 at 8:04