odin1314 / yara-project

Automatically exported from code.google.com/p/yara-project
Apache License 2.0

Feature request #53

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I think it would be very useful to be able to define functions in yara.
For example, something like this:

function number_of_sections
{
    return:
         uint16(uint32(0x3c)+6)
}

This would make rules much simpler and make yara an even more powerful language. :)
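For readers unfamiliar with the expression in the proposed function, here is a minimal Python sketch of what `uint16(uint32(0x3c)+6)` reads from a PE file. The helper names mirror the Yara functions, but the Python code and the `make_fake_pe` buffer are illustrative assumptions, not part of Yara.

```python
import struct

def uint16(data: bytes, offset: int) -> int:
    # Unsigned 16-bit little-endian read, analogous to Yara's uint16().
    return struct.unpack_from("<H", data, offset)[0]

def uint32(data: bytes, offset: int) -> int:
    # Unsigned 32-bit little-endian read, analogous to Yara's uint32().
    return struct.unpack_from("<I", data, offset)[0]

def number_of_sections(data: bytes) -> int:
    # Offset 0x3c of the DOS header holds e_lfanew, the file offset of the
    # "PE\0\0" signature. The 4-byte signature is followed by the 2-byte
    # Machine field, so NumberOfSections sits 6 bytes past the signature.
    return uint16(data, uint32(data, 0x3C) + 6)

def make_fake_pe(sections: int) -> bytes:
    # Hypothetical helper: a tiny PE-like buffer for demonstration only.
    buf = bytearray(0x100)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)        # e_lfanew
    buf[0x80:0x84] = b"PE\0\0"                     # PE signature
    struct.pack_into("<H", buf, 0x80 + 4, 0x14C)   # Machine
    struct.pack_into("<H", buf, 0x80 + 6, sections)
    return bytes(buf)

print(number_of_sections(make_fake_pe(6)))
```

Unlike this sketch, which raises `struct.error` on a short buffer, a real rule has to cope with reads that fall outside the file.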

Original issue reported on code.google.com by golgotr...@gmail.com on 30 May 2012 at 8:04

GoogleCodeExporter commented 9 years ago
[Disclaimer: I'm not actually involved with the Yara project at all; I'm just a 
user at present, who has been writing a lot of rules.] 

I have a rather long wishlist of features I would like Yara to have. 
(I hope to present some proposals soon...) One of the big ones is macros 
or function definitions. But....

One of the nice things about Yara is that it is *not* too powerful. Its syntax 
expresses a regular language, which can be evaluated in polynomial time. 
Introducing loops or recursion would make it possible to write rules which take 
an almost infinite amount of time to evaluate. Several of the big-name 
organizations using Yara (VirusTotal, FireEye, etc.) allow users to upload 
their own Yara rules, to be evaluated on these detection platforms. Being able 
to upload a Yara rule which pegs the CPU at 100% and uses Gigs and Gigs of 
memory would be bad.

At the same time, having a non-recursive substitution syntax for use in the 
conditions would greatly aid in maintainability. The C pre-processor is 
probably a good example of this; it can be evaluated in polynomial time. (I 
think NASM's macro syntax is another good example, but it's been a while since 
I thought about it.)

So, the ideas I had were either:

1. Allow (earlier) rule definitions to take an argument. Similar to the global 
external value you can pass from the command-line... only local. The rule would 
need to be (re-)evaluated at time of use in the later rule, of course.

2. Each rule may have an additional section for constants/macro statements. 
Something like:

rule example {
  macros:
    a($$) = "uint32($$)|uint32(0)"
    b = "$c at 1234"
    number_of_sections = "uint16(uint32(0x3c)+6)"
  strings:
    $c = "something"
  condition:
    (%a(42) or %a(56234) or %b)
    and 
    (number_of_sections == 6)
}

... or whatever syntax would be good, use your imagination.

3. A top-level 'function' keyword, like 'rule' (as proposed). The thing that 
*must* be avoided is constructs like these:

function dontdothis {
   return:
      dontdothis()
}

...or...

function foo { return: bar() }
function bar { return: foo() }

4. Or something else even simpler yet clever... but without turning into Common 
Lisp. (I'm thinking of lazy-evaluation, functions, without side-effects; but I 
need to do some more research on this stuff.)
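To make ideas 2 and 3 concrete, here is a rough Python sketch of a non-recursive macro expander. It assumes the `%name` / `%name(arg)` / `$$` notation from the example rule above (with `%` also applied to `number_of_sections`), which is only a proposed syntax and not anything Yara implements. The `expand` function and `MACRO_USE` pattern are hypothetical names. Expansion tracks the chain of macros currently being expanded and refuses to re-enter one, so the `dontdothis` and `foo`/`bar` constructs are rejected instead of looping forever.

```python
import re

# %name or %name(arg); arguments may not contain parentheses or other macros.
MACRO_USE = re.compile(r"%(\w+)(?:\(([^()%]*)\))?")

def expand(text, macros, _active=()):
    """Textually expand macro uses in text, rejecting recursive definitions."""
    def substitute(match):
        name, arg = match.group(1), match.group(2)
        if name not in macros:
            raise ValueError(f"unknown macro %{name}")
        if name in _active:
            chain = " -> ".join(_active + (name,))
            raise ValueError(f"recursive macro: {chain}")
        body = macros[name]
        if "$$" in body:
            if arg is None:
                raise ValueError(f"macro %{name} needs an argument")
            body = body.replace("$$", arg)
        elif arg is not None:
            raise ValueError(f"macro %{name} takes no argument")
        # Expand nested uses, remembering which macros are already open.
        return expand(body, macros, _active + (name,))
    return MACRO_USE.sub(substitute, text)

macros = {
    "a": "uint32($$)|uint32(0)",
    "b": "$c at 1234",
    "number_of_sections": "uint16(uint32(0x3c)+6)",
}
condition = "(%a(42) or %a(56234) or %b) and (%number_of_sections == 6)"
print(expand(condition, macros))
```

Forbidding recursion bounds the nesting depth, but the expanded text can still grow quickly when each macro uses another one several times, so a platform accepting untrusted rules would probably also want a cap on output size.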

Original comment by juliavi...@gmail.com on 1 Jun 2012 at 12:45

GoogleCodeExporter commented 9 years ago
Oh yeah, I was going to say...

5. An optimization I've been thinking about can also be used like a function. 
Currently, as far as I know, all rules are always evaluated. It would be nice 
to skip over some rules if other rules evaluate to false (don't alert). So you 
could write a set of fast-path rules and slow-path rules, where the fast rules 
are quick and easy, and must always be true for some other rule to be true, but 
by themselves would alert far too often (false positives). 

EXE scanning is a good example. You can write a rule which tests for "MZ" and 
"PE\0\0" at some offset very quickly. But if you then had several complicated 
regular expressions, those would execute slowly, and you wouldn't need to 
evaluate them at all if there wasn't already an "MZ" and "PE\0\0". 

As I currently understand it, Yara will always evaluate every statement. What 
I'd like is to be able to short-circuit some of the tests. I don't think just 
putting "rulename and $re" in the condition section does this.

A side effect of this is that you can use earlier rules like functions which 
return true or false. Well, actually you can do that now, but I'd rather not 
have Yara always evaluate everything all the time.

private rule example {
   strings:
      $a = "MZ"
   condition:
      $a at 0
}

rule slowness {
   strings:
      $a = /aa*b*a*/
   condition:
      example and $a
}
// I don't want this rule to run, if "example" doesn't alert
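The behaviour being asked for can be sketched outside Yara. The Python below is only an illustration of the fast-path/slow-path idea, not how Yara works. `looks_like_pe`, `slowness`, and the `stats` counter are hypothetical names, and the regular expression is the one from the rule above.

```python
import re
import struct

SLOW_PATTERN = re.compile(rb"aa*b*a*")

def looks_like_pe(data: bytes) -> bool:
    # Fast path: two fixed-offset comparisons, no searching.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    return data[pe_offset:pe_offset + 4] == b"PE\0\0"

def slowness(data: bytes, stats: dict) -> bool:
    # The slow search only runs when the cheap gate has already passed.
    if not looks_like_pe(data):
        return False
    stats["slow_scans"] = stats.get("slow_scans", 0) + 1
    return SLOW_PATTERN.search(data) is not None

stats = {}
print(slowness(b"aaab, but not an executable", stats), stats)
```

For the non-executable input the counter stays empty, meaning the regular expression was never run.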

Original comment by juliavi...@gmail.com on 1 Jun 2012 at 1:07

GoogleCodeExporter commented 9 years ago
It's great to see in-depth discussions about Yara :-)

I've been thinking for a while about implementing a preprocessor in Yara, just 
like the C preprocessor. That would give a lot of flexibility while keeping the 
syntax clean. I don't really like the idea of including functions or 
any complex construct in Yara; I prefer to keep it simple.

Regarding short-circuit evaluation, Yara does short-circuit evaluation of 
expressions. If the condition is $a and (...whatever...), the "whatever" 
expression is not evaluated if $a is false, BUT.... that doesn't mean that if 
the expression is $a and $b, and $b is a slow regular expression, the scanning 
process will be faster if $a is not present. Keep in mind that Yara scans 
through the file searching for ALL the strings, and then, after knowing which 
strings are in the file and which are not, it evaluates all rule conditions. It 
would be pretty slow to search for every string separately as part of the 
condition evaluation.
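A toy model of the two phases described above may help. This is a simplified Python sketch under stated assumptions, not Yara's actual engine. `scan_strings`, `MatchTable`, and `rule_condition` are hypothetical names, and the sketch loops over patterns one by one where a real scanner would look for all strings together.

```python
import re

def scan_strings(data, patterns, searched):
    # Phase 1: every string is searched for, whatever the conditions say.
    found = {}
    for name, regex in patterns.items():
        searched.append(name)
        found[name] = re.search(regex, data) is not None
    return found

class MatchTable:
    # Phase 2 input: precomputed results, with lookups recorded for the demo.
    def __init__(self, found):
        self.found = found
        self.lookups = []

    def __getitem__(self, name):
        self.lookups.append(name)
        return self.found[name]

def rule_condition(table):
    # "$a and $b": Python's `and` short-circuits like the condition evaluator.
    return table["$a"] and table["$b"]

searched = []
found = scan_strings(b"hello", {"$a": rb"MZ", "$b": rb"h.*o"}, searched)
table = MatchTable(found)
print(rule_condition(table), searched, table.lookups)
```

In this model the condition only ever looks up `$a`, yet `$b` was still searched for during the scan, which is why short-circuiting the condition does not save the cost of the slow regular expression.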

Original comment by plus...@gmail.com on 2 Jun 2012 at 5:27

GoogleCodeExporter commented 9 years ago
This will be solved with macros. See issue 61.

Original comment by plus...@gmail.com on 15 Aug 2012 at 4:10