odin1314 / yara-project

Automatically exported from code.google.com/p/yara-project
Apache License 2.0

global filesize condition disregarded #69

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
# docs say
# 5.1 Global rules
#   Global rules give you the possibility of imposing restrictions in all your rules at once. For
#   example, suppose that you want all your rules ignoring those files that exceed certain size

Based on a test case, it looks like files exceeding the global rule's file size
condition are still scanned, which slows scanning considerably.

$ mkdir testcase
$ dd if=/dev/zero of=./testcase/100mb bs=1k count=100000
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 0.783997 s, 131 MB/s
$ cat simple_test.yar 
    global rule SizeLimit
    {
        condition:
            filesize < 1MB
    }
    rule exploit
    {
        strings:
            $exploit = "Exploit" nocase
        condition:
            1 of them
    }
$ mkdir testcase/subdir
$ echo 'exploit' > testcase/subdir/some_bad_file
$ time ./yara -r -f simple_test.yar testcase/
SizeLimit testcase//subdir/some_bad_file
exploit testcase//subdir/some_bad_file

real    0m3.575s
user    0m3.476s
sys 0m0.076s
$ rm testcase/100mb
$ time ./yara -r -f simple_test.yar testcase/
SizeLimit testcase//subdir/some_bad_file
exploit testcase//subdir/some_bad_file

real    0m0.004s
user    0m0.004s
sys 0m0.000s

As seen in this example, about 3.5 seconds were wasted scanning the 100 MB
file, which should have been skipped, since the global rule says only files
below 1 MB should be scanned.

Original issue reported on code.google.com by hrvoje.s...@gmail.com on 21 Jan 2013 at 5:59

GoogleCodeExporter commented 9 years ago
Taking a quick look at the source code, it's as I suspected... yr_scan_file() 
basically mmap()'s the entire file first, and then begins to evaluate the 
rules...

Specifically, first it does this:

pmapped_file->size = fstat.st_size;

pmapped_file->data = (unsigned char*) mmap(0, pmapped_file->size, PROT_READ,
                                           MAP_PRIVATE, pmapped_file->file, 0);

And then, if that worked OK, it does this:

result = yr_scan_mem(mfile.data, mfile.size, context, callback, user_data);     

yr_scan_mem_blocks() ...
   [ stuff... ]
                    /* initialize global rules flag for all namespaces */
   [ stuff... ]
                   /* evaluate global rules */
   [ etc. ]

And then the rule actually gets checked in eval.c (or something like that)...

case TERM_TYPE_FILESIZE:
                return context->file_size;

... And it's kinda too late to check at this point: the file has already been
read. (Or at least as much of it as your OS kernel is willing to buffer in
advance, which is actually quite a lot.)
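
A size check made before the mapping step could be sketched roughly as follows. This is only an illustration and not YARA's code: the helper name file_within_size_limit and its max_size parameter are hypothetical, and the sketch assumes a POSIX system where stat() reports the size from file metadata without reading the contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper, not part of YARA: decide from file metadata alone
   whether a file is small enough to be worth mapping and scanning. */
bool file_within_size_limit(const char *path, off_t max_size)
{
    struct stat st;

    assert(path != NULL);

    /* stat() only consults metadata; the file contents are not read. */
    if (stat(path, &st) != 0)
        return false;               /* cannot stat: treat as not scannable */

    if (!S_ISREG(st.st_mode))
        return false;               /* directories, devices, etc. */

    /* Mirrors the "filesize < 1MB" condition from the test rule. */
    return st.st_size < max_size;
}
```

A caller would test file_within_size_limit(path, 1024 * 1024) and only go on to map and scan the file when it returns true. The catch is that the limit has to be known to the caller up front, whereas in the reported case it lives inside a rule condition.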

Original comment by juliavi...@gmail.com on 10 Feb 2013 at 4:05

GoogleCodeExporter commented 9 years ago
YARA scans the file first, looking for all the strings of every rule, and only
after the scanning phase has concluded does it proceed to evaluate the rule
conditions. It would be great if YARA were smart enough to realize that in some
cases the file doesn't need to be scanned at all, but detecting those
situations is not trivial.
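
To illustrate why, here is a minimal sketch of a pre-scan check that evaluates a condition knowing only the file size. It is not how YARA works: the node_t tree, the tv_t type and prescan_eval are all hypothetical names made up for this example, and a real condition language has many more term types than the four shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Three-valued truth: before scanning, some terms cannot be decided. */
typedef enum { TV_FALSE, TV_TRUE, TV_UNKNOWN } tv_t;

/* Hypothetical, minimal condition tree (not YARA's real data structure). */
typedef enum { N_FILESIZE_LT, N_STRING_MATCH, N_AND, N_OR } node_kind_t;

typedef struct node {
    node_kind_t kind;
    uint64_t limit;             /* used by N_FILESIZE_LT only */
    const struct node *left;    /* used by N_AND / N_OR only */
    const struct node *right;
} node_t;

static tv_t tv_and(tv_t a, tv_t b)
{
    if (a == TV_FALSE || b == TV_FALSE) return TV_FALSE;
    if (a == TV_TRUE && b == TV_TRUE) return TV_TRUE;
    return TV_UNKNOWN;
}

static tv_t tv_or(tv_t a, tv_t b)
{
    if (a == TV_TRUE || b == TV_TRUE) return TV_TRUE;
    if (a == TV_FALSE && b == TV_FALSE) return TV_FALSE;
    return TV_UNKNOWN;
}

/* Evaluate a condition using the file size alone. String terms stay
   TV_UNKNOWN because they can only be decided by scanning the data. */
tv_t prescan_eval(const node_t *n, uint64_t file_size)
{
    assert(n != NULL);

    switch (n->kind) {
    case N_FILESIZE_LT:
        return file_size < n->limit ? TV_TRUE : TV_FALSE;
    case N_STRING_MATCH:
        return TV_UNKNOWN;
    case N_AND:
        return tv_and(prescan_eval(n->left, file_size),
                      prescan_eval(n->right, file_size));
    case N_OR:
        return tv_or(prescan_eval(n->left, file_size),
                     prescan_eval(n->right, file_size));
    }
    return TV_UNKNOWN;          /* unreachable for valid node kinds */
}
```

Under these assumptions, a global condition like "filesize < 1MB" comes out as TV_FALSE for the 102400000-byte test file, so the scan could be skipped. A condition such as "filesize < 1MB or $exploit" comes out as TV_UNKNOWN for the same file, so the scan still has to run. The skip is only safe when every global rule in play can be decided this way.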

Original comment by plus...@gmail.com on 23 May 2013 at 2:18