grimesjo / malwarecookbook

Automatically exported from code.google.com/p/malwarecookbook

pescanner.py crashed when analyzing 30K samples #25

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Analyze roughly 30,000 PE malware samples

What is the expected output? What do you see instead?

najmi@vostro:~/malware-csm$ ./pescanner.py . > report.txt
Traceback (most recent call last):
  File "./pescanner.py", line 391, in <module>
    pescan.collect()
  File "./pescanner.py", line 323, in collect
    callbacks = self.check_tls(pe)
  File "./pescanner.py", line 161, in check_tls
    func = pe.get_dword_from_data(pe.get_data(callback_array_rva + 4 * idx, 4), 0)
  File "/usr/local/lib/python2.7/dist-packages/pefile-1.2.10_107-py2.7.egg/pefile.py", line 3779, in get_data
    raise PEFormatError, 'data at RVA can\'t be fetched. Corrupt header?'
pefile.PEFormatError: "data at RVA can't be fetched. Corrupt header?"
najmi@vostro:~/malware-csm$

What version of the product are you using? On what operating system?

Version: From malwarecookbook SVN
OS: Ubuntu 11.04 (Natty)

Please provide any additional information below.
As a workaround I have to split the PE samples into batches of 100-200 files, 
which is tedious since there are 30,000 samples.

Badly need your help :)

Original issue reported on code.google.com by najmi.zabidi on 18 Jun 2011 at 4:00

GoogleCodeExporter commented 8 years ago
The quick and dirty fix would be to use a try/except block like this:

try:
   func = pe.get_dword_from_data(pe.get_data(callback_array_rva + 4 * idx, 4), 0)
except pefile.PEFormatError:
   break

However, this is like looking the other way instead of trying to understand 
the problem. The PEFormatError exception is perhaps being raised due to 
corruption in the sample, either accidental or intentional (e.g. a packer).
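
To illustrate what the patched loop does, here is a standalone sketch that does not use pefile. It assumes the loop in check_tls walks a zero-terminated array of 4-byte little-endian callback addresses, as the traceback line suggests. The function name read_callbacks, the raw bytes buffer, and the max_entries cap are assumptions made for illustration; they are not part of pescanner.py.

```python
import struct

def read_callbacks(data, offset, max_entries=1024):
    """Collect 32-bit callback addresses from data, starting at offset."""
    callbacks = []
    if offset < 0:
        return callbacks
    for idx in range(max_entries):
        start = offset + 4 * idx
        chunk = data[start:start + 4]
        if len(chunk) < 4:
            # Ran off the end of the data: the analogue of PEFormatError,
            # handled by stopping instead of crashing
            break
        (func,) = struct.unpack("<I", chunk)
        if func == 0:
            # Normal end of the array
            break
        callbacks.append(func)
    return callbacks
```

A truncated or corrupt array simply yields the callbacks read so far, which is the same outcome as the break in the except clause above.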

In the collect() function, you can add:

for file in self.files:
    print 'About to analyze', file

Then when it crashes, look at the last file name printed and you'll know the 
offending sample. Feel free to send it to me (attach it here) or examine the 
TLS portion of the PE header yourself to figure out what exactly is causing the 
problem. 
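
Going one step further, the batch loop itself can be made to survive a bad sample and report it afterwards. The following is only a sketch: scan_all and its analyze argument are hypothetical stand-ins for the per-file work done in collect(), not names from pescanner.py. Progress messages go to stderr, so they stay on screen even when stdout is redirected to report.txt.

```python
import sys

def scan_all(paths, analyze, log=sys.stderr):
    """Run analyze(path) on every path; return (reports, failures)."""
    reports, failures = [], []
    for path in paths:
        # stderr is not captured by "> report.txt"
        log.write("About to analyze %s\n" % path)
        try:
            reports.append(analyze(path))
        except Exception as exc:
            # In pescanner.py, catching pefile.PEFormatError would be
            # the narrower choice
            failures.append((path, str(exc)))
    return reports, failures
```

The failures list then names every offending sample in one run, instead of stopping at the first one.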

Original comment by michael.hale@gmail.com on 20 Jun 2011 at 8:18

GoogleCodeExporter commented 8 years ago
Thanks for your reply.

If I use this method to store the analysis:

./pescanner.py . > report.txt

Is there any way to show which PE file is being analyzed, so that it appears 
both on screen and in the report?

Perhaps you could change it to accept something like:

./pescanner.py . -o report.txt whilst ..

If possible, I also wish it could support saving to SQLite format ;p

Original comment by najmi.zabidi on 21 Jun 2011 at 1:41

GoogleCodeExporter commented 8 years ago
SQLite support would be a nice touch, but would require a little more time 
than I currently have. There are plenty of other example Python scripts in the 
book you could use to create a db schema though. 
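
As a rough starting point, a minimal sketch using Python's standard sqlite3 module might look like the following. The table name, the columns, and the open_db and save_report helpers are all hypothetical; this is not a schema from the book or from pescanner.py.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "path TEXT PRIMARY KEY, size INTEGER, report TEXT)"
    )
    return conn

def save_report(conn, path, size, report):
    """Store one sample's report text, keyed by file path."""
    # INSERT OR REPLACE keeps a single row per path if a sample is rescanned
    conn.execute(
        "INSERT OR REPLACE INTO samples (path, size, report) VALUES (?, ?, ?)",
        (path, size, report),
    )
    conn.commit()
```

Passing a file name such as "report.db" to open_db would persist the results; the default ":memory:" database is only useful for trying the idea out.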

> Is there any way I could append on the display which PE that 
> it is analyzing.. that is something like it displayed both 
> on screen and report. ..

Well the file names are already printed in the report:

============================================================
File:    /home/mhl/testmalware/hallmark.gif.exe
Size:    1200043 bytes

To get the files printed to your terminal, just add the print statement I 
described in my first comment here. Alternately, if you want to simply 
duplicate everything in the report on your screen (i.e. see everything in both 
places), then you have two options. 

Option 1 assumes you run Linux or OSX. In this case, just pipe the output to 
tee (http://linux.about.com/library/cmd/blcmdl1_tee.htm). For example:

$ python pescanner.py malwaredir | tee report.txt

You'll see everything on screen and also have report.txt filled with the info. 

Option 2 assumes you run Windows, where I don't think there is a tee command. 
In this case, you could replace line 364:

print '\n'.join(out)

With something like this:

special_print('\n'.join(out))

And where the other functions are defined, paste something like:

def special_print(s):
    # Append ("a"), so earlier output is not overwritten on each call
    f = open("report.txt", "a")
    f.write(s + "\n")
    f.close()
    print s
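
Another way to get the same effect on any platform, sketched here under the assumption that the script prints everything through sys.stdout, is a small tee-like object. The Tee class is hypothetical and not part of pescanner.py; it is written to run on both Python 2.7 and Python 3.

```python
import sys

class Tee(object):
    """File-like object that copies every write to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        # Send the same text to each underlying stream
        for stream in self.streams:
            stream.write(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()
```

To use it, open report.txt once at startup and assign sys.stdout = Tee(sys.__stdout__, report_file) before the scan starts; every later print then lands both on screen and in the file, with no need to change the individual print statements.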

Original comment by michael.hale@gmail.com on 21 Jun 2011 at 9:07

GoogleCodeExporter commented 8 years ago
I'm going to mark this as closed, but will remember about the sqlite3 
integration for a future version of the script. 

Original comment by michael.hale@gmail.com on 27 Jun 2011 at 1:01