jesparza / peepdf

Powerful Python tool to analyze PDF documents
http://peepdf.eternal-todo.com
GNU General Public License v3.0

Some PDFs delay static analysis indefinitely #59

Open mallorybobalice opened 8 years ago

mallorybobalice commented 8 years ago

Hi, this comes from a project that uses peepdf:
https://github.com/spender-sandbox/cuckoo-modified/issues/54

Some samples are here.

mallorybobalice commented 8 years ago

Well, indefinitely, or at least for several minutes (2, 3, 10, etc.).

To see how PDFParser is used, have a look at:

https://github.com/spender-sandbox/cuckoo-modified/blob/master/modules/processing/static.py

jesparza commented 8 years ago

Hi there,

I am struggling to find time to go through the issues; I hope to get to them soon. Meanwhile, some comments and suggestions:

Thanks for raising the issue!

Jose

mallorybobalice commented 8 years ago

I might add a traceback print when static analysis is interrupted in Cuckoo, to help narrow down the function. Has anyone had time to look at it?

Size varies: I have seen small (around 100 KB), medium (1 MB) and large (5 MB) PDFs do this. I'm pretty sure I saw references to JS in old tracebacks.

emulating JS code automatically.

Mmm, regarding the -m option, could you please have a look at how Cuckoo's static module (linked above) invokes peepdf and suggest how to make it do the equivalent of -m?
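The traceback-on-interrupt idea mentioned above can be sketched in plain Python. This is a minimal sketch, assuming a Unix host and that the analysis runs in the main thread (SIGALRM has both restrictions). `run_with_timeout` and `AnalysisTimeout` are hypothetical helpers, not part of peepdf or Cuckoo. When the alarm fires, the handler prints the stack of whatever was executing, which should point at the function the parser is stuck in.

```python
import signal
import traceback


class AnalysisTimeout(BaseException):
    """Hypothetical exception raised when an analysis step runs too long.

    Derives from BaseException so a broad `except Exception` inside the
    parser cannot swallow it.
    """


def _on_alarm(signum, frame):
    # Print the stack of the code running when the alarm fired, so we can
    # see which function the analysis was stuck in (goes to stderr).
    print("Static analysis interrupted; current stack:")
    traceback.print_stack(frame)
    raise AnalysisTimeout("analysis exceeded its time limit")


def run_with_timeout(func, seconds, *args, **kwargs):
    """Hypothetical wrapper: run func(*args, **kwargs) for at most `seconds`.

    Unix-only (SIGALRM) and must be called from the main thread.
    Returns func's result, or None if the time limit was hit.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return func(*args, **kwargs)
    except AnalysisTimeout:
        return None
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

For example, `run_with_timeout(parse_pdf, 120, path)`, with `parse_pdf` standing in for whatever function wraps the peepdf call, would give up after two minutes and print where it was stuck instead of hanging.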

jesparza commented 8 years ago

I have taken a quick look with a profiler while analyzing the document with hash 66a35d6660059dd99d4abef26e9d1a3d (A_DOUT_SEC_12_10_2015.pdf). Some comments about that:

Keep in mind that peepdf was designed to analyze malicious files, which are normally small documents. This one is a 2 MB document. I am not sure the time can get much better than this with the current design (which uses re.findall). Maybe some things to check:

I hope this helps!
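A profiling run like the one described above can be reproduced with Python's standard `cProfile` and `pstats` modules. This is a generic sketch, not the exact profiler session from the comment: `scan_objects` and `profile_call` are hypothetical names, and the regular expression only approximates finding `N G obj` headers; it is not peepdf's actual pattern.

```python
import cProfile
import io
import pstats
import re


def scan_objects(data):
    """Hypothetical stand-in for a parsing step that relies on re.findall."""
    # Find every "N G obj" header in a PDF-like byte string.
    return re.findall(rb"\d+\s+\d+\s+obj", data)


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


if __name__ == "__main__":
    sample = b"1 0 obj << >> endobj\n" * 100000
    matches, report = profile_call(scan_objects, sample)
    print(len(matches))
    print(report)
```

Sorting by cumulative time makes it easy to see whether the regex scans dominate on a large (e.g. 2 MB) input.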

jesparza commented 7 years ago

Brad fixed the problem described in the first point at the end of the previous comment:

https://github.com/spender-sandbox/cuckoo-modified/commit/1d3bdee75006f792cf903baa24547cf2692f807c

Once we have tested this and are sure the modification does not miss anything, we can include the code to fix the issue.

spender-sandbox commented 7 years ago

Will need to apply this as well: https://github.com/spender-sandbox/cuckoo-modified/commit/7e2f5211accf2c0600b8e193d4bd1eb915e53270

Nwinternights commented 7 years ago

@jesparza, could you please apply those fixes? Regards