mlodic / pdfid

MIT License

Unbound variable when scanning PDF with hex characters #10

Closed: AbdelrahmanKhaledAmer closed this issue 1 year ago

AbdelrahmanKhaledAmer commented 1 year ago

If a PDF contains names with hex-encoded characters (for example, an obfuscated JavaScript tag where /JavaScript is written as /#4AavaScript), scanning it fails with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 1096, in PDFiDMain
    ProcessFile(filename, options, plugins, list_of_dict["reports"], disarmed_buffers["buffers"])
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 819, in ProcessFile
    PDFID2Dict(xmlDoc, options.nozero, options.force, list_of_dict)
  File "/home/worker/venv/lib/python3.10/site-packages/pdfid/pdfid.py", line 698, in PDFID2Dict
    filename_dict['%s_hexcode_count' % name] = int(node.getAttribute('HexcodeCount'))
NameError: name 'name' is not defined

The code responsible is in the function PDFID2Dict, where line 698 references a variable called name that is not defined within the scope of the function (or anywhere else, for that matter): https://github.com/mlodic/pdfid/blob/f7674ff6c0db9e09abbb632719d0ec63b03875db/pdfid/pdfid.py#L683-L720 I cannot provide a fix because I do not know what name is supposed to refer to. If anyone can help, that would be much appreciated. :)
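
For context, a PDF name may encode any character as # followed by two hex digits, which is how /JavaScript can be written as /#4AavaScript. The snippet below is only a minimal sketch of that idea and is not PDFiD's code or the eventual fix: decode_pdf_name and hexcode_report are hypothetical helpers, and the '<name>_hexcode_count' key layout is assumed from the traceback. It shows the name being bound before it is used to build the key, which is the step the failing line appears to be missing.

```python
import re

# Matches '#' followed by exactly two hex digits inside a PDF name, e.g. '#4A'.
_HEX_ESCAPE = re.compile(r"#([0-9A-Fa-f]{2})")


def decode_pdf_name(raw_name):
    """Return (decoded_name, hexcode_count) for a raw PDF name.

    Hypothetical helper: replaces each '#XX' escape with the character it
    encodes and reports how many escapes were found.
    """
    decoded, count = _HEX_ESCAPE.subn(
        lambda match: chr(int(match.group(1), 16)), raw_name
    )
    return decoded, count


def hexcode_report(raw_names):
    """Sum hex-escape counts per decoded name.

    Hypothetical helper: the '<name>_hexcode_count' key format is an
    assumption taken from the traceback, not PDFiD's documented output.
    """
    report = {}
    for raw in raw_names:
        # 'name' is assigned here, before it is used to build the key.
        name, count = decode_pdf_name(raw)
        key = "%s_hexcode_count" % name
        report[key] = report.get(key, 0) + count
    return report
```

For example, decode_pdf_name("/#4AavaScript") returns ("/JavaScript", 1), since 0x4A is the character code for "J".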

mlodic commented 1 year ago

I made a fix and created a release that includes it. Please try it out with your sample; it should work now.