laustinbam / opendlp

Automatically exported from code.google.com/p/opendlp

OpenDLP not scanning inside PDF #63

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a text file containing an SSN.
2. Create a PDF with the same SSN from the text file.
3. Run a scan.

What is the expected output? What do you see instead?
The SSN is found in the text file but not in the PDF.

What version of the product are you using? On what operating system?
0.4.4

Please provide any additional information below.

The two screenshots show that the fake SSN was found in the text document but 
not in the PDF. I remember previous versions being able to scan inside PDFs.

Original issue reported on code.google.com by f8ler...@gmail.com on 7 May 2012 at 7:50

Attachments:

GoogleCodeExporter commented 8 years ago
Can you attach the PDF to this bug report? Thanks.

Original comment by andrew.O...@gmail.com on 8 May 2012 at 12:36

GoogleCodeExporter commented 8 years ago
Attached

Original comment by f8ler...@gmail.com on 9 May 2012 at 8:23

Attachments:

GoogleCodeExporter commented 8 years ago
I opened this file in a text editor and did not see either SSN string. 
Currently, OpenDLP essentially just does a "grep" over the raw contents of each 
file, so it cannot find a string that is not stored there as plain text.
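
A likely reason, offered as an assumption since it depends on how the attached PDF was produced: PDFs commonly store page text in compressed (FlateDecode) streams, so the literal digits never appear in the raw bytes that a grep-style scan reads. The Python sketch below is not OpenDLP code. It uses `zlib` as a stand-in for such a stream, plus a hypothetical `SSN_RE` pattern, to show the same regex missing the compressed bytes and matching the decompressed ones.

```python
import re
import zlib

# Hypothetical SSN pattern for illustration; not OpenDLP's actual regex.
SSN_RE = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")

# Sample text with a well-known invalid SSN used in examples.
text = b"Employee record. Employee name: Jane Doe. Employee SSN: 078-05-1120.\n"

# Stand-in for a PDF content stream stored with the FlateDecode filter.
compressed = zlib.compress(text)

# A grep-style scan sees only the compressed bytes.
raw_hits = SSN_RE.findall(compressed)

# Scanning after decompression sees the original text.
decoded_hits = SSN_RE.findall(zlib.decompress(compressed))

print(raw_hits)      # []
print(decoded_hits)  # [b'078-05-1120']
```

Real PDFs can add further obstacles, such as font-specific text encodings, so decompressing streams alone is not always enough.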

I will look into adding better support for scanning PDF files in a future 
release. Thanks for bringing this to my attention.

Original comment by andrew.O...@gmail.com on 12 May 2012 at 9:18

GoogleCodeExporter commented 8 years ago
This would be very helpful for my company as well.  Have you looked at the 
pdftotext utility?

http://stackoverflow.com/questions/139015/how-can-i-do-a-full-text-search-of-pdf-files-from-perl
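
As a rough sketch of what that could look like, the Python below shells out to `pdftotext` and runs a regex over the extracted text. It is written in Python purely for illustration and is not OpenDLP code. The function names `find_ssns`, `pdf_to_text`, and `scan_pdf` and the `SSN_RE` pattern are hypothetical, and it assumes a `pdftotext` binary (from Poppler or Xpdf) is on the PATH.

```python
import re
import shutil
import subprocess

# Hypothetical SSN pattern for illustration; not OpenDLP's actual regex.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssns(text):
    """Return every SSN-shaped string found in already-extracted text."""
    return SSN_RE.findall(text)


def pdf_to_text(pdf_path):
    """Extract text from a PDF by running pdftotext and reading its stdout."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext is not installed or not on PATH")
    # A trailing "-" tells pdftotext to write the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def scan_pdf(pdf_path):
    """Convert the PDF to text first, then apply the same regex scan."""
    return find_ssns(pdf_to_text(pdf_path))
```

Calling `scan_pdf("report.pdf")` (a hypothetical file name) would return the list of matches. Keeping `find_ssns` separate from the extraction step means the same pattern matching could be reused for plain text files.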

Original comment by bri....@gmail.com on 15 May 2012 at 1:41

GoogleCodeExporter commented 8 years ago
I would concur that this functionality would greatly enhance OpenDLP's 
usefulness. A lot of organizations have archival reports in PDF format and 
those PDF files can contain a lot of sensitive data.

Original comment by jbhal...@gmail.com on 16 Aug 2012 at 1:07

GoogleCodeExporter commented 8 years ago
Has any additional support for PDFs been added?

Original comment by tylerlam...@gmail.com on 24 Jul 2014 at 3:10