openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

pdf2john.pl cannot extract hash from AES 256 protected PDF #5116

Open Aphex1979 opened 2 years ago

Aphex1979 commented 2 years ago

I tried to extract the hash from a strongly encrypted PDF file to recover the real password, because the password I was given does not work. So yesterday I set up Kali Linux, updated/upgraded it, and did a fresh install of JohnTheRipper: git clone https://github.com/magnumripper/JohnTheRipper.git

The generated hash file contains only the path and name of the PDF:

./pdf2john.pl /home/me/022022.pdf > /home/me/022022.hash
cat /home/me/022022.hash
/home/me/022022.pdf:

When I create an encrypted PDF via $ qpdf --encrypt abc123 abc123 256 -- sample.pdf sample_encrypted.pdf, everything works fine.

I investigated the PDF file further with pdfid and pdf-parser:

pdfid.py 022022.pdf
PDFiD 0.2.8 022022.pdf
 PDF Header: %PDF-1.5
 obj                   22
 endobj                22
 stream                 3
 endstream              3
 xref                   0
 trailer                0
 startxref              1
 /Page                  1
 /Encrypt               1
 /ObjStm                0
 /JS                    0
 /JavaScript            0
 /AA                    0
 /OpenAction            0
 /AcroForm              0
 /JBIG2Decode           0
 /RichMedia             0
 /Launch                0
 /EmbeddedFile          0
 /XFA                   0
 /URI                   0
 /Colors > 2^24         0
pdf-parser.py -o 22 022022.pdf
obj 22 0
 Type: /XRef
 Referencing: 1 0 R, 3 0 R, 19 0 R
 Contains stream

  <<
    /Length 75
    /Root 1 0 R
    /Info 3 0 R
    /ID [<DDB44FE6BEF1C7D4D6AF9A4109667235> <DDB44FE6BEF1C7D4D6AF9A4109667235>]
    /Encrypt 19 0 R
    /Type /XRef
    /Size 23
    /Index [0 22]
    /W [1 2 0]
    /Filter /FlateDecode
  >>

(this is the only occurrence of /Encrypt in this file)

pdf-parser.py -o 19 022022.pdf
obj 19 0
 Type:
 Referencing: 20 0 R

  <<
    /Filter /Standard
    /V 5
    /R 6
    /Length 256
    /P -1036
    /U <E6781751CD886628E361A6B80B14D4278C3B65272F118D05933F27DCDB0047279FFB2545CE9EC93A284F9C9F1F62B884>
    /UE <35E287EBAF0ED0EB9998729B4E114017C6DC6C3EC4F47B23298F5F149619535C>
    /O <D57B595AE21A1EDB5314FFB20EE3632F3B9FBB72E03BACE179976E6FFF758B91673B773566CE63A8EAD403B5EB59837F>
    /OE <C5F52133A6A5A4F19814716D9CAD47B556C59C6EA25CAE0C32DE17565EA1722B>
    /CF 20 0 R
    /StmF /StdCF
    /StrF /StdCF
    /Perms <BC5E73CABE80FF8988B78C41A9D4E076>
  >>

/U /UE /O /OE => a lot of security
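For anyone who wants to repeat the two pdf-parser lookups above without extra tools, here is a minimal Python sketch. It is not how pdf2john.pl or pdf2john.py work; `find_encrypt_dict` is a hypothetical helper that assumes the /Encrypt entry is an indirect reference and that the referenced object sits uncompressed in the file body, as it does in this file.

```python
import re


def find_encrypt_dict(data: bytes):
    """Return the raw body of the object referenced by /Encrypt, or None.

    Assumption: the entry looks like '/Encrypt N G R' and object 'N G obj'
    appears uncompressed in the file (not inside an object stream).
    """
    ref = re.search(rb"/Encrypt\s+(\d+)\s+(\d+)\s+R", data)
    if ref is None:
        return None
    num, gen = ref.group(1), ref.group(2)
    # The lookbehind keeps '19 0 obj' from matching inside '119 0 obj'.
    pattern = rb"(?<!\d)" + num + rb"\s+" + gen + rb"\s+obj\b(.*?)endobj"
    obj = re.search(pattern, data, re.DOTALL)
    return obj.group(1).strip() if obj else None


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            print(find_encrypt_dict(handle.read()))
```

Because it searches the raw bytes, it does not care whether the reference lives in a classic trailer or in an /XRef stream dictionary like the one in object 22.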

The header of the PDF (first 19 lines):

%PDF-1.5
%öäüß
1 0 obj
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
3 0 obj
<<
/Subject <17524AC479286EF220FFB65B4985545754F7869D01BA1B9577A02ABE9BD5B041>
/Title <AF0CFAACF58EB3DBCC2823C2B06BC323B22118F9B572813751F719B285F27B25>
/Author <10B3D0BB0317BC9F1269FDA744699E42E49D5E169CE87EF737E2A740C5913D4A>
/Creator <080C9D29EF14ECC9A2D5F3B744115D2FAFEEC230B879B0050A5956803DB2B15F23291C44CCE11BFBA0C8C2BF1C1357E7>
/CreationDate <E6516E162D4CA2587BA28C509A7C658B09036A41DB495EEDFFED13B8D58CE8B05E2020F7F92F6260A0E351F3B9C3D2F0>
/Producer <9570F7B295B6B9C861F1904EA3F8528CB34835F5AD936955E6DAA05F80572CFDD0FAD2A15751CC567B76BEC14451C22B9CCB1D1899A76B4C000638CF1400E803>
>>

Another potentially useful piece of information is the following qpdf output:

./qpdf --check /home/me/022022.pdf
WARNING: /home/me/022022.pdf: reported number of objects (23) is not one plus the highest object number (21)
qpdf: /home/me/022022.pdf: invalid password

Conclusion: The PDF version is 1.5, but it uses AES 256 encryption (/V 5 /R 6), which according to the qpdf documentation was only introduced with PDF 1.7?

Unfortunately I cannot provide this PDF or recreate another one because it has sensitive data inside (it was created by my employer)

System information:

./john --list=build-info
Version: 1.9.0-jumbo-1+bleeding-4cb8bcaf3 2022-04-21 10:57:53 +0200
Build: linux-gnu 64-bit x86_64 AVX2 AC OMP
SIMD: AVX2, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
CPU tests: AVX2
$JOHN is ./
Format interface version: 14
Max. number of reported tunable costs: 4
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
SINGLE_IDX_MAX: 32768
SINGLE_BUF_MAX: 4294967295
Effective limit: Max. KPC 32768
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 11.2.0
GNU libc version: 2.33 (loaded: 2.33)
Crypto library: OpenSSL
OpenSSL library version: 0101010ef
OpenSSL 1.1.1n  15 Mar 2022
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's
times(2) sysconf(_SC_CLK_TCK) is 100
Using times(2) for timers, resolution 10 ms
HR timer: clock_gettime(), latency 800 ns
Total physical host memory: 16318 MiB
Available physical host memory: 9102 MiB
Terminal locale string: en_US.UTF-8
Parsed terminal locale: UTF-8

Should this PDF work with pdf2john.pl or is it planned to support it in the future?

If any other information is needed, I'm glad to provide it here.

magnumripper commented 2 years ago

Unfortunately i cannot provide this PDF or recreate another one because it has sensitive data inside (it was created by my employer)

While that is totally understandable it makes it hard for anyone to look into this. Perhaps the file is damaged/truncated?

Aphex1979 commented 2 years ago

While that is totally understandable it makes it hard for anyone to look into this. Perhaps the file is damaged/truncated?

I was able to open the PDF file now with the proper password provided by my employer. In the file properties in Acrobat I can now see the application that created the PDF: "L2001 PDF-Generator", created with PDFlib+PDI 9.0.1 (C++/Linux-x86_64). I hope that helps.

magnumripper commented 3 weeks ago

For our formats, PDF 1.5 or 1.7 is less important but the "encryption version" (for lack of a better term) is /V 5 and /R 6. Our formats currently only use the latter (so R 6) for picking algo.
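To illustrate "picking algo by /R", a small lookup like the one below is roughly what is meant. This is only a sketch based on my reading of the PDF specification's standard security handler, not code taken from JtR; the table name and `describe_revision` are made up for illustration.

```python
# Assumed mapping of standard security handler revisions (/R) to ciphers.
# Descriptions are a summary of the PDF spec, not of any JtR internals.
REVISION_ALGORITHMS = {
    2: "RC4, 40-bit key",
    3: "RC4, 40 to 128-bit key",
    4: "RC4 or AES-128 (chosen by the crypt filter)",
    5: "AES-256 (Adobe extension, superseded by R 6)",
    6: "AES-256 (PDF 2.0 style key derivation)",
}


def describe_revision(r: int) -> str:
    """Return a short description of the cipher implied by /R."""
    return REVISION_ALGORITHMS.get(r, "unknown revision")


print(describe_revision(6))
```

The %PDF-x.y header plays no part in this lookup, which is why a 1.5 header alongside /R 6 is not a problem in itself.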

Some stuff picked from above:

    /ID [<DDB44FE6BEF1C7D4D6AF9A4109667235> <DDB44FE6BEF1C7D4D6AF9A4109667235>]

(...)

  <<
    /Filter /Standard
    /V 5
    /R 6
    /Length 256
    /P -1036
    /U <E6781751CD886628E361A6B80B14D4278C3B65272F118D05933F27DCDB0047279FFB2545CE9EC93A284F9C9F1F62B884>
    /UE <35E287EBAF0ED0EB9998729B4E114017C6DC6C3EC4F47B23298F5F149619535C>
    /O <D57B595AE21A1EDB5314FFB20EE3632F3B9FBB72E03BACE179976E6FFF758B91673B773566CE63A8EAD403B5EB59837F>
    /OE <C5F52133A6A5A4F19814716D9CAD47B556C59C6EA25CAE0C32DE17565EA1722B>
    /CF 20 0 R
    /StmF /StdCF
    /StrF /StdCF
    /Perms <BC5E73CABE80FF8988B78C41A9D4E076>
  >>

I think the above is all that's needed to make up a hash that JtR and hashcat understand. Something like this:

$pdf$5*6*256*-1036*0*16*ddb44fe6bef1c7d4d6af9a4109667235*48*e6781751cd886628e361a6b80b14d4278c3b65272f118d05933f27dcdb0047279ffb2545ce9ec93a284f9c9f1f62b884*48*d57b595ae21a1edb5314ffb20ee3632f3b9fbb72e03bace179976e6fff758b91673b773566ce63a8ead403b5eb59837f

AFAICS I only guessed a single thing here: The *0*, which is a flag "encrypt_metadata". It could possibly be the zero in /CF 20 0 R (I didn't try to check the docs for it) or maybe it's found elsewhere. Anyway that is either 1 or 0.
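The assembly of that string can be sketched in a few lines of Python. This is just an illustration of the field layout used in the hash above (V, R, Length, P, encrypt_metadata, then length-prefixed ID, U and O), not the code from pdf2john; `build_pdf_hash` is a hypothetical name, and encrypt_metadata stays a parameter because its value here is a guess.

```python
def build_pdf_hash(v, r, length, p, encrypt_metadata, doc_id_hex, u_hex, o_hex):
    """Join encryption-dictionary fields into a $pdf$ style string."""

    def sized(hex_str):
        hex_str = hex_str.lower()
        # The length prefix is in bytes, i.e. half the number of hex digits.
        return f"{len(hex_str) // 2}*{hex_str}"

    fields = [
        str(v), str(r), str(length), str(p),
        str(int(encrypt_metadata)),  # 0 or 1; guessed as 0 in this thread
        sized(doc_id_hex), sized(u_hex), sized(o_hex),
    ]
    return "$pdf$" + "*".join(fields)


# Values copied from the /ID array and object 19 quoted above.
example_hash = build_pdf_hash(
    5, 6, 256, -1036, 0,
    "DDB44FE6BEF1C7D4D6AF9A4109667235",
    "E6781751CD886628E361A6B80B14D4278C3B65272F118D05"
    "933F27DCDB0047279FFB2545CE9EC93A284F9C9F1F62B884",
    "D57B595AE21A1EDB5314FFB20EE3632F3B9FBB72E03BACE1"
    "79976E6FFF758B91673B773566CE63A8EAD403B5EB59837F",
)
print(example_hash)
```

If the guess for encrypt_metadata turns out wrong, rebuilding the string with 1 instead of 0 is a one-argument change.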

The question though, is why pdf2john.pl couldn't parse it. @Aphex1979 did you ever try the alternative Python version pdf2john.py? I would guess that one is newer than the Perl version.