pmaupin / pdfrw

pdfrw is a pure Python library that reads and writes PDFs

Cannot get info about encrypted file #123

Open Lahorde opened 6 years ago

Lahorde commented 6 years ago

I tried to read the PDF info from the attached file, STM32HitexRefManual.pdf.

Here is the info returned:

{'/CreationDate': '(ÅÔh\x0cw\x1dx¡\x01dXe\x91\x80\x8cÖí`Óän\x8eâÕÜèLºALj\x8e\x8bGôáæM\x99+$\x02\x8bYÞ\x89ð0)', '/Author': '(±\x1boézðdY\x94}¯\x894\x10Ù±å\x91\x13Òº±\x8eÚ<ìÕù\x94ÜÈy)', '/Creator': '(Îü\x84Wrbz\x9
0´,y¥þy\x89eÀG¥Øý\x15Z\x833Í\t,\x90\\)Ð<6\x97\x92 8\x98\x80|¼ZdbÀ\x9fa1)', '/Producer': "(÷ï\x07\\\\fF\x15Ã\x07o\x02ïTú¯ß¬\x12Q\x07?ÓJONå«ÖÜ«\x85Ö8Û³É\x8d'FËS\x0f>\x86\x075r\x93\x15¢ÿÑ-ªÑÌC¾i?\x84ÌU@)", '/ModDat
e': "(Ï¡\x10\x85\\)ÝeQ\x9f\x86\tHýAdʾŷ¶®¨õT\tÔ\x8aLc\x94Ã,©`\x10\x175ÉLr\x97'\x9bÎft\x84;)", '/Title': '(·#ݨS½ù7\x19\x88\x03\x85°Ün\x04Q\x7f\x1a=þDÏ\x14àò+H\x1e\x83ù\x89ËWr¿c\x95¸\x15\x03\x03»G<ær#U\x9cé$¶¡\x
170\x0b\x94\x13=y§\x00)'}

The info is encrypted. In the pdfrw documentation, I read:

The examples directory has a few scripts which use the library. Note that if these examples do not work with your PDF, you should try to use pdftk to uncompress and/or unencrypt them first.

Would it be possible to handle encrypted files as pdfinfo does?

pdfinfo STM32HitexRefManual.pdf 
Title:          Microsoft Word - ISGSTM32-v18d-fu.doc.docx
Author:         Fuller
Creator:        PScript5.dll Version 5.2.2
Producer:       Acrobat Distiller 7.0.5 (Windows)
CreationDate:   Thu Oct 22 15:48:02 2009 CEST
ModDate:        Fri Oct 23 13:25:51 2009 CEST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          106
Encrypted:      yes (print:no copy:no change:no addNotes:no algorithm:AES)
Page size:      595.276 x 841.89 pts (A4)
Page rot:       0
File size:      4396418 bytes
Optimized:      yes
PDF version:    1.6

JorjMcKie commented 6 years ago

When a PDF is encrypted, there is a choice as to whether the metadata is encrypted as well. If the metadata is encrypted too, then you must decrypt the file before you can access anything.

Lahorde commented 6 years ago

To get the PDF title from Python for encrypted files, I called pdfinfo from my Python code; the code is here
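
The linked code is not reproduced in this thread. As a rough illustration of the workaround described above, here is a minimal sketch that shells out to pdfinfo and parses its "Key: value" output into a dict. It assumes the pdfinfo command-line tool (shipped with Poppler or Xpdf) is installed and on the PATH. The function names `parse_pdfinfo_output` and `pdf_info` are hypothetical, chosen for this example; they are not part of pdfrw and are not the code the poster linked to.

```python
import subprocess


def parse_pdfinfo_output(text):
    """Turn pdfinfo's 'Key:   value' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain colons
        # (dates, the Encrypted details) stay intact.
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that have no colon at all
            info[key.strip()] = value.strip()
    return info


def pdf_info(path):
    """Run pdfinfo on `path` and return its fields as a dict.

    Assumes the `pdfinfo` executable is on the PATH. Raises
    FileNotFoundError if it is missing and
    subprocess.CalledProcessError if pdfinfo exits with an error.
    """
    result = subprocess.run(
        ["pdfinfo", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    # Decode leniently in case the metadata contains unexpected bytes.
    return parse_pdfinfo_output(result.stdout.decode("utf-8", errors="replace"))
```

With the file from this issue, `pdf_info("STM32HitexRefManual.pdf").get("Title")` should give the same title that pdfinfo prints on the command line. Keeping the parsing separate from the subprocess call makes the parser easy to check against saved pdfinfo output without needing the tool installed.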