Open dhdaines opened 1 week ago
Another note: PDF 1.7 specifies (page 367), with respect to the names of destinations:

> The keys in the name tree may be treated as text strings for display purposes.

This means that they could just be converted to `str` with `decode_text`, since in theory they can only be PDFDocEncoding or UTF-16BE (in practice they are almost certainly other things as well...).
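To make the "other things as well" caveat concrete, here is a rough, standard-library-only sketch of a best-effort decoder for name tree keys. `decode_dest_name` is a hypothetical helper, not part of `pdfminer.six`. The UTF-8 attempt is my own assumption about what shows up in the wild, and Latin-1 is only a stand-in for PDFDocEncoding (the two differ for some byte values), so treat this as an illustration rather than a proposed implementation.

```python
def decode_dest_name(key: bytes) -> str:
    """Hypothetical helper: best-effort decoding of a name tree key."""
    if key.startswith(b"\xfe\xff"):
        # UTF-16BE with a byte order mark, one of the two encodings
        # the spec allows for text strings.
        return key[2:].decode("utf-16-be", errors="replace")
    try:
        # Assumption: some producers write plain UTF-8 without a BOM.
        return key.decode("utf-8")
    except UnicodeDecodeError:
        # Approximation: Latin-1 stands in for PDFDocEncoding here.
        # It never fails, so every key decodes to *something*.
        return key.decode("latin-1")


if __name__ == "__main__":
    for raw in (b"\xfe\xff\x00A\x00B", b"chapter1", b"caf\xe9"):
        print(raw, "->", decode_dest_name(raw))
```

The point of the fallback chain is that a key never raises on decode, which matters if the decoded names are only used for display.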
The `get_dest` method of `PDFDocument` is defined as:

Unfortunately what this means in practice is that for PDF 1.1 documents, it takes a `str`, while for PDF 1.2 documents, it takes a `bytes`. This is because in PDF 1.2 and later the destination dictionary is not a dictionary but a name tree, and (PDF 1.7, page 88):

What this means in practice is that while `pdfminer.six` (dubiously) converts the keys of a dictionary to `str` (because they are name objects and thus kinda-sorta UTF-8, since PDF 1.2, see PDF 1.7 page 16), it cannot reasonably do this for the keys of a name tree as they are undifferentiated blobs of 8-bit data. In practice they can and will be various things including UTF-16 with a BOM (see the `EmbeddedFiles` in https://github.com/pdfminer/pdfminer.six/blob/master/samples/contrib/issue-625-identity-cmap.pdf).

This means that `get_dest` isn't really very useful since you have to know what the named destination is and how it's encoded before you can look it up. A better approach would be to allow the user to iterate over the destinations.
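As a rough sketch of what iterating could look like: the code below walks both the PDF 1.1 style `Dests` dictionary and the PDF 1.2+ name tree under `Names`. It uses plain Python dicts and lists as stand-ins for already-resolved PDF objects, and `iter_name_tree` and `iter_dests` are hypothetical names, not existing `pdfminer.six` API. A real implementation would also need to resolve indirect references at each step.

```python
from typing import Any, Dict, Iterator, Tuple, Union


def iter_name_tree(node: Dict[str, Any]) -> Iterator[Tuple[bytes, Any]]:
    """Yield (key, value) pairs from a name tree node, depth first."""
    names = node.get("Names", [])
    # Leaf entries are a flat array: [key1, value1, key2, value2, ...].
    for i in range(0, len(names) - 1, 2):
        yield names[i], names[i + 1]
    # Intermediate nodes list their children under Kids.
    for kid in node.get("Kids", []):
        yield from iter_name_tree(kid)


def iter_dests(catalog: Dict[str, Any]) -> Iterator[Tuple[Union[str, bytes], Any]]:
    """Yield every named destination, whichever way the file stores them."""
    # PDF 1.1 style: a dictionary keyed by name objects (str in this sketch).
    for name, dest in catalog.get("Dests", {}).items():
        yield name, dest
    # PDF 1.2+ style: a name tree keyed by byte strings.
    tree = catalog.get("Names", {}).get("Dests")
    if tree is not None:
        yield from iter_name_tree(tree)


if __name__ == "__main__":
    # Toy catalog standing in for a parsed document.
    catalog = {
        "Dests": {"intro": [1]},
        "Names": {"Dests": {"Kids": [{"Names": [b"a", [2]]}]}},
    }
    for name, dest in iter_dests(catalog):
        print(repr(name), dest)
```

With something like this the caller sees the keys exactly as stored (`str` or `bytes`) and can decide how to decode them, instead of having to guess the encoding up front just to perform a lookup.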