Open kleuter opened 1 year ago
So for this file the "startxref" value is wrong, as are all of the xref table offsets. More than likely the original file was edited on Windows with a plain text editor (Notepad or similar) which changed the line endings from LF only to CR LF.
Some PDF viewers will attempt to generate their own xref value for files like this, but I have not done so for PDFio due to the chance of errors and the likelihood that such corruption will also damage the binary streams in the file, leaving it unreadable anyway. I will keep this issue open for now, but it will not be "fixed" any time soon...
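For anyone who wants to check whether a file has this kind of damage, here is a rough sketch in Python. It is not PDFio code: the helper names (`startxref_offset`, `points_at_xref`, `build_minimal_pdf`) are made up for illustration, and it only handles classic xref tables, not the cross-reference streams that newer files can use. It reads the offset stored after "startxref" and checks whether the bytes at that offset really are the "xref" keyword, then shows how an LF to CR LF conversion breaks that.

```python
import re


def startxref_offset(data: bytes) -> int:
    """Return the byte offset stored after the last 'startxref' keyword."""
    pos = data.rfind(b"startxref")
    if pos < 0:
        raise ValueError("no startxref keyword found")
    match = re.match(rb"\s*(\d+)", data[pos + len(b"startxref"):])
    if match is None:
        raise ValueError("no offset after startxref")
    return int(match.group(1))


def points_at_xref(data: bytes) -> bool:
    """True if the startxref offset lands exactly on an 'xref' keyword."""
    offset = startxref_offset(data)
    return data[offset:offset + 4] == b"xref"


def build_minimal_pdf() -> bytes:
    """Build a tiny, deliberately incomplete PDF skeleton with LF line endings."""
    body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    xref_pos = len(body)  # the xref table starts right after the body
    xref = b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    trailer = (
        b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_pos
    )
    return body + xref + trailer


if __name__ == "__main__":
    good = build_minimal_pdf()
    # Simulate a text editor rewriting every LF as CR LF.
    damaged = good.replace(b"\n", b"\r\n")
    print("original points at xref:", points_at_xref(good))
    print("after CR LF conversion: ", points_at_xref(damaged))
```

Every converted line ending before the table adds one byte, so the stored offset ends up pointing a few bytes short of the real "xref" keyword. The same shift applies to each object offset inside the table.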
Here's another pdf, newly generated so unlikely to be damaged. https://www.dropbox.com/scl/fi/ecfzyrskea5nhl8phhdsb/eFFF_BE0445890588_202300011.pdf?rlkey=kwx7cb2msd06bonedzdslt6sj&dl=0
Bad xref table header 'xref '.
That file isn't damaged in the same way. The issue is that there is trailing whitespace after the "xref" keyword, which the current parser rejects: the PDF specs all say the xref table starts with a line consisting of a single "xref" keyword and say nothing about extra whitespace.
So I will update the xref loading code to allow for this but it won't fix the problem with the first file you linked to...
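To illustrate the difference between the strict and the relaxed check, here is a small Python sketch. It is only an illustration of the idea, not the actual PDFio change; `is_xref_header` and `PDF_WHITESPACE` are hypothetical names, and the whitespace set is the six characters the PDF spec defines as white space.

```python
# The six white-space characters defined by the PDF spec:
# NUL, TAB, LF, FF, CR, and SPACE.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def is_xref_header_strict(line: bytes) -> bool:
    """Accept only a line that is exactly the 'xref' keyword."""
    return line == b"xref"


def is_xref_header(line: bytes) -> bool:
    """Accept the 'xref' keyword followed by any amount of trailing whitespace."""
    return line.rstrip(PDF_WHITESPACE) == b"xref"


if __name__ == "__main__":
    for candidate in (b"xref", b"xref ", b"xref \r\n", b"xrefs"):
        print(candidate, is_xref_header_strict(candidate), is_xref_header(candidate))
```

The relaxed version still rejects anything that is not the bare keyword, such as "xrefs", so it only widens the check for trailing whitespace like the 'xref ' header reported above.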
[master b0a66ee] Fix reading of PDF files from Crystal Reports (Issue #45)
If you find other files with issues, please report them as separate issues, otherwise it makes it harder for me to track when a problem is actually fixed... Thanks!
Will do, thanks a lot, Michael. Though the fix doesn't seem to work 😢
The pdfiototext tool fails to parse the file: https://www.dropbox.com/scl/fi/1nhivpa3sbjejza8l53rz/NTFS.pdf?rlkey=zvphkczuy71b0vil8zvmrz95v&dl=0
System Information: