Closed: inequation closed this issue 3 months ago
That PDF is encrypted. I filed https://github.com/J-F-Liu/lopdf/issues/168 about adding support for RC4 encryption to lopdf.
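For anyone who wants a quick way to check whether a PDF is likely encrypted before blaming the extractor, here is a rough sketch. It is only a heuristic and an assumption on my part, not how pdf-extract or lopdf detect encryption: it scans the raw bytes for an `/Encrypt` key, which encrypted files carry in their trailer or cross-reference stream dictionary. The function name `looks_encrypted` is made up for illustration.

```rust
/// Heuristic check (hypothetical helper, not part of pdf-extract or lopdf):
/// returns true if the raw PDF bytes contain the name `/Encrypt` anywhere.
/// A real parser would read the trailer dictionary instead, so this can
/// report false positives if the name happens to appear in other data.
fn looks_encrypted(pdf_bytes: &[u8]) -> bool {
    const NEEDLE: &[u8] = b"/Encrypt";
    // Slide a window of the needle's length over the file and compare.
    pdf_bytes.windows(NEEDLE.len()).any(|window| window == NEEDLE)
}
```

To try it on a file, read the bytes with `std::fs::read(path)` and pass the resulting buffer as a slice.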
Thank you! I certainly lack the context knowledge to diagnose this. :)
@inequation Any update on this?
How should I know? I'm not the developer, just a user. :) I ended up extracting the text manually, because the formatting appears to confuse converters to the point where the output is useless: columns in tables get mixed up between rows.
FYI, my implementation of RC4 decryption was just merged into lopdf: https://github.com/J-F-Liu/lopdf/pull/228, so this should now be unblocked.
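For readers unfamiliar with RC4, here is a minimal textbook sketch of the cipher itself. This is not the code from the lopdf pull request, and the function name `rc4` is just for illustration. It also leaves out the PDF-specific part: as far as I understand the PDF standard security handler, each object is decrypted with its own key derived from the file key plus the object and generation numbers, and that derivation is not shown here.

```rust
/// Textbook RC4 stream cipher (illustrative only, not lopdf's implementation).
/// Encryption and decryption are the same operation: XOR with the keystream.
fn rc4(key: &[u8], data: &[u8]) -> Vec<u8> {
    assert!(
        !key.is_empty() && key.len() <= 256,
        "RC4 key must be between 1 and 256 bytes"
    );

    // Key-scheduling algorithm (KSA): start from the identity permutation
    // and shuffle it using the key bytes.
    let mut s = [0u8; 256];
    for (idx, slot) in s.iter_mut().enumerate() {
        *slot = idx as u8;
    }
    let mut j: u8 = 0;
    for i in 0..256 {
        j = j.wrapping_add(s[i]).wrapping_add(key[i % key.len()]);
        s.swap(i, j as usize);
    }

    // Pseudo-random generation algorithm (PRGA): produce one keystream byte
    // per input byte and XOR it into the data.
    let (mut i, mut j) = (0u8, 0u8);
    data.iter()
        .map(|&byte| {
            i = i.wrapping_add(1);
            j = j.wrapping_add(s[i as usize]);
            s.swap(i as usize, j as usize);
            let k = s[s[i as usize].wrapping_add(s[j as usize]) as usize];
            byte ^ k
        })
        .collect()
}
```

Because the cipher is symmetric, calling `rc4(key, &rc4(key, data))` gives back the original `data`.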
@jrmuizel Any chance of picking this up now that lopdf supports decryption? Or would you accept a PR for this?
Frankly, I have no clue whether the problem lies in pdf-extract or in one of its dependencies, so please redirect me if this issue is misplaced.
For the public AMD GPU ISA documentation, such as https://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf , pdf-extract extracts blank pages. Other extractors, such as PyPDF2, extract the text just fine.