pdfminer / pdfminer.six

Community maintained fork of pdfminer - we fathom PDF
https://pdfminersix.readthedocs.io
MIT License

pdf2txt.py HTML output has bad formatting #1038

Open f-t-alves opened 2 months ago

f-t-alves commented 2 months ago

I've tried out a few of the PDFs found in the samples directory and most had pretty bad formatting. Is this a known limitation or did something go horribly wrong somewhere? For example, trying out samples/font-size-test.pdf with all three layout modes produced overlapping characters, with varying degrees of wonkiness when trying to interactively highlight the text.

To reproduce, run pdf2txt.py -o test.html font-size-test.pdf, then open the HTML in a browser of your choice.
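For anyone who wants to compare the three layout modes side by side, here is a rough sketch that builds one pdf2txt.py command per mode. It assumes pdf2txt.py is on your PATH and that the modes are selected with the -Y (--layoutmode) flag using the values normal, exact and loose. The test-<mode>.html output names and the build_commands helper are made up for illustration, and -t html is passed explicitly rather than relying on the output file extension.

```python
import shlex
import shutil
import subprocess
import sys
from pathlib import Path

# Layout modes assumed to be accepted by pdf2txt.py's -Y / --layoutmode flag.
LAYOUT_MODES = ("normal", "exact", "loose")


def build_commands(pdf_path, out_dir="."):
    """Return one pdf2txt.py argument list per layout mode."""
    commands = []
    for mode in LAYOUT_MODES:
        # Hypothetical naming scheme so the outputs do not overwrite each other.
        out_file = Path(out_dir) / f"test-{mode}.html"
        commands.append([
            "pdf2txt.py",
            "-t", "html",        # request HTML output explicitly
            "-Y", mode,          # layout mode under test
            "-o", str(out_file),
            str(pdf_path),
        ])
    return commands


if __name__ == "__main__":
    pdf = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("samples/font-size-test.pdf")
    can_run = pdf.exists() and shutil.which("pdf2txt.py") is not None
    for argv in build_commands(pdf):
        # Always show the command; only run it when the tool and the PDF are available.
        print(shlex.join(argv))
        if can_run:
            subprocess.run(argv, check=False)
```

Opening the three generated HTML files in the same browser should make it easier to tell whether the overlapping characters show up in every mode or only in some of them.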

KaboChow commented 1 week ago

To fix your trouble, check this solution (click). Maybe this will solve your problem.

The link above is malicious. Do not click it. It uses a fake machine-verification prompt as a pretext to trick you into running a malicious download command on your computer.

If you have already executed it, you can follow these steps:

1. Disconnect from the internet.
2. Press Win+R and type cmd to open the command line tool. Run 'tasklist | findstr powershell' to list the PowerShell processes, then run 'taskkill /PID <PID> /F' for each one, replacing <PID> with the process ID shown in the list, to terminate all PowerShell processes.