Closed: mara004 closed this 3 years ago
I just noticed this utility has been moved to pymupdf/fitz/__main__.py, and apparently the issue is already fixed there?
Yes, thank you for bringing this up anyway. I forgot to test this with pre-3.9 Python in the first place.
I will continue to maintain this utility for a while though - even now that its features are provided by the fitz module.
Text layout reconstruction in general will never be perfect - given the myriad ways text can be encoded in PDF. So it is useful to have a separate script that can be adapted to special circumstances.
Thanks. I didn't know subscripting of regular types is possible with Python >= 3.9. Layout-preserving text extraction from PDFs is very difficult indeed, as only the position of glyphs is encoded but not the document structure, not even space characters. However, it would be really useful to have a working tool for this. (My primary use case is converting scanned books with an OCR layer to text files.)
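For anyone else hitting this: a minimal sketch of the version difference, not taken from the utility itself. Subscripting built-in types such as list[str] in annotations was added in Python 3.9 (PEP 585); on older interpreters it raises a TypeError when evaluated, so the typing module aliases are the portable spelling. The function group_by_length and the name BuiltinAlias are hypothetical and only serve as illustration.

```python
import sys
from typing import Dict, List


# typing.List / typing.Dict are accepted on pre-3.9 interpreters as well.
def group_by_length(words: List[str]) -> Dict[int, List[str]]:
    """Group words by length (hypothetical helper, for illustration only)."""
    groups: Dict[int, List[str]] = {}
    for word in words:
        groups.setdefault(len(word), []).append(word)
    return groups


# Subscripting the built-in type directly only works on Python >= 3.9;
# on older versions evaluating list[str] raises TypeError, hence the guard.
if sys.version_info >= (3, 9):
    BuiltinAlias = list[str]
else:
    BuiltinAlias = List[str]
```

Sticking to the typing aliases keeps a script importable on older interpreters at the cost of slightly more verbose annotations.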
Also restructured eop a bit to prevent the syntax highlighter of my text editor from crashing