pymupdf / PyMuPDF-Utilities

Demos, examples and utilities using PyMuPDF
GNU Affero General Public License v3.0

Fix typing (TypeError: 'type' object is not subscriptable) #14

Closed: mara004 closed this 3 years ago

mara004 commented 3 years ago

Also restructured eop a bit to prevent the syntax highlighter of my text editor from crashing

mara004 commented 3 years ago

I just noticed this utility has been moved to pymupdf/fitz/__main__.py, and apparently the issue is already fixed there?

JorjMcKie commented 3 years ago

> I just noticed this utility has been moved to pymupdf/fitz/__main__.py, and apparently the issue is already fixed there?

Yes, thank you for bringing this up anyway. I forgot to test this with pre-3.9 Python in the first place. I will continue to maintain this utility for a while though, even now that its features are provided by the fitz module. Text layout reconstruction will never be perfect in general, given the myriad ways text can be encoded in PDF. So it is useful to have a separate script that can be adapted to special circumstances.
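For readers who hit the error from the title on Python older than 3.9, here is a minimal sketch of the usual cause and a portable workaround. The function names `count_words` and `builtin_generics_supported` are made up for illustration and are not taken from the utility; the actual fix in this PR may look different.

```python
import sys
from typing import Dict, List


def count_words(lines: List[str]) -> Dict[str, int]:
    """Hypothetical helper annotated with typing.List / typing.Dict.

    Writing list[str] or dict[str, int] here instead would raise
    "TypeError: 'type' object is not subscriptable" on Python < 3.9,
    because annotations are evaluated when the function is defined.
    """
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def builtin_generics_supported() -> bool:
    """Return True if built-in types can be subscripted at runtime."""
    try:
        list[str]  # works on Python >= 3.9, raises TypeError before
    except TypeError:
        return False
    return True


print(sys.version_info[:2], builtin_generics_supported())
```

Another option, assuming the annotations are never inspected at runtime, is to put `from __future__ import annotations` at the top of the module, which defers their evaluation on Python 3.7 and later.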

mara004 commented 3 years ago

Thanks. I didn't know that subscripting built-in types is possible with Python >= 3.9. Layout-preserving text extraction from PDFs is very difficult indeed, as only the positions of glyphs are encoded but not the document structure, not even space characters. However, it would be really useful to have a working tool for this. (My primary use case is converting scanned books with an OCR layer to text files.)
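To illustrate why missing space characters make this hard, here is a rough sketch of how spaces and line breaks could be inferred from glyph positions alone. It is not how PyMuPDF or the utility does it: the `Glyph` type, the `glyphs_to_text` function and both thresholds are assumptions chosen for the example, and real documents need far more care (rotated text, columns, varying font sizes).

```python
from typing import List, NamedTuple


class Glyph(NamedTuple):
    """Hypothetical glyph record: a character and its position."""
    char: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # baseline, growing downward


def glyphs_to_text(glyphs: List[Glyph],
                   line_tol: float = 2.0,
                   gap_ratio: float = 0.3) -> str:
    """Group glyphs into lines by baseline, then infer spaces from gaps.

    line_tol and gap_ratio are illustrative guesses, not tuned values.
    """
    lines: List[List[Glyph]] = []
    for g in sorted(glyphs, key=lambda g: (g.y, g.x0)):
        # Same line if the baseline is close to the line's first glyph.
        if lines and abs(lines[-1][0].y - g.y) <= line_tol:
            lines[-1].append(g)
        else:
            lines.append([g])

    out = []
    for line in lines:
        line.sort(key=lambda g: g.x0)
        text = line[0].char
        for prev, cur in zip(line, line[1:]):
            # Insert a space when the horizontal gap is large relative
            # to the width of the previous glyph.
            if cur.x0 - prev.x1 > gap_ratio * (prev.x1 - prev.x0):
                text += " "
            text += cur.char
        out.append(text)
    return "\n".join(out)


sample = [
    Glyph("H", 0, 5, 10), Glyph("i", 5, 7, 10),
    Glyph("y", 12, 17, 10.5), Glyph("o", 17, 22, 10),
    Glyph("!", 0, 2, 30),
]
print(glyphs_to_text(sample))
```

This sketch only recovers word and line boundaries; preserving indentation or column layout would need an extra step that maps x positions to character columns.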