Closed simonschoe closed 2 months ago
I think you can do this with text_blocks = page.get_text("blocks"), see: https://pymupdf.readthedocs.io/en/latest/page.html#Page.get_text
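For anyone landing here, a minimal sketch of that suggestion, assuming the documented tuple layout for "blocks" output: (x0, y0, x1, y1, text, block_no, block_type), with block_type 0 for text and 1 for images. The names TextBlock, text_blocks_with_bboxes and extract_page_blocks are made up for illustration and are not part of PyMuPDF.

```python
from typing import Iterable, NamedTuple


class TextBlock(NamedTuple):
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
    text: str


def text_blocks_with_bboxes(blocks: Iterable[tuple]) -> list[TextBlock]:
    """Keep only text blocks and pair each block's text with its bounding box.

    Assumes each item looks like (x0, y0, x1, y1, text, block_no, block_type).
    """
    result = []
    for x0, y0, x1, y1, content, _block_no, block_type in blocks:
        if block_type != 0:  # 0 = text block, 1 = image block
            continue
        result.append(TextBlock((x0, y0, x1, y1), content.strip()))
    return result


def extract_page_blocks(pdf_path: str, page_number: int = 0) -> list[TextBlock]:
    """Open a PDF and return the text blocks of one page (requires PyMuPDF)."""
    import pymupdf  # imported lazily; older releases use `import fitz` instead

    with pymupdf.open(pdf_path) as doc:
        page = doc[page_number]
        return text_blocks_with_bboxes(page.get_text("blocks"))
```

Calling extract_page_blocks("some.pdf") (the path is a placeholder) should give one entry per text block, and each bbox can be turned back into a rectangle on the page to show where the text came from.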
I fully agree with @jamie-lemon's comment. Otherwise: this is not an issue, but rather a Discussions item. Let's not bloat the Issues with mere questions!
@JorjMcKie Hi there, any chance that it will be possible in the future to obtain bounding boxes for the extracted text elements? That way it would be possible to map the extracted text back onto the original PDF page, for example, to visualize the chunk. This would be super helpful for end users. :)