Closed nianfouyi closed 2 years ago
Hi @luozhongxiangsi, and thanks for your interest in this library. A “paragraph” is not a concept defined by the PDF specification, and paragraphs are visually represented in different ways in different PDFs, so there’s no consistent/reliable way to identify them. However, you may be able to achieve some of your goals using `page.extract_text(layout=True, …)`. (See the documentation for `.extract_text` in the README for more details.)
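To illustrate that suggestion, here is a minimal sketch rather than an official recipe. It assumes the library in question is pdfplumber, and it assumes that in your PDF paragraphs are separated by extra vertical space, which layout mode tends to render as blank lines. The helper names `split_paragraphs` and `paragraphs_from_pdf` are hypothetical and not part of the library. PDFs that mark paragraphs only by indentation will need a different heuristic.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Heuristic (an assumption, not a PDF concept): blank lines separate paragraphs."""
    # Split wherever one or more blank (or whitespace-only) lines occur.
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = []
    for block in blocks:
        # Collapse layout padding and wrapped lines into single spaces.
        joined = " ".join(block.split())
        if joined:
            paragraphs.append(joined)
    return paragraphs


def paragraphs_from_pdf(path: str, page_number: int = 0) -> list[str]:
    """Hypothetical helper: extract one page in layout mode, then split it."""
    import pdfplumber  # third-party; imported here so split_paragraphs works without it

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_number]
        # layout=True asks for text that mimics the page's visual layout.
        text = page.extract_text(layout=True) or ""
    return split_paragraphs(text)
```

Calling `paragraphs_from_pdf("example.pdf")` (the file name is a placeholder) would return a list of strings for the first page. How well the split matches the real paragraphs depends entirely on how the PDF spaces its lines, so check the output against a few pages before relying on it.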
I want to read paragraph content, but I can't find a way to do it. Is there no way to do this?