Closed papipsycho closed 5 months ago
There is a new parameter, margins=(0, 50, 0, 50), supported by the method.
Its default assumes a top and bottom page border of 50 points each.
If you use margins=0, you should get the previous behavior.
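To make the effect of the default concrete, here is a minimal sketch of the arithmetic, not the library's actual code. It assumes the four values are ordered (left, top, right, bottom), that a single number applies to all four sides, and that a pair means (top, bottom). The helper names normalize_margins and clip_rect are hypothetical and exist only for this illustration.

```python
def normalize_margins(margins):
    """Expand a margins value into (left, top, right, bottom).

    Assumed interpretation: one number applies to every side,
    two numbers mean (top, bottom), four numbers are used as given.
    """
    if isinstance(margins, (int, float)):
        return (float(margins),) * 4
    values = tuple(float(m) for m in margins)
    if len(values) == 2:
        top, bottom = values
        return (0.0, top, 0.0, bottom)
    if len(values) == 4:
        return values
    raise ValueError("margins must be a number or a sequence of 2 or 4 numbers")


def clip_rect(page_width, page_height, margins=(0, 50, 0, 50)):
    """Return (x0, y0, x1, y1) of the page area left after removing the margins."""
    left, top, right, bottom = normalize_margins(margins)
    return (left, top, page_width - right, page_height - bottom)


# An A4 page is roughly 595 x 842 points.
print(clip_rect(595, 842))             # default: 50 points cut from top and bottom
print(clip_rect(595, 842, margins=0))  # full page
```

Under these assumptions, anything that sits in the top or bottom 50 points of the page, such as a page header, falls outside the considered area with the default, while margins=0 keeps the whole page.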
I've tested it, but unfortunately the issue is the same.
I will try to create a test PDF for you.
Could not reproduce the problem. Here is an example demonstrating successful use. The PDF looks like this:
Default extraction output (i.e. margins=(0, 50, 0, 50)) omits the page header:
import pymupdf4llm
data = pymupdf4llm.to_markdown("v110-changes.pdf")
print(data[:500])
# Pixmap
The alpha channel is now optional. Its presence is controlled by a new boolean parameter (called `alpha` ). This
has the following consequences:
Setting margins to 0 delivers the full page:
data = pymupdf4llm.to_markdown("v110-changes.pdf", margins=0)
print(data[:500])
**MuPDF v1.10 Changes and their Implications for PyMuPDF**
# Pixmap
The alpha channel is now optional. Its presence is controlled by a new boolean parameter (called `alpha` ). This
has the following consequences:
Closed for lack of response over an extended period of time.
I was wondering why my code wasn't functioning as expected anymore. I had test pdfs that contain text and images. All of a sudden it would not get text anymore. margins=0 fixed it. I understand that this is a very early package, but maybe semver or better documentation on breaking changes could be a good idea 👍 Test PDF: hey_image(1).pdf
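One way to guard against this kind of default change is to pass margins explicitly on every call. The sketch below is an illustration only: to_markdown_full_page is a hypothetical helper, and the converter is passed in as an argument (in practice it would be pymupdf4llm.to_markdown) so that the wrapper itself does not depend on any particular version.

```python
def to_markdown_full_page(convert, path, **kwargs):
    """Call a to_markdown-style converter with margins=0 unless the caller overrides it.

    `convert` is any callable accepting (path, **kwargs); this wrapper is a
    hypothetical convenience, not part of pymupdf4llm.
    """
    kwargs.setdefault("margins", 0)  # keep full-page extraction explicit
    return convert(path, **kwargs)
```

Usage would look like to_markdown_full_page(pymupdf4llm.to_markdown, "hey_image(1).pdf"). Pinning the package version in the project's requirements is a complementary safeguard.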
Hello,
We noticed an issue: the PDF heading was extracted correctly in version 0.0.3, but since version 0.0.5 the heading no longer appears in the output.