pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

Another issue with destroying PDF when inserting HTML #3886

Open elipongr opened 2 months ago

elipongr commented 2 months ago

We have another issue:

We use it with page.clean_contents(True); as you know, passing False causes the issue above. The result is that the whole PDF content is deleted.

White_pdf_IN.pdf White_pdf_OUT.pdf

Originally posted by @elipongr in https://github.com/pymupdf/PyMuPDF/issues/3741#issuecomment-2373626282

JorjMcKie commented 2 months ago

There is no need to use page.clean_contents() any longer to ensure correct positions of content insertions.
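For illustration only, a minimal sketch of inserting HTML with no prior page.clean_contents() call. The helper names `insert_html` and `demo`, the rectangle coordinates, and the HTML string are hypothetical and not taken from the reporter's code. It assumes a recent PyMuPDF release that provides `Page.insert_htmlbox` and the `pymupdf` import name (older releases use `import fitz`).

```python
def insert_html(page, rect, html):
    """Insert HTML into `rect` on `page` without calling clean_contents() first.

    `page` is expected to be a pymupdf.Page (hypothetical helper, not part
    of PyMuPDF itself).
    """
    # Page.insert_htmlbox returns (spare_height, scale); a negative
    # spare_height signals that the content did not fit into the rectangle.
    spare_height, scale = page.insert_htmlbox(rect, html)
    if spare_height < 0:
        raise ValueError("HTML content does not fit into the rectangle")
    return spare_height, scale


def demo(in_path, out_path):
    """Open a PDF, insert HTML on its first page, and save a copy."""
    import pymupdf  # assumption: PyMuPDF installed under this import name

    doc = pymupdf.open(in_path)
    page = doc[0]
    rect = pymupdf.Rect(50, 50, 300, 200)  # placeholder target area
    insert_html(page, rect, "<p>Hello <b>world</b></p>")
    doc.save(out_path)
    doc.close()
```

Calling `demo("White_pdf_IN.pdf", "out.pdf")` would exercise this on the attached input file. Whether it avoids the content loss reported here is untested.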

elipongr commented 2 months ago

Any blog or documentation where you have updated it?

JorjMcKie commented 1 month ago

> Any blog or documentation where you have updated it?

We have made this more explicit in the current documentation https://pymupdf.readthedocs.io/en/latest/functions.html#Page.clean_contents.

JorjMcKie commented 2 weeks ago

Otherwise, this is now a duplicate of #4034, for which an upstream bug has been submitted.