Closed S1lander closed 5 months ago
This looks like using PyMuPDF in a production / commercial environment. Either you or your client should probably own a commercial license - please confirm the license situation with Artifex.
On a technical level, it is hard to understand why PyPDF2 is being used for merging files when PyMuPDF is already in use and is orders of magnitude faster at this.
Thanks for your comments. All good points. I actually just noticed that I mistyped; I am using fitz to merge the files. Do you have any idea how I could get the scaling to be uniform?
Description of the bug
I am currently developing an app for a client. One part of the functionality that I've built in is that it will merge several PDFs together and then add multiple textboxes to all pages (same text, same location). The text always got added correctly to the pages, but recently the client sent me some new PDF files to test on, and now the added text has a different scaling on some pages that were pasted in from one of these new files.
When I compare the metadata of the files that have the correct scaling with those that don't, I get these differences:
/Creator: PDF1 -> AutoCAD 2024 - English 2024 (24.3s (LMS Tech)), PDF2 -> AutoCAD LT 2024 - English 2024 (24.3s (LMS Tech))
/CreationDate: PDF1 -> D:20240125162128Z, PDF2 -> D:20240201133816Z
/ModDate: PDF1 -> D:20240220144906-06'00', PDF2 -> D:20240201143940-06'00'
/Title: PDF1 -> S-4.1 Framing Sections & Details, PDF2 -> B LFE - Blabla 5-5 - Exp C - S-3.0
/Producer: PDF1 -> pdfplot16.hdi 16.03.061.00000, PDF2 -> pdfplot16.hdi 16.03.152.00000
Both files have the same page size.
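For reference, a comparison like the one above can be produced with a few lines of plain Python. This is only a sketch: `diff_metadata` is a hypothetical helper name, and the two dictionaries are filled in by hand from the values quoted in this report rather than read from the actual files.

```python
def diff_metadata(meta_a, meta_b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    keys = sorted(set(meta_a) | set(meta_b))
    return {
        key: (meta_a.get(key), meta_b.get(key))
        for key in keys
        if meta_a.get(key) != meta_b.get(key)
    }


# Values copied from the report; only two of the keys are shown here.
pdf1 = {
    "/Creator": "AutoCAD 2024 - English 2024 (24.3s (LMS Tech))",
    "/Producer": "pdfplot16.hdi 16.03.061.00000",
}
pdf2 = {
    "/Creator": "AutoCAD LT 2024 - English 2024 (24.3s (LMS Tech))",
    "/Producer": "pdfplot16.hdi 16.03.152.00000",
}

for key, (a, b) in diff_metadata(pdf1, pdf2).items():
    print(f"Difference in {key}: PDF1 -> {a}, PDF2 -> {b}")
```

Keys that are missing on one side show up with `None` for that side, so a field present in only one file is still reported.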
The main difference, from what I can tell, is that AutoCAD was used in one case and AutoCAD LT in the other. Obviously I don't want the client to have to take that into account when exporting the PDFs. Reportlab does not have this problem, but I'd prefer to use fitz, because reportlab is much slower than fitz. I haven't gone into the code of fitz yet, and thought maybe someone knows why it might scale added content differently based on PDF properties.
Would love to get this sorted! I can provide some of the code if needed :)
How to reproduce the bug
.insert_textbox()
(The order doesn't matter. You can also merge and export the file first, and then add the text and it will still have the wrong scaling.)
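The workflow described above (merge, then add the same textbox to every page) could look roughly like the following. It is a minimal sketch, not the actual app code: the function names, the rectangle, and the font size are all placeholders chosen for illustration. `fitz` is imported inside the function so the helper `stamp_pages` can be exercised without PyMuPDF installed.

```python
def stamp_pages(pages, rect, text, fontsize=11):
    """Insert the same textbox on every page and collect the return values.

    insert_textbox returns a float; a negative value means the text
    did not fit into the rectangle and nothing was written.
    """
    return [page.insert_textbox(rect, text, fontsize=fontsize) for page in pages]


def merge_and_stamp(paths, out_path, text, rect_coords=(50, 50, 300, 100)):
    """Merge the given PDFs with PyMuPDF, stamp every page, and save.

    rect_coords is a placeholder position, not the one used in the app.
    """
    import fitz  # PyMuPDF

    merged = fitz.open()  # new, empty PDF
    for path in paths:
        with fitz.open(path) as src:
            merged.insert_pdf(src)  # append all pages of src
    results = stamp_pages(merged, fitz.Rect(*rect_coords), text)
    merged.save(out_path)
    merged.close()
    return results
```

Called as `merge_and_stamp(["a.pdf", "b.pdf"], "merged.pdf", "some text")` (file names hypothetical), this should put the text at the same position and size on every page, which is exactly what does not happen for the pages from the new files.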
Problem: The scaling of the content will be different for some of the pages based on which PDF the page originated from.
[Screenshots: "Correct" and "Wrong"]
You can see a basic table structure on all pages, and on some of the pages the added text content is uniformly scaled down. Origin for the scaling seems to be the bottom right corner of the page.
Expected: The added text should have the same scaling on all of the pages. If I use reportlab to then take that merged and exported pdf, and add even more text to all the pages, it does so using the same scaling on all pages. When I do the same thing using fitz again, it will still have the wrong scaling.
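Since the document-level metadata does not explain the difference, it may help to compare page-level properties between a "correct" and a "wrong" page. The sketch below only reads attributes that PyMuPDF pages expose (`number`, `rect`, `rotation`, `is_wrapped`); `page_report` and `report_file` are hypothetical names. Whether any of these properties is actually the cause is not established in this report. In particular, looking at `is_wrapped` (which tells whether the page's existing content stream is enclosed in a balanced graphics state) is an assumption about where to look, not a confirmed diagnosis.

```python
def page_report(pages):
    """Collect per-page properties that could affect how inserted content is placed."""
    rows = []
    for page in pages:
        rows.append({
            "number": page.number,
            "size": (round(page.rect.width, 2), round(page.rect.height, 2)),
            "rotation": page.rotation,
            # False can mean existing content leaves a modified graphics state behind
            "is_wrapped": page.is_wrapped,
        })
    return rows


def report_file(path):
    """Print the report for every page of a PDF (path is supplied by the caller)."""
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        for row in page_report(doc):
            print(row)
```

If the affected pages differ from the good ones in one of these columns, that narrows the search. If `is_wrapped` turns out to be the difference, calling `page.wrap_contents()` before `insert_textbox()` would be one thing to try.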
PyMuPDF version
1.23.25
Operating system
MacOS
Python version
3.10