py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/
Other

Alternative to add_transformation translate #1426

Closed felle9900 closed 1 year ago

felle9900 commented 2 years ago

I'm trying to update an older PyPDF2 program I've made that can step-and-repeat several PDF files on a bigger PDF page.

But the current method seems to work in a rather primitive way:

page_box = reader.pages[0]
page_box.add_transformation(Transformation().rotate(0).translate(tx=50, ty=0))

It keeps translating by another 50 pt for every PDF I'm placing. That's not what I'm looking for :)

Is there any way of doing it the old way, with dedicated x1, y1, x2, y2 values instead? I used to use mergeRotatedTranslatedPage().
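
The "dedicated x1, y1, x2, y2" wish can be expressed as one absolute matrix per placement. A minimal sketch, assuming the card's visible area is its trimbox given as (left, bottom, right, top) in points and that the target slot is in points too; `rect_to_ctm` and `apply` are hypothetical helpers, not pypdf functions. The six numbers follow the PDF matrix convention (a, b, c, d, e, f), which is also what pypdf's `Transformation` stores as its `ctm`.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, bottom, right, top) in points
Ctm = Tuple[float, float, float, float, float, float]  # PDF matrix (a, b, c, d, e, f)


def rect_to_ctm(source: Box, target: Box) -> Ctm:
    """Matrix that scales and then translates `source` so it exactly covers `target`."""
    s_left, s_bottom, s_right, s_top = source
    t_left, t_bottom, t_right, t_top = target
    sx = (t_right - t_left) / (s_right - s_left)
    sy = (t_top - t_bottom) / (s_top - s_bottom)
    # after scaling, shift so the source's lower-left corner lands on the target's
    e = t_left - sx * s_left
    f = t_bottom - sy * s_bottom
    return (sx, 0.0, 0.0, sy, e, f)


def apply(ctm: Ctm, x: float, y: float) -> Tuple[float, float]:
    """Map a point with a PDF matrix: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = ctm
    return (a * x + c * y + e, b * x + d * y + f)


if __name__ == "__main__":
    card_trimbox = (10.0, 10.0, 251.0, 166.0)  # hypothetical card trimbox
    slot = (100.0, 200.0, 341.0, 356.0)        # same size, placed elsewhere
    print(rect_to_ctm(card_trimbox, slot))     # (1.0, 0.0, 0.0, 1.0, 90.0, 190.0)
```

Because the result is absolute (it does not depend on earlier placements), it would have to be applied to a fresh, untransformed page each time rather than stacked onto the same page object.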

pubpub-zz commented 2 years ago

you should have a look at https://github.com/py-pdf/PyPDF2/issues/558#issuecomment-1138731441

felle9900 commented 2 years ago

Well, #558 just mentions the code I've already listed. It doesn't work in a loop.

felle9900 commented 2 years ago

Maybe this can explain it a bit better:

from PyPDF2 import PdfReader, PdfWriter, Transformation

def points(my_input):
    # millimeters -> PDF points (1 mm = 72 / 25.4 pt)
    return my_input * 2.83464567

# x1, y1, x2, y2 of impositions in millimeters. The real list has 16 sets of coords.
all_coords = [[47.5, 42.5, 132.5, 97.5], [137.5, 42.5, 222.5, 97.5], [227.5, 42.5, 312.5, 97.5]]

# SRA3 sheet (450 x 320 millimeters)
reader_base = PdfReader("test_files/Blank_sheet_450x320.pdf")
page_base = reader_base.pages[0]

# business card to be placed many times on the big sheet
reader = PdfReader("businesscard.pdf")

for coord in all_coords:
    page_box = reader.pages[0]
    x1 = int(points(coord[0])) # temp set as int to not upset adobe acrobat
    y1 = int(points(coord[1]))
    x2 = int(points(coord[2]))
    y2 = int(points(coord[3]))
    page_box.add_transformation(Transformation().rotate(0).translate(tx=x1, ty=y1))
    page_base.merge_page(page_box)

writer = PdfWriter()
writer.add_page(page_base)
with open("Merge_test.pdf", "wb") as fp:
    writer.write(fp)
felle9900 commented 2 years ago

Ok I solved my problem.

My solution was to keep changing translate(tx, ty) in each iteration of the loop:

# First imposition
if i == 0:
    column = points(coord[0]) - media_trim_diff
    row = points(coord[1]) - media_trim_diff

# First imposition in a NEW row
elif i % COLUMNS == 0:
    column = -3 * (TRIM_WIDTH + GAP)
    row = TRIM_HEIGHT + GAP

# all the rest
else:
    column = TRIM_WIDTH + GAP
    row = points(0)

page_box.add_transformation(Transformation().rotate(0).translate(tx=column, ty=row))

After that I move the trimbox, because that's the only thing that doesn't get moved automatically by translate().
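
The if/elif/else above hardcodes the step between placements (including the `-3`, which only holds for four columns). A minimal sketch of the same idea in general form, assuming a single page object that keeps accumulating translations and a list of absolute lower-left positions: each relative step is just the difference from the previous absolute position. `absolute_to_deltas` is a hypothetical helper; values are in mm here for readability and would go through `points()` in practice, and passing `origin=(media_trim_diff, media_trim_diff)` would mimic the first-placement correction used above.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def absolute_to_deltas(positions: List[Point], origin: Point = (0.0, 0.0)) -> List[Point]:
    """Turn absolute placement positions into the relative (tx, ty) steps needed
    when every translation is applied on top of the previous one."""
    deltas = []
    prev_x, prev_y = origin
    for x, y in positions:
        deltas.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y  # the page now sits here; the next step starts from it
    return deltas


if __name__ == "__main__":
    # lower-left corners (mm) of the first row, then the start of the second row
    corners = [(47.5, 42.5), (137.5, 42.5), (227.5, 42.5), (317.5, 42.5), (47.5, 102.5)]
    print(absolute_to_deltas(corners))
    # [(47.5, 42.5), (90.0, 0.0), (90.0, 0.0), (90.0, 0.0), (-270.0, 60.0)]
```

The new-row step comes out as -270 = -3 × (85 + 5) and 60 = 55 + 5, i.e. the same values as `-3 * (TRIM_WIDTH + GAP)` and `TRIM_HEIGHT + GAP`, without hardcoding the column count.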

felle9900 commented 2 years ago

For some reason it breaks if I want to impose different page numbers.

pubpub-zz commented 2 years ago

Just note that add_transformation modifies page_box in place, so each transformation needs to be relative to the previous one.
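
This can be illustrated with plain matrix arithmetic. A minimal model, not pypdf's code: it assumes each add_transformation call composes the new matrix onto whatever the page already carries, using the PDF 6-number matrix convention (a, b, c, d, e, f). Composing `translate(50, 0)` three times leaves the content 150 pt to the right, which matches the original report of the card moving by another 50 pt on every placement.

```python
from typing import Tuple

Ctm = Tuple[float, float, float, float, float, float]  # PDF matrix (a, b, c, d, e, f)
IDENTITY: Ctm = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def compose(first: Ctm, then: Ctm) -> Ctm:
    """Matrix equivalent to applying `first`, then `then` (PDF row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = then
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def translate(tx: float, ty: float) -> Ctm:
    return (1.0, 0.0, 0.0, 1.0, tx, ty)


if __name__ == "__main__":
    page_ctm = IDENTITY  # stands in for the state carried by one page object
    for _ in range(3):
        page_ctm = compose(page_ctm, translate(50, 0))
        print(page_ctm[4], page_ctm[5])  # 50.0 0.0, then 100.0 0.0, then 150.0 0.0
```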

pubpub-zz commented 2 years ago

For some reason it breaks if I want to impose different page numbers.

Can you please clarify?

felle9900 commented 2 years ago

Yes, so I'm placing the same PDF (a business card) on a bigger PDF. Everything is fine as long as it's the same page of the business card I'm placing. The placement (translate) behaves as expected, and the trimbox also behaves correctly.

But if I mix the pages (i.e. use different pages of the PDF), the pages are moved way off, as if something is not being reset properly.
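
One plausible reading, consistent with pubpub-zz's note that add_transformation modifies the page in place: `reader.pages[0]` and `reader.pages[1]` are separate objects that each carry their own accumulated translation, so a relative step computed from the *previous placement* is wrong whenever the page switches. A minimal sketch of bookkeeping that remembers where each page index currently sits; `step_for` and the `current` dict are hypothetical, and this is only a model of the symptom, not a confirmed diagnosis.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def step_for(current: Dict[int, Point], page_index: int, target: Point) -> Point:
    """Relative (tx, ty) that moves `page_index` from where it currently sits to `target`.

    `current` remembers the accumulated offset per page index, because each page
    object keeps its own translation history. It is updated in place.
    """
    cur_x, cur_y = current.get(page_index, (0.0, 0.0))
    tx, ty = target[0] - cur_x, target[1] - cur_y
    current[page_index] = target
    return tx, ty


if __name__ == "__main__":
    current: Dict[int, Point] = {}
    targets = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (300.0, 0.0)]
    for i, target in enumerate(targets):
        page_index = i % 2  # alternate page 0 and page 1, as in the failing test
        print(page_index, step_for(current, page_index, target))
    # 0 (0.0, 0.0) / 1 (100.0, 0.0) / 0 (200.0, 0.0) / 1 (200.0, 0.0)
```

A single shared "previous position" would have produced a step of 100 every time, which sends each alternating page to the wrong place.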

felle9900 commented 2 years ago
from PyPDF2 import PdfReader, PdfWriter, Transformation

def mm(my_input):
    output = round(my_input / 72 * 25.4, 1)
    return int(output)

def points(my_input):
    output = my_input * 2.83464567
    return output

GAP = points(5)
COLUMNS = 4
ROWS = 4
TRIM_WIDTH = points(85)
TRIM_HEIGHT = points(55)

# x1, y1, x2, y2, scale, page_nr (index_nr)
all_coords = [
                [47.5, 42.5, 132.5, 97.5, 1, 0],
                [137.5, 42.5, 222.5, 97.5, 1, 0],
                [227.5, 42.5, 312.5, 97.5, 1, 0],
                [317.5, 42.5, 402.5, 97.5, 1, 0],
                [47.5, 102.5, 132.5, 157.5, 1, 0],
                [137.5, 102.5, 222.5, 157.5, 1, 0],
                [227.5, 102.5, 312.5, 157.5, 1, 0],
                [317.5, 102.5, 402.5, 157.5, 1, 0],
                [47.5, 162.5, 132.5, 217.5, 1, 0],
                [137.5, 162.5, 222.5, 217.5, 1, 0],
                [227.5, 162.5, 312.5, 217.5, 1, 0],
                [317.5, 162.5, 402.5, 217.5, 1, 0],
                [47.5, 222.5, 132.5, 277.5, 1, 0],
                [137.5, 222.5, 222.5, 277.5, 1, 0],
                [227.5, 222.5, 312.5, 277.5, 1, 0],
                [317.5, 222.5, 402.5, 277.5, 0, 0]
            ]

# big sheet
reader_base = PdfReader("test_files/Blank_sheet_450x320.pdf")
page_base = reader_base.pages[0]

# pdf to impose on the big sheet
reader = PdfReader("test_files/Mobildisko-visitkort.pdf")

# difference between the imposed mediabox and trimbox
media_trim_diff = float((reader.pages[0].mediabox.right - reader.pages[0].trimbox.right))

# trimbox needs to be expanded 2.5 mm on all 4 sides after being moved, so we can see the crop marks for cutting
trimbox_expanding = int(points(2.5))

for i, coord in enumerate(all_coords):
    page_box = reader.pages[0]

    x1 = points(coord[0])
    y1 = points(coord[1])
    x2 = points(coord[2])
    y2 = points(coord[3])

    # First imposition
    if i == 0:
        column = points(coord[0]) - media_trim_diff
        row = points(coord[1]) - media_trim_diff

    # First imposition in a NEW row
    elif i % COLUMNS == 0:
        column = -3 * (TRIM_WIDTH + GAP)
        row = TRIM_HEIGHT + GAP

    # all the rest
    else:
        column = TRIM_WIDTH + GAP
        row = points(0)

    # this moves the mediabox and most of the content into place, but the visible box (trimbox) still needs to be moved
    page_box.add_transformation(Transformation().rotate(0).translate(tx=column, ty=row))

    # move the trimbox before the expanding
    if GAP == points(0):
        # This is currently not used/working atm
        print("GAP is 0")
        page_box.trimbox.left = x1# - (media_trim_diff / 2)
        page_box.trimbox.bottom = y1# - (media_trim_diff / 2)
        page_box.trimbox.right = x2# - (media_trim_diff / 2)
        page_box.trimbox.top = y2# - (media_trim_diff / 2)

    if GAP == points(5):
        # this is working
        # moving the trimbox
        print("GAP is 5 millimeter")
        page_box.trimbox.left = x1
        page_box.trimbox.bottom = y1
        page_box.trimbox.right = x2
        page_box.trimbox.top = y2

        # expanding the trimbox
        page_box.trimbox.left = float(page_box.trimbox.left - trimbox_expanding)
        page_box.trimbox.bottom = float(page_box.trimbox.bottom - trimbox_expanding)
        page_box.trimbox.right = float(page_box.trimbox.right + trimbox_expanding)
        page_box.trimbox.top = float(page_box.trimbox.top + trimbox_expanding)

    page_base.merge_page(page_box)

# Write the result back
writer = PdfWriter()
writer.add_page(page_base)
with open("Merged_translated_rotated.pdf", "wb") as fp:
    writer.write(fp)
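
The 16 hand-typed entries in `all_coords` follow a regular grid: 85 × 55 mm cards, a 5 mm gap, and the first card's lower-left corner at (47.5, 42.5) mm. A minimal sketch that regenerates the same x1, y1, x2, y2 values (scale and page number left out); `grid_coords` is a hypothetical helper, and it assumes the row-by-row, bottom-up order used in the list.

```python
from typing import List


def grid_coords(x0: float, y0: float, width: float, height: float,
                gap: float, columns: int, rows: int) -> List[List[float]]:
    """[x1, y1, x2, y2] in mm for each slot, filled row by row from the bottom left."""
    coords = []
    for row in range(rows):
        for col in range(columns):
            x1 = x0 + col * (width + gap)
            y1 = y0 + row * (height + gap)
            coords.append([x1, y1, x1 + width, y1 + height])
    return coords


if __name__ == "__main__":
    for c in grid_coords(47.5, 42.5, 85, 55, 5, columns=4, rows=4):
        print(c)  # first: [47.5, 42.5, 132.5, 97.5], last: [317.5, 222.5, 402.5, 277.5]
```
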
pubpub-zz commented 2 years ago

Can you please provide your failing PDF and the blank page?

felle9900 commented 1 year ago

Here are the PDF files I use: Blank_sheet_450x320.pdf, Mobildisko-visitkort.pdf

The code I posted above should work and create a 4-column, 4-row PDF.

If you change the following code: page_box = reader.pages[0]

to this code:

if i % 2 == 0: # every 2nd loop
    page_box = reader.pages[0]
else:
    page_box = reader.pages[1]

It should now be messed up. When you look in outline mode in Adobe Illustrator, you can see that it places the correct PDF pages, but the placement (mediabox) is wrong.

pubpub-zz commented 1 year ago

@MartinThoma / @MasterOdin, looking at this use case, reintroducing mergeTransformedPage (renamed to merge_transformed_page) sounds like the best option. Your opinion?

MartinThoma commented 1 year ago

Huh, interesting. I don't understand yet why the issue occurs. It sounds like a bug and thus it would be preferable to fix it. But re-introducing the old (working) functions as an intermediate solution would be OK to me.

We would need to document that issue for the new functions though

felle9900 commented 1 year ago

Any news on this problem ?

pubpub-zz commented 1 year ago

Lost in the FIFO... I'll come back to it this weekend.

felle9900 commented 1 year ago

Still no update?

felle9900 commented 1 year ago

Could we please reintroduce "mergeRotatedTranslatedPage" and make it take normal coordinates instead of tx, ty?

The current functionality breaks when I try to rotate or to mix page numbers. Please, I'm stuck with the current methods; it used to work so well before.

MartinThoma commented 1 year ago

It's hard for me to understand the issue as the information is scattered in this thread.

Could you maybe adjust the first comment in this ticket to contain all the information?

A great bug ticket follows this pattern:

1. What I did (as short as possible, but complete - including the full code necessary to reproduce, the PDF used as input, and the versions of all libraries being used)
2. What I wanted to achieve
3. What happened instead
4. For this issue: The latest version of PyPDF2 that worked as you expected with the same code as mentioned in (1)
5. Really awesome would be a test that fails for the new (broken) code and works with the old code

I'm open to a PR re-introducing mergeTransformedPage with the old way it worked for as long as this issue exists. But I need a way to check if it (still) exists so that we can deprecate it at some point.

pubpub-zz commented 1 year ago

@MartinThoma the PR is in progress and should come soon

pubpub-zz commented 1 year ago

@felle9900, you should be able to test the PR; here is a code example:

import pypdf
r1=pypdf.PdfReader("resources/labeled-edges-center-image.pdf")
w = pypdf.PdfWriter()
r2=pypdf.PdfReader("resources/box.pdf")
w.append(r1)  # to add the page
w.pages[0].merge_transformed_page(r2.pages[0],pypdf.Transformation().scale(2).rotate(45).translate(100,100),False,False)
w.pages[0].merge_transformed_page(r2.pages[0],pypdf.Transformation().scale(2).rotate(45).translate(200,200),False,False)
w.write("output.pdf")

still some clean-up (mypy) and testing to be done
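
In the chained `Transformation().scale(2).rotate(45).translate(100, 100)`, the steps apply, as I understand pypdf, in call order: scale, then rotate about the origin, then translate. The sketch below reproduces that with plain matrix arithmetic (my own model in the PDF row-vector convention, not pypdf's source) to show where a point of the inserted page ends up, and why the order matters.

```python
import math
from typing import Tuple

Ctm = Tuple[float, float, float, float, float, float]  # PDF matrix (a, b, c, d, e, f)


def compose(first: Ctm, then: Ctm) -> Ctm:
    """Matrix equivalent to applying `first`, then `then` (PDF row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = then
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def scale(s: float) -> Ctm:
    return (s, 0.0, 0.0, s, 0.0, 0.0)


def rotate(degrees: float) -> Ctm:
    r = math.radians(degrees)  # counter-clockwise about the origin
    return (math.cos(r), math.sin(r), -math.sin(r), math.cos(r), 0.0, 0.0)


def translate(tx: float, ty: float) -> Ctm:
    return (1.0, 0.0, 0.0, 1.0, tx, ty)


def apply(ctm: Ctm, x: float, y: float) -> Tuple[float, float]:
    a, b, c, d, e, f = ctm
    return (a * x + c * y + e, b * x + d * y + f)


if __name__ == "__main__":
    # scale(2), then rotate(45), then translate(100, 100), in call order
    ctm = compose(compose(scale(2), rotate(45)), translate(100, 100))
    x, y = apply(ctm, 10, 0)
    print(round(x, 3), round(y, 3))  # 114.142 114.142
    # translating first and scaling afterwards also doubles the offset
    print(apply(compose(translate(100, 100), scale(2)), 0, 0))  # (200.0, 200.0)
```

Because the translation comes last, translate(100, 100) moves the inserted page's origin to exactly (100, 100), regardless of the scale and rotation before it.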

felle9900 commented 1 year ago

I've just upgraded pypdf to version 3.3.0 to test that code. It tells me: AttributeError: 'PageObject' object has no attribute 'merge_transformed_page'. Did you mean: 'mergeTransformedPage'?

Did I miss anything? (I used my own PDF files.)

pubpub-zz commented 1 year ago

You have to copy the modified files from the PR.

felle9900 commented 1 year ago

Hmm, I can't see an easy way to download the 9 files. I'm not going to go through them manually, so I think I'll just wait for it to be implemented. Thanks a lot for the work.

MartinThoma commented 1 year ago

@felle9900 It is implemented in #1567. We just need somebody to check if it works as expected.

You can do it like this:

# Go into a clean directory
mkdir issue-1426
cd issue-1426
#... add your script in the directory

# Create and load a virtual environment:
python -m venv venv
source venv/bin/activate

# get the modified code
git clone https://github.com/pubpub-zz/PyPDF2.git
cd PyPDF2
git checkout -b pubpub-zz-merge_trsf_page main
git pull git@github.com:pubpub-zz/PyPDF2.git merge_trsf_page

# Install the modified version
pip install -e .

# Execute your script
felle9900 commented 1 year ago

I tried to follow along but the line: git pull git@github.com:pubpub-zz/PyPDF2.git merge_trsf_page made an error in my terminal ending with:

git@github.com: Permission denied (publickey). fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists.

MartinThoma commented 1 year ago

Uh, right, you need the https URL instead of the git one: git pull https://github.com/pubpub-zz/PyPDF2.git merge_trsf_page

felle9900 commented 1 year ago

I set up the venv and cloned PyPDF2 into the directory (shouldn't it be pypdf, by the way?).

Now the code doesn't recognize "pypdf", so I changed the imports to "PyPDF2", but then I get an error that PyPDF2 does not have a method called "merge_transformed_page".

felle9900 commented 1 year ago

OK, now I've got it working. Why is the placed PDF cropped to the trim box? Shouldn't it import the whole media box?

felle9900 commented 1 year ago

translate(0, 0) does not seem to be respected; it places the page a bit further in. Maybe it uses the media box coordinates?

felle9900 commented 1 year ago

Scaling, rotation, and using different pages all work.

felle9900 commented 1 year ago

I just opened the merged PDF in Illustrator in outline mode: the first PDF I place with translate(0,0) is placed via the mediabox at 0,0, but because the PDF is visually cropped to the trim box, it looks like it's not placed at 0,0.

pubpub-zz commented 1 year ago

translate(0,0) means no change, but be careful about the trim box, as you've noticed; it can also be due to an offset in the origin. Can you provide an example? PS: I've fixed a few points. You should check out the latest commit.

felle9900 commented 1 year ago

OK, I just redid the steps and got the new version; it looks the same. I've got two screenshots for you: one is the merged PDF file opened in Illustrator, the other is in outline mode, where you can see that the full original PDF has been placed but is cropped. It is placed at x=0, y=0 according to the mediabox (the biggest box). (Screenshots: pdf_outline, pdf_normal)

felle9900 commented 1 year ago

It would be nice if the placed PDF used the whole mediabox, or if it could take an extra argument for cropping, e.g. 7.41 points more than the trimbox, as I have it currently; cropping=0 would behave as it does now, using the trimbox.

felle9900 commented 1 year ago

There is a problem when I'm placing several impositions (the business card PDF) on my big sheet PDF. I'm looping over 20 coords I have in a list and calculating the tx and ty for each placement. The tx and ty are correct, but something weird is happening where it's not updating correctly: only the first placement is correct, the rest are placed way off to the left of the sheet PDF, and the following placements on that row are placed on top of each other. It looks very much like the same error as before.

pubpub-zz commented 1 year ago

I dislike the idea of adding extra parameters: the best approach, for me, is to adjust/modify the boxes in the source page before inserting it.

pubpub-zz commented 1 year ago

I've successfully got this result (requires latest fix): test card: visitcard.pdf

the code

import pypdf

r = pypdf.PdfReader("visitcard.pdf")
w = pypdf.PdfWriter()
w.add_blank_page(pypdf.PaperSize.A6.width, pypdf.PaperSize.A6.height)
for x in range(4):
    for y in range(7):
        w.pages[0].merge_translated_page(
            r.pages[0],
            x * r.pages[0].trimbox[2],
            y * r.pages[0].trimbox[3],
            True,  # over
            True,  # expand
        )
w.write("tt.pdf")

the output tt.pdf

felle9900 commented 1 year ago

Ok this is pretty cool, Could I use 450 x 320 mm instead of "A6" somehow ?

I managed to crop my PDF the way I like (trimbox + 5 mm), and it displays correctly.

I also managed to test that it will place different page numbers from the "visitcard" - tested by using randint(0,1)

But there is a big bug I can't get past: merge_translated_page() uses the mediabox for the translation, even if I change it before translating.

You can see it if you swap your "visitcard.pdf" for my card, Mobildisko-visitkort.pdf (result: tt.pdf).

felle9900 commented 1 year ago

At the current state you can't translate less than the mediabox. Maybe that is hardcoded somewhere?

MartinThoma commented 1 year ago

Ok this is pretty cool, Could I use 450 x 320 mm instead of "A6" somehow ?

Not really, as it all depends on the user_unit of the document. It's typically 1/72 inch, which is about 0.352778 mm. That means the dimensions you need would be (in default user units): 450 / 0.352778 ≈ 1275.6 and 320 / 0.352778 ≈ 907.1
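
For the unit arithmetic: with the default user unit of 1/72 inch, 1 pt = 25.4 / 72 mm ≈ 0.352778 mm. A minimal sketch of the conversion, using the same factor as the `points()` helper earlier in the thread (72 / 25.4 ≈ 2.83464567) and assuming the document does not set a non-default /UserUnit. The resulting floats can be passed as the width and height of `PdfWriter.add_blank_page` instead of the A6 constants.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72.0


def mm_to_pt(mm: float) -> float:
    """Millimeters to PDF points (1 pt = 1/72 inch), assuming the default user unit."""
    return mm * PT_PER_INCH / MM_PER_INCH


def pt_to_mm(pt: float) -> float:
    """PDF points back to millimeters."""
    return pt * MM_PER_INCH / PT_PER_INCH


if __name__ == "__main__":
    width, height = mm_to_pt(450), mm_to_pt(320)  # the 450 x 320 mm sheet
    print(round(width, 2), round(height, 2))  # 1275.59 907.09
```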

felle9900 commented 1 year ago

Ok cool I got it, thanks.

What about the translate bug ?

felle9900 commented 1 year ago

Take a look at this code; there's some weird stuff going on. Only the first imposition (bottom left) is cropped correctly. The rest are placed correctly, but the cropping is off.

from pypdf import PdfReader, PdfWriter, Transformation, PaperSize

def mm(my_input):
    output = round(my_input / 72 * 25.4, 1)
    return int(output)

def points(my_input):
    output = my_input * 2.83464567
    return output

GAP = points(5)
COLUMNS = 4
ROWS = 5
TRIM_WIDTH = points(85)
TRIM_HEIGHT = points(55)

all_page_numbers = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

sheet = PdfReader("test_files/Blank_sheet_450x320.pdf")
imposition = PdfReader("test_files/Mobildisko-visitkort.pdf")

# create write object (sheet)
write_object = PdfWriter()
#write_object.append(sheet)

write_object.add_blank_page(PaperSize.A6.width, PaperSize.A6.height)
#write_object.add_blank_page(points(650), points(320))

# difference between the imposition mediabox and trimbox
media_trim_diff = float((imposition.pages[0].mediabox.right - imposition.pages[0].trimbox.right))

# trimbox needs to be expanded 2.5 mm on all 4 sides after being moved, so we can see the crop marks for cutting
trimbox_expanding = int(points(2.5))

imposition_index = 0
for x in range(COLUMNS):
    for y in range(ROWS):

        page_nr = all_page_numbers[imposition_index]
        print("imposition_index:", imposition_index, "page_nr", page_nr)

        # expanding the trimbox
        imp_page = imposition.pages[page_nr]
        imp_page.trimbox.left = float(imp_page.trimbox.left - trimbox_expanding)
        imp_page.trimbox.bottom = float(imp_page.trimbox.bottom - trimbox_expanding)
        imp_page.trimbox.right = float(imp_page.trimbox.right + trimbox_expanding)
        imp_page.trimbox.top = float(imp_page.trimbox.top + trimbox_expanding)

        write_object.pages[0].merge_translated_page(
            imp_page,
            x * TRIM_WIDTH + trimbox_expanding,# x * imposition.pages[0].trimbox[2]
            y * TRIM_HEIGHT + trimbox_expanding,# y * imposition.pages[0].trimbox[3]
            True,
            True,
        )
        imposition_index += 1
write_object.write("tt.pdf")
pubpub-zz commented 1 year ago

Ok this is pretty cool, Could I use 450 x 320 mm instead of "A6" somehow ?

In the test code I've produced, I've set expand to True: the boxes are expanded. I used A6 to start with, but the final size is much bigger.

pubpub-zz commented 1 year ago

Take a look at this code; there's some weird stuff going on. Only the first imposition (bottom left) is cropped correctly. The rest are placed correctly, but the cropping is off.

I'm confused about your code: why are you changing the trimbox on every cycle? You should modify it once, and the box is then applied.

However, reviewing the code, I agree that there is something odd (even in the old code): the cropping is done based on the trim box instead of the crop box (which defines the clipping for display and printing). @MartinThoma, before committing the change, can you give me your opinion on it?
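
On the first point, changing the trimbox on every cycle: if the same page object is fetched again on a later cycle, expanding its trimbox in place adds another 2.5 mm each time it is reused. A minimal sketch of that effect with plain tuples, next to an expand-once alternative that prepares each distinct page number a single time. It assumes (as the symptoms suggest) that repeated `imposition.pages[page_nr]` lookups return the same object; `expand`, `expand_each_use` and `expand_once` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (left, bottom, right, top) in points


def expand(box: Box, margin: float) -> Box:
    left, bottom, right, top = box
    return (left - margin, bottom - margin, right + margin, top + margin)


def expand_each_use(boxes: Dict[int, Box], page_numbers: List[int], margin: float) -> Dict[int, Box]:
    """Mimics the loop in the thread: the box grows again every time a page is reused."""
    boxes = dict(boxes)
    for nr in page_numbers:
        boxes[nr] = expand(boxes[nr], margin)
    return boxes


def expand_once(boxes: Dict[int, Box], page_numbers: List[int], margin: float) -> Dict[int, Box]:
    """Expands each distinct page exactly once, however often it is placed."""
    boxes = dict(boxes)
    for nr in set(page_numbers):
        boxes[nr] = expand(boxes[nr], margin)
    return boxes


if __name__ == "__main__":
    trim = {0: (10.0, 10.0, 251.0, 166.0), 1: (10.0, 10.0, 251.0, 166.0)}
    used = [1, 0, 0, 1, 0]
    print(expand_each_use(trim, used, 7.0)[0])  # (-11.0, -11.0, 272.0, 187.0): grown 3 times
    print(expand_once(trim, used, 7.0)[0])      # (3.0, 3.0, 258.0, 173.0)
```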

MartinThoma commented 1 year ago

I'm sorry, I don't understand the question @pubpub-zz . What do you want to know?

MartinThoma commented 1 year ago

At the current state you can't translate less than the mediabox. Maybe that is hardcoded somewhere?

@felle9900 The transformations are not applied to the boxes (mediabox / trimbox / cropbox). That means if you translate the content out of the mediabox, you will no longer see the content.

This behavior is often confusing for people, but I'm uncertain about the best way to improve it. Maybe adding a parameter transform_boxes: bool=False to add_transformation? But what would you expect if a translation is happening?

A method fit_boxes_to_content() might be desirable.
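
To make "what would you expect if a translation is happening?" concrete: one possible answer is to push the four corners of each box through the same matrix and keep their bounding box. A minimal sketch of just that arithmetic; `transform_box` is hypothetical and not part of pypdf. For rotations that are not multiples of 90°, the bounding box is necessarily larger than the rotated rectangle, which is one reason the right behavior is not obvious.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, bottom, right, top)
Ctm = Tuple[float, float, float, float, float, float]  # PDF matrix (a, b, c, d, e, f)


def transform_box(box: Box, ctm: Ctm) -> Box:
    """Bounding box of the four transformed corners of `box` (hypothetical helper)."""
    a, b, c, d, e, f = ctm
    left, bottom, right, top = box
    corners = [(left, bottom), (left, top), (right, bottom), (right, top)]
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    card = (0.0, 0.0, 241.0, 156.0)
    print(transform_box(card, (1, 0, 0, 1, 50, 20)))  # (50.0, 20.0, 291.0, 176.0)
    r = math.radians(90)
    quarter_turn = (math.cos(r), math.sin(r), -math.sin(r), math.cos(r), 0, 0)
    print([round(v, 6) for v in transform_box(card, quarter_turn)])  # [-156.0, 0.0, 0.0, 241.0]
```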

pubpub-zz commented 1 year ago

I'm sorry, I don't understand the question @pubpub-zz . What do you want to know?

Currently merge_transformed_page crops the content to the trimbox, whereas the PDF reference states that the cropping should be done based on the cropbox. For me, merge_transformed_page is buggy. Do you confirm my analysis?
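
For reference on the trim box versus crop box question: in the PDF specification the CropBox is the region a viewer clips to when displaying or printing, and it defaults to the MediaBox, while the BleedBox, TrimBox and ArtBox default to the CropBox. A minimal sketch of resolving the effective boxes from a page's raw entries under those defaults; the dict-of-tuples input is an assumption for illustration (not pypdf's page structure), and it is simplified in that only the CropBox is intersected with the MediaBox.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (left, bottom, right, top)


def intersect(a: Box, b: Box) -> Box:
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def effective_boxes(raw: Dict[str, Box]) -> Dict[str, Box]:
    """Fill in missing page boxes using the defaults from the PDF specification."""
    media = raw["MediaBox"]  # required on every page (possibly inherited)
    crop = intersect(raw.get("CropBox", media), media)
    result = {"MediaBox": media, "CropBox": crop}
    for name in ("BleedBox", "TrimBox", "ArtBox"):
        result[name] = raw.get(name, crop)
    return result


if __name__ == "__main__":
    page = {"MediaBox": (0.0, 0.0, 255.0, 170.0), "TrimBox": (7.0, 7.0, 248.0, 163.0)}
    boxes = effective_boxes(page)
    print(boxes["CropBox"])  # (0.0, 0.0, 255.0, 170.0): no CropBox, so the MediaBox
    print(boxes["TrimBox"])  # (7.0, 7.0, 248.0, 163.0)
```

With such a page, clipping to the cropbox would show the whole card including bleed and crop marks, while clipping to the trimbox cuts them off, which matches the cropping felle9900 observed.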

pubpub-zz commented 1 year ago

A method fit_boxes_to_content() might be desirable.

This may be very tough to implement...😕

felle9900 commented 1 year ago

The reason I'm adjusting the trimbox on each cycle is that it can only be done on a specific page of the PDF, and that page can be any page number on every cycle.

But I was thinking of doing a separate loop that crops the business card pages so that all of the boxes (mediabox/trimbox/cropbox) are removed; then the script might work, because it just has to place those pages right next to each other. I'm going to test that out.

felle9900 commented 1 year ago

I did a test using a cropped file (cropbox = trimbox + 5 mm); the file was cropped in Acrobat.

Works like a charm, see the PDF. I remember trying this a long time ago, but I ran into a problem because the old PyPDF2 would not respect the cropping of the PDF file it had made itself (weird).

I'm going to test that part now. (Attachment: tt.pdf)