moshensky / pdf-visual-diff

Visual Regression Testing for PDFs in JavaScript
MIT License
43 stars 18 forks source link

Masks applied at incorrect positions if doing multiple PDF comparison in a single spec file #71

Closed GrayedFox closed 2 months ago

GrayedFox commented 2 months ago

Background

I started encountering this issue recently - I'm using PlayWright and Mailpit to perform a PDF diff against PDF files sent as email attachments.

Will link relevant parts of the code here for now, can't provide a minimal reproduction base right now as I'm pressed for time but I will edit the issue description to link to a public repo that reproduces the issue in a bit.

Describing the issue now just in case it's easy to spot and fix on your end ⚡

Bug

When performing two diffs of the same PDF file from within a single PlayWright spec the PDF comparison fails for the 2nd diff due to the masks not being applied in the correct location.

// spec
...

        // if I skip this test the 2nd comparison passes
        test.skip('signer sees confirmation email with holiday letting agreement attached', async ({}, testInfo) => {
            await signer.seesEmail({
                subject: `Holiday Letting agreement - ${address}`,
                specFile: testInfo.file,
                pdfs: [pdfs.document],
                masks: masks.document,
                markAsRead: false
            });
        });

        // if I skip this test and don't skip the above test, the 1st one passes
        test('agent sees confirmation email with holiday letting agreement attached', async ({}, testInfo) => {
            await agent.seesEmail({
                subject: `Holiday Letting agreement - ${address}`,
                specFile: testInfo.file,
                pdfs: [pdfs.document],
                masks: masks.document
            });
        });

Here's the relevant part of the class method:

// Actor class
...

        // do a PDF diff for each PDF attachment
        if (pdfs.length > 0) {
            const messageSummary = await getMail(message.ID);
            const messageAttachments = messageSummary.Attachments;

            for (const attachment of messageAttachments) {
                const pdf = pdfs.find(pdf => this.#safeStringCompare(pdf, attachment.FileName));

                // assert that we found a matching pdf based on the filename
                expect(pdf).toBeDefined();
                // assert that the attachment is indeed a PDF file
                expect(attachment.ContentType).toBe('application/pdf');

                // convert the pdf file name to a normalized string for the snapshot path
                const pdfSafeName = this.#safeguardString(pdf);

                // get the PDF buffer and perform a visual diff against our local snapshot
                const buffer = await getMailAttachment(message.ID, attachment.PartID);
                const result = await comparePdfs({ buffer, path: specPath, name: pdfSafeName, masks });

                // expect the comparison to return true
                expect(result).toBe(true);
            }
        }

I know there's a lot of hidden logic here, what I can say with 100% certainty is that the PDF being used in each comparison is pulled from the exact same email - what happens is we send an email with that PDF to two recipients. The spec file simply does two comparisons because sometimes we send slightly different PDFs based on the recipient (which would be two different emails). In this case the agent and signer get the exact same copy.

I get these results:

Base

snapshot-base

Diff

snapshot-diff

New

snapshot-new

Skipping either of those two comparisons causes the other to pass:

passes-if-skipping

Misc

I didn't go to deep into the pdf-diff code. I get the feeling there is something about how the PDF canvas is cleaning up that causes the masks to be applied in the wrong location on the 2nd comparison, I inserted a hard coded 30 second wait before doing the 2nd comparison and got the same result, so it doesn't seem like it's a race condition.

The error might actually be coming from a dependency and not this module itself, will move this report there if that's the case.

Expected Behaviour

Doing multiple PDF comparisons in a single spec file should apply the masks consistently.

moshensky commented 2 months ago

@GrayedFox,

I've tried to add a simple test, but it fails to fail. I understand that the context where and how the test is running is different.

Could you please go ahead and create a repository where the issue is reproduced?

Please note sans the masks the base and new images have different content. For example the new doesn't have the header like thingy from the base.

GrayedFox commented 2 months ago

Hmm this one might be on me, I figured out what was happening. This was my maskRegions custom method:

// options should contain pdf buffer, local pdf path, local pdf name, and optional masks
export const comparePdfs = async (options: { buffer: Buffer; path: string; name: string; masks: RegionMask[] }) => {
    const { buffer, path, name, masks } = options;
    return await comparePdfToSnapshot(buffer, path, name, {
        // this callback is executed per page of the pdf during the diff, if we have matching masks
        // then apply them
        maskRegions: page => {
            const pageMasks: RegionMask[] =
                masks?.filter((mask: RegionMask) => {
                    if (mask.page === page) {
                        // standard PDF PPI is 72 but in PNG the resolution is doubled resulting in
                        // a PPI of 144
                        const pointsToPixles = (point: number) => Math.round(point * 2);
                        mask.y = pointsToPixles(mask.y);
                        mask.x = pointsToPixles(mask.x);
                        mask.height = pointsToPixles(mask.height);
                        mask.width = pointsToPixles(mask.width);
                        console.log(`MASK ${JSON.stringify(mask)}`);
                        return mask;
                    }
                }) || [];
            return pageMasks;
        }
    });
};

The 2nd call to comparePdfs was doubling the mask values again thus the position and area of the masks were being doubled twice. I've fixed it by separating out the conversion logic and ensuring it returns a new array of adjusted values instead of writing to the mask object directly.

This is what I get for making the masks writeable! 🙈