microsoft / playwright

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
https://playwright.dev
Apache License 2.0

[Feature]: Extract pdf data to text and read the data in playwright #31594

Closed prasathmscss closed 2 months ago

prasathmscss commented 2 months ago

🚀 Feature Request

1) How can I verify the text inside a PDF file in Playwright with TypeScript? 2) Is there any way to compare two PDF files in Playwright?

Example

No response

Motivation

Extract PDF data to text and read that data in Playwright.

marcusNumminen commented 2 months ago

I had the same problem: I wanted to compare two PDF files. I used the 'pdfToPng' lib to convert every PDF page to a PNG file, compared the PNGs with Pixelmatch, and then added an expect that the pixelDiff was 0.
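The final step of that approach (assert that the pixel difference is 0) can be sketched as below. This is a minimal sketch, not Pixelmatch itself: it assumes each page has already been rasterized to raw RGBA bytes by some PDF-to-PNG step, and it uses a naive exact comparison, whereas Pixelmatch applies a perceptual threshold and anti-aliasing detection. The names `RasterPage`, `countDifferentPixels`, and `countPdfPixelDiff` are hypothetical.

```typescript
// Hypothetical shape for one rasterized PDF page: raw RGBA bytes, 4 per pixel.
interface RasterPage {
  width: number;
  height: number;
  data: Uint8Array;
}

// Count pixels whose RGBA values differ between two same-sized pages.
// Naive exact match; a stand-in for Pixelmatch, which is more tolerant.
function countDifferentPixels(a: RasterPage, b: RasterPage): number {
  if (a.width !== b.width || a.height !== b.height) {
    throw new Error('Pages have different dimensions');
  }
  const expectedLength = a.width * a.height * 4;
  if (a.data.length !== expectedLength || b.data.length !== expectedLength) {
    throw new Error('Pixel data does not match the declared dimensions');
  }
  let different = 0;
  for (let i = 0; i < expectedLength; i += 4) {
    if (
      a.data[i] !== b.data[i] ||
      a.data[i + 1] !== b.data[i + 1] ||
      a.data[i + 2] !== b.data[i + 2] ||
      a.data[i + 3] !== b.data[i + 3]
    ) {
      different++;
    }
  }
  return different;
}

// Sum the per-page differences across two documents, page by page.
function countPdfPixelDiff(pagesA: RasterPage[], pagesB: RasterPage[]): number {
  if (pagesA.length !== pagesB.length) {
    throw new Error('Documents have a different number of pages');
  }
  let total = 0;
  for (let i = 0; i < pagesA.length; i++) {
    total += countDifferentPixels(pagesA[i], pagesB[i]);
  }
  return total;
}
```

In a Playwright test the result would feed an assertion such as `expect(countPdfPixelDiff(actualPages, expectedPages)).toBe(0)`, where `actualPages` and `expectedPages` are whatever your rasterization step produced.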

It would be great if Playwright had built-in expects for comparing PDFs using the approach above (or at least for comparing PNGs), with the report showing the result the same way it does for expect(page).toHaveScreenshot('myScreenshot.png').

mxschmitt commented 2 months ago

Comparing PDFs is outside the scope of Playwright. We recommend using Network Events to listen for the response that carries your PDF file, getting its content via response.body(), and then using a third-party library to convert it to text so you can make the assertion.
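That recommendation could look roughly like the sketch below. It is an illustration under assumptions, not an official recipe: `ResponseLike` lists only the Playwright `Response` members used here (`headers()` and `body()`), and `PdfTextExtractor`, `isPdfResponse`, and `pdfTextFromResponse` are hypothetical names. The extractor is passed in as a function so that any third-party PDF-to-text library can be wrapped behind it.

```typescript
// Structural subset of Playwright's Response: only the members used here.
interface ResponseLike {
  headers(): { [key: string]: string };
  body(): Promise<Uint8Array>; // Playwright returns a Buffer, which is a Uint8Array
}

// Hypothetical adapter type: wrap the PDF-to-text library of your choice.
type PdfTextExtractor = (pdf: Uint8Array) => Promise<string>;

// Predicate suitable for page.waitForResponse(): matches PDF responses by content type.
function isPdfResponse(response: ResponseLike): boolean {
  const contentType = response.headers()['content-type'] ?? '';
  return contentType.toLowerCase().includes('application/pdf');
}

// Read the response body, sanity-check it, and hand it to the extractor.
async function pdfTextFromResponse(
  response: ResponseLike,
  extractText: PdfTextExtractor,
): Promise<string> {
  const body = await response.body();
  // A well-formed PDF starts with the bytes "%PDF-".
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d];
  if (body.length < magic.length || !magic.every((byte, i) => body[i] === byte)) {
    throw new Error('Response body does not look like a PDF');
  }
  return extractText(body);
}

// Usage inside a Playwright test (the link name and expected text are hypothetical):
//   const responsePromise = page.waitForResponse(isPdfResponse);
//   await page.getByRole('link', { name: 'Download invoice' }).click();
//   const text = await pdfTextFromResponse(await responsePromise, extractText);
//   expect(text).toContain('Invoice total');
```

Starting `waitForResponse` before the click avoids missing a response that arrives quickly. Comparing two PDFs by text then reduces to extracting both and asserting that the strings are equal.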

prasathmscss commented 2 months ago

Thanks @marcusNumminen & @mxschmitt