miyakogi / pyppeteer

Headless chrome/chromium automation library (unofficial port of puppeteer)

How to replace response HTML? #264

Closed JanChec closed 4 years ago

JanChec commented 4 years ago

I'd like to do something like this:

page = await browser.newPage()
await page.setContent(content)
await page.pdf(...)

The problem is that the HTML is replaced, but linked stylesheets, scripts, and images aren't loaded. How can I make the page load every asset referenced by the tags in content?

Thanks in advance!

niespodd commented 4 years ago

Try adding an artificial sleep(10) before generating the PDF and see what happens. If the resources are still not loaded, Chrome might be blocking them. Check the console output to figure this out.

Also, make sure your HTML markup is valid.
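To make the two suggestions above concrete, here is a minimal sketch, assuming `page` is an already opened pyppeteer page. The helper name `render_after_delay` and its parameters are hypothetical; only `page.on`, `page.setContent`, and `page.pdf` are calls from the library.

```python
import asyncio


async def render_after_delay(page, html, pdf_path, delay_s=10.0):
    """Set the page content, wait a fixed time, then write a PDF.

    `page` is assumed to be a pyppeteer Page; this helper is illustrative only.
    """
    # Print browser console messages so blocked or failed loads become visible.
    page.on("console", lambda msg: print("console:", msg.text))
    await page.setContent(html)
    # Crude fixed wait that gives linked assets time to load.
    await asyncio.sleep(delay_s)
    await page.pdf({"path": pdf_path})
```

A fixed sleep is only a diagnostic: if the assets appear after the wait, the problem is timing; if not, the console output is the next place to look.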

The following code worked for me:

base64_enc_html = "data:text/html;base64,PGh0bWw+CjxoZWFkPgo8bGluayByZWw9InN0eWxlc2hlZXQiIGhyZWY9Imh0dHBzOi8vbWF4Y2RuLmJvb3RzdHJhcGNkbi5jb20vYm9vdHN0cmFwLzQuMC4wL2Nzcy9ib290c3RyYXAubWluLmNzcyIgaW50ZWdyaXR5PSJzaGEzODQtR241Mzg0eHFRMWFvV1hBKzA1OFJYUHhQZzZmeTRJV3ZUTmgwRTI2M1htRmNKbFNBd2lHZ0ZBVy9kQWlTNkpYbSIgY3Jvc3NvcmlnaW49ImFub255bW91cyI+CjwvaGVhZD4KPGJvZHk+CjxkaXYgY2xhc3M9ImNvbnRhaW5lciI+CjxkaXYgY2xhc3M9ImJ0biBidG4tcHJpbWFyeSI+Qm9vdHN0cmFwIGJ1dHRvbjwvZGl2Pgo8L2Rpdj4KPC9ib2R5Pgo8L2h0bWw+"
pages = await browser.pages()  # assumes an already launched browser
await pages[0].goto(base64_enc_html)
await pages[0].pdf({'path': 'test.pdf'})
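For reference, a data URL like the one above can be built from an HTML string with the standard library. This is a minimal sketch; the helper name `html_to_data_url` is hypothetical and not part of pyppeteer.

```python
import base64


def html_to_data_url(html: str) -> str:
    """Encode an HTML string as a base64 `data:text/html` URL."""
    encoded = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return "data:text/html;base64," + encoded
```

The result can be passed to `goto` in place of the hard-coded string.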

JanChec commented 4 years ago

@niespodd Thanks! Actually, the problem was that I had relative paths to the static assets. After replacing them with absolute URLs, it works.
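An alternative to rewriting every path by hand is to give the document a base URL, so the browser resolves relative paths against it. The sketch below is an assumption rather than what was done in this thread: the helper name `add_base_href` and the example URL are hypothetical, and it only handles a plain `<head>` tag.

```python
import html as html_lib
import re


def add_base_href(html: str, base_url: str) -> str:
    """Insert a <base href="..."> tag right after the opening <head> tag.

    If the markup has no <head>, the tag is prepended to the document.
    """
    base_tag = '<base href="{}">'.format(html_lib.escape(base_url, quote=True))
    # Match <head> or <head ...attributes>, but not <header>.
    new_html, count = re.subn(
        r"(<head(?:\s[^>]*)?>)",
        lambda m: m.group(1) + base_tag,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    return new_html if count else base_tag + html
```

For example, `add_base_href(content, "https://example.com/static/")` would make `href="css/site.css"` resolve to `https://example.com/static/css/site.css`.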

But another problem with this method remains: when I use a long data URL, navigation times out at 30s, while the same page normally loads in about 1s. Short data URLs load quickly. Am I hitting a length limit or something? Some browsers used to have a 65k-character limit on URLs.
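One way to test the length hypothesis is to compute how long the data URL will be and, separately, to raise the navigation timeout. The 30s figure matches the default `timeout` option of pyppeteer's `goto` (30000 ms). The sketch below assumes `page` is a pyppeteer page; both helper names are hypothetical, and it does not establish whether a length limit is the actual cause.

```python
def data_url_length(html: str) -> int:
    """Length of the base64 data URL that would be built from `html`."""
    raw_len = len(html.encode("utf-8"))
    # Base64 emits 4 characters for every 3 input bytes, rounded up.
    payload_len = 4 * ((raw_len + 2) // 3)
    return len("data:text/html;base64,") + payload_len


async def goto_with_timeout(page, url: str, timeout_ms: int = 120_000):
    """Navigate with a longer timeout than the 30000 ms default."""
    return await page.goto(url, {"timeout": timeout_ms})
```

Comparing `data_url_length(content)` for the fast and the slow pages shows whether the slowdown starts around a particular size.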