microlinkhq / browserless

The headless Chrome/Chromium driver on top of Puppeteer.
https://browserless.js.org
MIT License

Access to Dynamic Content? #515

Closed paul-bell closed 1 year ago

paul-bell commented 1 year ago

Question about "dynamic content"

I've just stumbled upon browserless and am only now familiarizing myself with the documentation.

My question is: can browserless (plus Puppeteer) return a web page's "dynamic content", e.g., data that an onload handler fetches from the page's backend server and inserts into the DOM? If so, how can I accomplish this?

Thank you!

Kikobeats commented 1 year ago

Yes, you need to configure browserless/puppeteer properly to tell it when the content should be considered ready. Check https://browserless.js.org/#/?id=waitfortimeout 🙂

paul-bell commented 1 year ago

Thank you very much.

In a past life I wrestled with Puppeteer a fair amount, so the terms you pointed me to (domcontentloaded, networkidle0, networkidle2) are familiar.

I will make the required changes; thanks again.
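
For readers following along, here is a minimal sketch of how those wait conditions are typically used with plain Puppeteer. It is not taken from the browserless documentation: the function name `getRenderedHtml` is hypothetical, and the Puppeteer module is passed in as an argument (an assumption made so the snippet has no hard dependency).

```javascript
// Sketch: return a page's HTML after dynamic content has had a chance to load.
// `puppeteer` is injected; in practice it would be `require('puppeteer')`.
// `getRenderedHtml` is a hypothetical helper name, not part of any library.
async function getRenderedHtml (puppeteer, url, waitUntil = 'networkidle0') {
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    // 'networkidle0' treats navigation as finished once there have been
    // no open network connections for at least 500 ms.
    await page.goto(url, { waitUntil })
    // Serialized DOM at this point, including changes made by scripts.
    return await page.content()
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close()
  }
}

// Possible usage (requires puppeteer to be installed):
// getRenderedHtml(require('puppeteer'), 'https://example.com').then(console.log)
```

Passing `'networkidle2'` or `'domcontentloaded'` as the third argument selects a looser condition; which one is appropriate depends on how the page loads its data.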

Kikobeats commented 1 year ago

You can play with the Microlink API, which essentially wraps browserless methods.

Here is an example: http://api.microlink.io/?url=https://visual-counter.vercel.app&waitForTimeout=3000&screenshot&embed=screenshot.url

paul-bell commented 1 year ago

That, sir, is extremely interesting. I've emailed you via the Microlink 'hello' address about "corporate" use of Microlink.

As to the URL you just posted, are there query parameters that could cause the return of the page's rendered HTML rather than a screenshot?

Thanks very much.

Kikobeats commented 1 year ago

Check this recipe: https://microlink.io/recipes/html 🙂

paul-bell commented 1 year ago

Apologies, my question wasn't clear.

What I meant was something like this:

http://api.microlink.io/?url=http://example.com&waitForTimeout=3000&embed=**html**

I know this URL is wrong, but it conveys what I'm looking for: simply return the rendered page's html content (ideally after any "dynamic" changes to the content), rather than return a screenshot.

Thank you.

Kikobeats commented 1 year ago

You can combine the HTML recipe with the embed query parameter.

This is the result: https://api.microlink.io/?url=https%3A%2F%2Fexample.com&data.html.selector=html&meta=false&embed=html
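
As a small illustration, the URL above can be assembled programmatically for any target page. This is only a sketch: `microlinkHtmlUrl` is a hypothetical helper name, and the optional `extra` argument for additional query parameters (such as the `waitForTimeout` seen earlier in this thread) is an assumption, since combining it with this recipe is not confirmed here.

```javascript
// Sketch: build the Microlink API URL shown above for a given target page.
// `microlinkHtmlUrl` and the `extra` argument are hypothetical, for illustration only.
function microlinkHtmlUrl (targetUrl, extra = {}) {
  const api = new URL('https://api.microlink.io/')
  api.searchParams.set('url', targetUrl)              // page to render
  api.searchParams.set('data.html.selector', 'html')  // HTML recipe: select the whole document
  api.searchParams.set('meta', 'false')               // same value as in the URL above
  api.searchParams.set('embed', 'html')               // return the html field directly
  // Assumption: any further query parameters are passed through unchanged.
  for (const [key, value] of Object.entries(extra)) {
    api.searchParams.set(key, String(value))
  }
  return api.toString()
}

console.log(microlinkHtmlUrl('https://example.com'))
```

In a runtime with a global `fetch`, the response body could then be read with something like `fetch(microlinkHtmlUrl('https://example.com')).then(res => res.text())`.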

paul-bell commented 1 year ago

Ah...lovely.

I will give it a spin.

Thank you so much!

paul-bell commented 1 year ago

That worked wonderfully, thank you.