phimage / Erik

Erik is a headless browser based on WebKit. A headless browser lets you run functional tests and access and manipulate webpages using JavaScript.
http://phimage.github.io/Erik/
MIT License
594 stars 47 forks

[Help Needed] wait for site finish loading #61

Open huynguyen230892 opened 1 year ago

huynguyen230892 commented 1 year ago

Hello, I'm really sorry to bother you, but how can I get the contents of a website after it has finished loading? I've searched everywhere and found nothing. Please help me. :(

phimage commented 1 year ago

not enough information, will close

robertlude commented 10 months ago

I, too, would like this functionality.

In other headless browsers I have used, there is an option to wait for the page to completely load before scraping or manipulating, including the page's JavaScript updating more "dynamic" pages before querying for information.

A proper signal that says "the page is actually fully loaded and ready to parse" would be really helpful. A lot of web sites use code to immediately load in data, but it's not immediate.

I think other headless browsers do it by detecting when DOM changes really stop after X threshold of time, or maybe detecting that no immediate JavaScript is running, I'm not really sure. But this would be a fantastic feature.
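
The "DOM changes stop for X amount of time" idea can be sketched as a small debounce helper. This is only an illustration of the technique, not part of Erik's API and not necessarily how other headless browsers implement it. The names `Subscribe` and `waitForQuiet` and the default thresholds are assumptions made for the example; in a page, the subscription would typically wrap a `MutationObserver`, as shown in the trailing comment.

```typescript
// Hypothetical helper type: registers a change listener and
// returns a function that removes it again.
type Subscribe = (onChange: () => void) => () => void;

// Resolves true once no change has been reported for `quietMs`,
// or false if `timeoutMs` elapses before the source goes quiet.
function waitForQuiet(
  subscribe: Subscribe,
  quietMs = 500,
  timeoutMs = 10000,
): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    let quietTimer: ReturnType<typeof setTimeout> | undefined;

    // Stop both timers, detach the listener, and report the outcome.
    const finish = (settled: boolean) => {
      clearTimeout(quietTimer);
      clearTimeout(deadline);
      unsubscribe();
      resolve(settled);
    };

    // Every reported change pushes the "quiet" moment further out.
    const restartQuietTimer = () => {
      clearTimeout(quietTimer);
      quietTimer = setTimeout(() => finish(true), quietMs);
    };

    const deadline = setTimeout(() => finish(false), timeoutMs);
    const unsubscribe = subscribe(restartQuietTimer);
    restartQuietTimer();
  });
}

// In a page, the subscription could wrap a MutationObserver:
//
//   const domChanges: Subscribe = (onChange) => {
//     const observer = new MutationObserver(onChange);
//     observer.observe(document, {
//       childList: true, subtree: true, attributes: true, characterData: true,
//     });
//     return () => observer.disconnect();
//   };
//   const settled = await waitForQuiet(domChanges, 500, 10000);
```

A quiet period is a heuristic: a page that polls or animates continuously never goes quiet, which is why the sketch also takes an overall timeout and reports which of the two conditions ended the wait.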

e-marchand commented 10 months ago

This could be implemented here, if the API allows it: https://github.com/phimage/Erik/blob/92c7081ce6f5b4416cc238791e14960bc44fdbf7/Sources/LayoutEngine.swift#L65C17-L65C34

My way of doing it is, after loading, to wait for a specific DOM element to be created or for some JS variable to be initialized; if it is not there yet, I wait and retry. So it depends on each page. I do this a lot in browser userscripts.
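
A minimal sketch of that wait-and-retry approach, written as page-side script in the style of a userscript. The helper name `waitFor`, its default interval and timeout, and the selector and variable in the usage comment are all hypothetical; how the script is injected into the page (for example through Erik) is not shown here.

```typescript
// Polls `probe` until it returns something other than null/undefined,
// retrying every `intervalMs`; rejects once `timeoutMs` has elapsed.
async function waitFor<T>(
  probe: () => T | null | undefined,
  intervalMs = 200,
  timeoutMs = 10000,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = probe();
    if (value !== null && value !== undefined) {
      return value; // the element or variable is available
    }
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    // Not there yet: sleep, then try again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Page-specific usage (selector and variable name are placeholders):
//
//   const price = await waitFor(() => document.querySelector("#price"));
//   const data = await waitFor(() => (window as any).appData);
```

The probe is the page-specific part: each site needs its own element or variable that signals "the data I want is here", which is why this approach does not generalize into a single "fully loaded" signal.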

tkantor81 commented 5 months ago

Hello, I am also getting different responses, because the page uses JS and is not fully loaded when I read it. Could you give me a clue about how to use this library? How do I wait until the page has loaded? Thank you.