mattsse / chromiumoxide

Chrome Devtools Protocol rust API
Apache License 2.0

`page.wait_for_navigation().await?;` seems to return before all the page's assets (images, JS, CSS) are fully loaded #184

Open mxaddict opened 8 months ago

mxaddict commented 8 months ago

`page.wait_for_navigation().await?;` seems to return before all the page's assets (images, JS, CSS) are fully loaded.

I'm trying to load a page where some images are added by JavaScript after an API request completes.

I was under the impression that the `page.wait_for_navigation().await?;` call would wait for these to load, but it seems it does not.

Is there a way to get this to behave the way I expected it to?

mxaddict commented 8 months ago

I have a workaround that I've implemented on my end: an event listener records the timestamp of the last request.

Then, in a new thread, I have a loop that checks whether the last request is older than the timeout.

    let page = Arc::new(browser.new_page("about:blank").await?);
    // Shared timestamp of the most recent intercepted request.
    let last_request = Arc::new(Mutex::new(Instant::now()));
    let xlast_request = last_request.clone();

    let mut request_paused = page.event_listener::<EventRequestPaused>().await.unwrap();
    let xpage = page.clone();
    let interceptor_handle = tokio::spawn(async move {
        while let Some(event) = request_paused.next().await {
            // Record when this request was seen, then let it continue.
            *xlast_request.lock().unwrap() = Instant::now();
            info!(event.request.url);
            if let Err(e) = xpage.execute(ContinueRequestParams::new(event.request_id.clone())).await {
                error!("Failed to continue request: {e}");
            }
        }
    });

    /// Returns once no request has been seen for longer than `timeout`.
    pub async fn wait_for_page(last: Arc<Mutex<Instant>>, timeout: Duration) {
        loop {
            tokio::time::sleep(timeout).await;
            if last.lock().unwrap().elapsed() > timeout {
                return;
            }
        }
    }
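
One caveat with a quiet-period loop like the one above is that it never returns if the page keeps issuing requests (polling, analytics, and so on). Below is a minimal sketch of the same idea with an overall deadline added. It uses only the standard library and blocking sleeps so it can be read in isolation; `WaitOutcome`, `wait_until_quiet`, and the `poll` parameter are hypothetical names introduced for illustration, not part of chromiumoxide.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Result of waiting for request activity to settle (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    /// No request was recorded for at least the quiet period.
    Idle,
    /// The overall deadline passed while requests were still recent.
    DeadlineExceeded,
}

/// Blocks until `last` is at least `quiet` old, or until `deadline` has
/// elapsed since this function was called, checking every `poll`.
pub fn wait_until_quiet(
    last: &Arc<Mutex<Instant>>,
    quiet: Duration,
    deadline: Duration,
    poll: Duration,
) -> WaitOutcome {
    let started = Instant::now();
    loop {
        // Copy the elapsed time out so the lock is not held while sleeping.
        let since_last = last.lock().unwrap().elapsed();
        if since_last >= quiet {
            return WaitOutcome::Idle;
        }
        if started.elapsed() >= deadline {
            return WaitOutcome::DeadlineExceeded;
        }
        thread::sleep(poll);
    }
}
```

In async code the `thread::sleep` call would be swapped for `tokio::time::sleep(poll).await`, with the listener task updating `last` exactly as in the workaround. Polling at a short interval instead of sleeping for the whole timeout also lets the wait finish as soon as the quiet period has passed.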

shulcsm commented 8 months ago

I guess this is a duplicate of #36

mxaddict commented 8 months ago

I believe so, I did not see that issue before posting 😄

beckend commented 8 months ago

Here is my approach:


    page
      .evaluate(
        r#"() =>
            new Promise((resolve) => {
              if (document.readyState === 'complete') {
                resolve('completed-no-event')
              } else {
                addEventListener('load', () => {
                  resolve('complete-event')
                })
              }
            })
        "#,
      )
      .await?;

This even lets single-page applications be scraped, so pages do not need to be server-side rendered.
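
If the `load` event never fires, the promise in the snippet above never resolves. A possible variation, sketched here under assumptions, builds the same script with a timeout fallback. `load_or_timeout_script` and its `timeout_ms` parameter are hypothetical helpers, not part of chromiumoxide; the function only produces the JavaScript source as a string.

```rust
/// Builds the source of a JavaScript function that resolves when the page's
/// `load` event has fired, or after `timeout_ms` milliseconds, whichever
/// comes first. (Hypothetical helper for illustration.)
pub fn load_or_timeout_script(timeout_ms: u64) -> String {
    // Doubled braces are literal braces in `format!`; `{timeout_ms}` is
    // the only interpolated value.
    format!(
        r#"() =>
  new Promise((resolve) => {{
    if (document.readyState === 'complete') {{
      resolve('complete-no-event');
      return;
    }}
    const timer = setTimeout(() => resolve('timeout'), {timeout_ms});
    addEventListener('load', () => {{
      clearTimeout(timer);
      resolve('complete-event');
    }}, {{ once: true }});
  }})"#
    )
}
```

The returned string would be passed to `page.evaluate(...)` in the same way as the snippet above, for example `load_or_timeout_script(10_000)` for a ten-second cap. Note that the `load` event covers resources referenced when the document was parsed, so images inserted later by scripts may still need a network-quiet check.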