scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License
4.09k stars, 513 forks

Question: how to wait for page loads #606

Closed: jambonnade closed this issue 4 years ago

jambonnade commented 7 years ago

Hi,

First, I'd like to know exactly when the splash:go() call returns:

Then, how should we handle scripts that go through multiple pages without additional go() calls (e.g. clicking links, submitting forms)? Using wait() with the cancel_on_redirect flag is a good start, but again I don't know exactly when wait() returns in this case.
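For illustration, here is a minimal sketch of the wait()-with-cancel_on_redirect idea. It assumes, per my reading of the Splash documentation, that splash:wait returns an (ok, reason) pair and that reason is "redirect" when the wait was cut short by a redirect. The helper name wait_for_navigation and its maxwait parameter are made up for this example.

```lua
-- Sketch only: wait up to `maxwait` seconds, stopping early on a redirect.
-- Returns true if a redirect interrupted the wait, false if the full time
-- elapsed without one. Any other early stop is raised as an error.
function wait_for_navigation(splash, maxwait)
    local ok, reason = splash:wait{
        time = maxwait or 10,
        cancel_on_redirect = true,
    }
    if ok then
        return false              -- timer ran to completion, no redirect seen
    end
    if reason == "redirect" then
        return true               -- a navigation started while waiting
    end
    error("wait stopped early: " .. tostring(reason))
end
```

Note that a true result here only says that a navigation started, not that the new page has finished loading, so some further readiness check would still be needed afterwards.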

I don't think adding wait() calls with arbitrary timings is a serious way to let the page finish loading. If there is no built-in way to do this, I may do something like this: check at some interval whether the page contains a specific HTML element, or whether a JavaScript variable indicates that the new page has loaded.
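The polling idea could look roughly like the sketch below, which re-evaluates a JavaScript expression at a fixed interval until it is truthy or a time budget runs out. The names wait_for_js, js_expr, maxwait and interval are illustrative, and window.pageReady is a hypothetical flag that the target page would have to set itself; only splash:go, splash:evaljs, splash:wait and splash:html are real Splash calls.

```lua
-- Sketch only: poll `js_expr` in the page until it evaluates to a truthy
-- value. Returns true on success, false once about `maxwait` seconds of
-- waiting have been spent (time spent inside evaljs is not counted).
function wait_for_js(splash, js_expr, maxwait, interval)
    maxwait = maxwait or 10
    interval = interval or 0.2
    local waited = 0
    while waited < maxwait do
        if splash:evaljs(js_expr) then
            return true
        end
        splash:wait(interval)     -- yield to the browser between checks
        waited = waited + interval
    end
    return false
end

function main(splash)
    assert(splash:go(splash.args.url))
    -- `window.pageReady` is an assumed, page-specific readiness flag.
    if not wait_for_js(splash, "window.pageReady === true") then
        error("page did not signal readiness in time")
    end
    return {html = splash:html()}
end
```

Because the check runs from Lua rather than inside the page, it should keep working across page reloads, at the cost of one evaljs round trip per interval.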

Thanks

ghost commented 7 years ago

@jambonnade You can refer to this sample script, which waits for a specific element to load.

function wait_for_element(splash, css, maxwait)
  -- Wait until a selector matches an element
  -- in the page. Return an error if waited more
  -- than maxwait seconds.
  if maxwait == nil then
    maxwait = 10
  end
  return splash:wait_for_resume(string.format([[
    function main(splash) {
      var selector = '%s';
      var maxwait = %s;
      var end = Date.now() + maxwait*1000;

      function check() {
        if(document.querySelector(selector)) {
          splash.resume('Element found');
        } else if(Date.now() >= end) {
          var err = 'Timeout waiting for element';
          splash.error(err + " " + selector);
        } else {
          setTimeout(check, 200);
        }
      }
      check();
    }
  ]], css, maxwait))
end

function main(splash)
  splash:go("http://scrapinghub.com")
  wait_for_element(splash, "#foo")
  return {png=splash:png()}
end

dalepo commented 6 years ago

The above example works, but it has a problem: if the page reloads, the script execution is interrupted. So I wrote a function purely in Lua to handle this kind of problem.

function wait_for_element(splash, css, maxwait)
    if maxwait == nil then
        maxwait = 10
    end
    local exit = false
    local time_chunk = 0.2
    local time_passed = 0
    while (exit == false)
    do
        local element = splash:select(css)
        if element then
            exit = true
        elseif time_passed >= maxwait then
            exit = true
            error('Timed out waiting for -' .. css)
        else
            splash:wait(time_chunk)
            time_passed = time_passed + time_chunk
        end
    end
end

dalepo commented 6 years ago

Apparently the above function has high CPU usage; I'm not sure why. I don't recommend using it.