scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Splash 3.5 docker image hangs forever #1179

Open r0oth3x49 opened 1 year ago

r0oth3x49 commented 1 year ago

First of all, thank you so much for this great project.

I am facing an issue with the latest Splash Docker image, 3.5, as well as with 3.4. When we run it against https://securityfoster.com it never responds, and even the timeout is not triggered. I tried the following Lua scripts (the default ones) within Splash:

function main(splash, args)
  local ok, result = splash:with_timeout(function()
    -- try commenting out splash:wait(3)
    -- splash:wait(3)
    assert(splash:go(args.url))
  end, 2)

  if not ok then
    if result == "timeout_over" then
      return "Cannot navigate to the url within 2 seconds"
    else
      return result
    end
  end
  return "Navigated to the url within 2 seconds"
end

and

function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end

With both scripts Splash does not respond and hangs; I have to stop and re-run the image. OS: Ubuntu 20.04.6 LTS
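Since the hang makes it hard to tell a slow render from a stuck one, a small client-side probe can help when reproducing this. The following is only a sketch under some assumptions: Splash is reachable at http://localhost:8050 (the default port), the second script above is sent to the /execute endpoint as a JSON POST, and the helper names `build_execute_request` and `probe` are made up for illustration. The socket timeout on the client is independent of Splash's own `timeout` argument, so the probe returns even if the server never answers.

```python
import json
import socket
import urllib.error
import urllib.request

# Shortened version of the second script from this report.
LUA_SOURCE = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {html = splash:html()}
end
"""


def build_execute_request(splash_base, target_url, lua_source=LUA_SOURCE, render_timeout=10):
    """Build a POST request for Splash's /execute endpoint (hypothetical helper)."""
    payload = {
        "lua_source": lua_source,
        "url": target_url,
        "timeout": render_timeout,  # Splash-side render timeout, in seconds
    }
    return urllib.request.Request(
        splash_base.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def probe(splash_base, target_url, client_timeout=15):
    """Return a (status, http_code) pair; 'hung' means the client gave up waiting."""
    request = build_execute_request(splash_base, target_url)
    try:
        with urllib.request.urlopen(request, timeout=client_timeout) as response:
            return "responded", response.status
    except urllib.error.HTTPError as exc:
        # Splash answered, but with an error status (for example a render timeout).
        return "http_error", exc.code
    except (socket.timeout, TimeoutError):
        # No answer within client_timeout while waiting for the response.
        return "hung", None
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "hung", None
        return "unreachable", None


if __name__ == "__main__":
    # Assumed local Splash instance; adjust the base URL as needed.
    print(probe("http://localhost:8050", "https://securityfoster.com"))
```

If the probe reports "hung" while a page that does not trigger the problem reports "responded", that would suggest Splash itself is stuck rather than merely slow.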

andi4567 commented 1 year ago

Well, in my case Splash seems to hang at startup. It just tells me that it is listening on port 8050 and nothing else happens.

nyzsirt commented 1 year ago

Same problem with other web pages (alikeskin org). It seems the splash:wait(3) call breaks something. Can you try without the wait call? It worked for me.

Verbose log with the wait call:

2023-05-02 11:08:39.034201 [render] [140115188771136] loadFinished: ok
2023-05-02 11:08:39.034307 [render] [140115188771136] loadFinished: disconnecting callback 0
2023-05-02 11:08:39.034536 [render] [140115188771136] [lua_runner] dispatch cmd_id=0
2023-05-02 11:08:39.034613 [render] [140115188771136] [lua_runner] arguments are for command 0, waiting for result of 0
2023-05-02 11:08:39.034699 [render] [140115188771136] [lua_runner] entering dispatch/loop body, args=(PyResult('return', True),)
2023-05-02 11:08:39.034801 [render] [140115188771136] [lua_runner] send PyResult('return', True)
2023-05-02 11:08:39.034902 [render] [140115188771136] [lua_runner] send (lua) (b'return', True)
2023-05-02 11:08:39.035167 [render] [140115188771136] [lua_runner] got AsyncBrowserCommand(id=None, name='wait', kwargs={'time_ms': 500.0, 'callback': '', 'onredirect': False, 'onerror': <function Splash.wait..error at 0x7f6f1c073730>})
2023-05-02 11:08:39.035338 [render] [140115188771136] [lua_runner] instructions used: 116
2023-05-02 11:08:39.035461 [render] [140115188771136] [lua_runner] executing AsyncBrowserCommand(id=1, name='wait', kwargs={'time_ms': 500.0, 'callback': '', 'onredirect': False, 'onerror': <function Splash.wait..error at 0x7f6f1c073730>})
2023-05-02 11:08:39.035712 [render] [140115188771136] waiting 500.0ms; timer 140115194400488

nyzsirt commented 1 year ago

It seems there is a general problem with the Splash wait call and web pages that use gstatic.com.

It may be the same issue as this one: https://github.com/scrapinghub/splash/issues/1167