scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once. #844

Open mousemin opened 5 years ago

mousemin commented 5 years ago

QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once.

mousemin commented 5 years ago

It causes the website's JS to load incompletely.

RedTailBullet commented 5 years ago

This issue actually causes Splash to get stuck and stop responding. It even ignores the restart command.

ihor-nahuliak commented 5 years ago

The same issue still happens.

Gallaecio commented 5 years ago

It would be great if one of you managed to provide a set of steps to reliably reproduce this issue. If you do, it would be much easier to fix the issue.

ihor-nahuliak commented 5 years ago

Splash version: 3.2
Qt 5.9.1, PyQt 5.9, WebKit 602.1, sip 4.19.3, Twisted 16.1.1, Lua 5.2
Python 3.5.2 (default, Nov 23 2017, 16:37:01) [GCC 5.4.0 20160609]

I run Splash with the following command:

docker run -p 8050:8050 scrapinghub/splash

After a number of requests, I get a lot of errors like this in the Splash log:

loadFinished: RenderErrorInfo(type='Network', code=99, text='Proxy connection refused', url='https://...')

I rotate proxy servers on each request using the following Lua script:

function set_proxy(splash)
    splash:on_request(function(request)
        request:set_proxy{
            host=splash.args.proxy_host,
            port=splash.args.proxy_port,
            username=splash.args.proxy_user,
            password=splash.args.proxy_pass,
        }
        request:set_header('Proxy-Authorization', splash.args.proxy_auth)
    end)
    return 1
end
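
For context, the values read from splash.args in the script above have to be supplied by the client in the request. The following is a minimal sketch of how a client might build the JSON body for Splash's /execute endpoint. It assumes HTTP Basic proxy credentials; the helper name build_execute_payload and all sample values are hypothetical and not taken from the original report.

```python
import base64
import json


def build_execute_payload(lua_source, proxy_host, proxy_port, proxy_user, proxy_pass):
    """Build the JSON body for a POST to Splash's /execute endpoint.

    Extra keys in the body are exposed to the Lua script as splash.args.<key>.
    Hypothetical helper; assumes the proxy uses HTTP Basic authentication.
    """
    credentials = "{}:{}".format(proxy_user, proxy_pass).encode("utf-8")
    proxy_auth = "Basic " + base64.b64encode(credentials).decode("ascii")
    return {
        "lua_source": lua_source,
        "proxy_host": proxy_host,
        "proxy_port": proxy_port,
        "proxy_user": proxy_user,
        "proxy_pass": proxy_pass,
        "proxy_auth": proxy_auth,  # read by the script as splash.args.proxy_auth
    }


payload = build_execute_payload(
    lua_source="function main(splash) return 1 end",  # placeholder script
    proxy_host="proxy.example.com",  # placeholder values, not real proxies
    proxy_port=8080,
    proxy_user="user",
    proxy_pass="secret",
)
body = json.dumps(payload)  # send as the body of a POST with Content-Type: application/json
```

Building the header value on the client keeps the Lua script free of encoding logic, which is presumably why the script reads a ready-made proxy_auth argument.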

I also rotate the User-Agent HTTP header using the following Lua script:

function set_user_agent(splash)
    splash:on_request(function(request)
        request:set_header('User-Agent', splash.args.user_agent)
    end)
    return 1
end

I understand that the proxy errors are expected, because some of my proxies can expire, but I suspect they could be related to the main error.

After about 15-20 minutes I start to receive 504 responses, although some requests still work. A couple of minutes later I receive only 504 errors. Following the manual, I also tried running the Docker container with a larger timeout limit:

docker run -p 8050:8050 scrapinghub/splash --max-timeout 3600

But the issue still happens.
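
One detail that may be worth ruling out here: according to the Splash documentation, --max-timeout only raises the upper bound, while each request still uses the default timeout of 30 seconds unless the client passes its own timeout argument. The sketch below shows one way a client could add that argument. The helper name with_timeout and the sample values are hypothetical, and this is not claimed to be the cause of the 504 responses in this report.

```python
def with_timeout(args, timeout, max_timeout=3600):
    """Return a copy of the request args with an explicit per-request timeout.

    Hypothetical helper: max_timeout mirrors the value passed to
    --max-timeout when the container was started.
    """
    if not 0 < timeout <= max_timeout:
        raise ValueError(
            "timeout must be in (0, {}], got {}".format(max_timeout, timeout)
        )
    updated = dict(args)  # leave the caller's dict untouched
    updated["timeout"] = timeout
    return updated


request_args = with_timeout(
    {"lua_source": "function main(splash) return 1 end"},  # placeholder script
    timeout=300,
)
```

If the 504 responses persist with an explicit timeout well above the observed render times, the timeout setting can probably be excluded as the cause.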

When the 504 errors happen, I only rarely see the error from the issue title in the log.

Here are some errors from the log that I can reproduce now:

Unhandled error in Deferred:
Unhandled Error
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/app/splash/pool.py", line 47, in _start_render
        pool_d.addBoth(self._close_render, render, slot)
      File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 340, in addBoth
        callbackKeywords=kw, errbackKeywords=kw)
      File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 306, in addCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/usr/local/lib/python3.5/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/app/splash/pool.py", line 69, in _close_render
        render.close()
      File "/app/splash/qtrender_lua.py", line 2462, in close
        self.splash.clear()
    builtins.AttributeError: 'LuaRender' object has no attribute 'splash'

error: result is already returned

019-07-10 18:15:43.497252 [network-manager] Traceback (most recent call last):
      File "/app/splash/network_manager.py", line 453, in createRequest
        request = middleware.process(request, render_options, operation, outgoingData)
      File "/app/splash/request_middleware.py", line 26, in process
        allowed_domains = render_options.get_allowed_domains()
      File "/app/splash/render_options.py", line 334, in get_allowed_domains
        allowed_domains = self.get("allowed_domains", default=None)
      File "/app/splash/render_options.py", line 87, in get
        value = self.data.get(name)
    AttributeError: 'RenderOptions' object has no attribute 'data'

internal error in _createRequest middleware
Traceback (most recent call last):
      File "/app/splash/network_manager.py", line 112, in createRequest
        return self._createRequest(operation, request, outgoingData=outgoingData)
      File "/app/splash/network_manager.py", line 142, in _createRequest
        self._handle_custom_proxies(request)
      File "/app/splash/network_manager.py", line 224, in _handle_custom_proxies
        proxy = splash_proxy_factory.queryProxy(proxy_query)[0]
      File "/app/splash/proxy.py", line 38, in queryProxy
        if self.should_use_proxy_list(protocol, url):
      File "/app/splash/proxy.py", line 43, in should_use_proxy_list
        if not self.proxy_list:
    AttributeError: 'ProfilesSplashProxyFactory' object has no attribute 'proxy_list'

Traceback (most recent call last):
  File "/app/splash/network_manager.py", line 305, in _on_reply_error
    self._response_bodies.pop(self._get_request_id(), None)
AttributeError: 'SplashQNetworkAccessManager' object has no attribute '_response_bodies'