lopuhin opened this issue 4 years ago
403 with description in the body seems clearer.
Note that on some pages, we get network301 instead of a successful render, e.g. for https://www.accc.gov.au/media-release/advertising-agents-warned-of-risks-of-breaching-trade-practices-act
echo "advertising" >> test_filters/test-filters.txt
# with filters
curl 'http://localhost:8052/execute?url=https%3A%2F%2Fwww.accc.gov.au%2Fmedia-release%2Fadvertising-agents-warned-of-risks-of-breaching-trade-practices-act&lua_source=function+main(splash%2C+args)%0D%0A++assert(splash%3Ago(args.url))%0D%0A++return+1%0D%0Aend&filters=test-filters'
{"error": 400, "type": "ScriptError", "description": "Error happened while executing Lua script", "info": {"source": "[string \"function main(splash, args)\r...\"]", "line_number": 2, "error": "network301", "type": "LUA_ERROR", "message": "Lua error: [string \"function main(splash, args)\r...\"]:2: network301"}}
# without filters
curl 'http://localhost:8052/execute?url=https%3A%2F%2Fwww.accc.gov.au%2Fmedia-release%2Fadvertising-agents-warned-of-risks-of-breaching-trade-practices-act&lua_source=function+main(splash%2C+args)%0D%0A++assert(splash%3Ago(args.url))%0D%0A++return+1%0D%0Aend'
1
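For readability, the URL-encoded `lua_source` parameter in the curl commands above can be decoded with Python's standard library (a quick sketch; the encoded string below is copied verbatim from the first curl request):

```python
from urllib.parse import unquote_plus

# lua_source parameter exactly as it appears in the first curl request above
encoded = ("function+main(splash%2C+args)%0D%0A++assert(splash%3Ago(args.url))"
           "%0D%0A++return+1%0D%0Aend")

# unquote_plus turns '+' into spaces and %XX escapes back into characters
print(unquote_plus(encoded))
# function main(splash, args)
#   assert(splash:go(args.url))
#   return 1
# end
```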
This page gives a 304 (cached) on second render, but the error happens almost immediately, regardless of the order of requests, and even as the first request after a restart.
Here are the logs:
2020-04-28 15:15:15.099973 [render] [139617798218360] viewport size is set to 1024x768
2020-04-28 15:15:15.100052 [pool] [139617798218360] SLOT 0 is starting
2020-04-28 15:15:15.100089 [render] [139617798218360] function main(splash, args)\r\n assert(splash:go(args.url))\r\n return 1\r\nend
2020-04-28 15:15:15.102196 [render] [139617798218360] [lua_runner] dispatch cmd_id=__START__
2020-04-28 15:15:15.102238 [render] [139617798218360] [lua_runner] arguments are for command __START__, waiting for result of __START__
2020-04-28 15:15:15.102270 [render] [139617798218360] [lua_runner] entering dispatch/loop body, args=()
2020-04-28 15:15:15.102297 [render] [139617798218360] [lua_runner] send None
2020-04-28 15:15:15.102325 [render] [139617798218360] [lua_runner] send (lua) None
2020-04-28 15:15:15.102414 [render] [139617798218360] [lua_runner] got AsyncBrowserCommand(id=None, name='go', kwargs={'url': 'https://www.accc.gov.au/media-release/advertising-agents-warned-of-risks-of-breaching-trade-practices-act', 'baseurl': None, 'callback': '<a callback>', 'errback': '<an errback>', 'http_method': 'GET', 'body': None, 'headers': None})
2020-04-28 15:15:15.102460 [render] [139617798218360] [lua_runner] instructions used: 70
2020-04-28 15:15:15.102494 [render] [139617798218360] [lua_runner] executing AsyncBrowserCommand(id=0, name='go', kwargs={'url': 'https://www.accc.gov.au/media-release/advertising-agents-warned-of-risks-of-breaching-trade-practices-act', 'baseurl': None, 'callback': '<a callback>', 'errback': '<an errback>', 'http_method': 'GET', 'body': None, 'headers': None})
2020-04-28 15:15:15.102526 [render] [139617798218360] HAR event: _onStarted
2020-04-28 15:15:15.102588 [render] [139617798218360] callback 0 is connected to loadFinished
2020-04-28 15:15:15.103311 [network] [139617798218360] GET https://www.accc.gov.au/media-release/advertising-agents-warned-of-risks-of-breaching-trade-practices-act
2020-04-28 15:15:15.103461 [request_middleware] Filter test-filters: dropped 139617798218360 GET https://www.accc.gov.au/media-release/advertising-agents-warned-of-risks-of-breaching-trade-practices-act
2020-04-28 15:15:15.104235 [pool] [139617798218360] SLOT 0 is working
2020-04-28 15:15:15.104282 [pool] [139617798218360] queued
2020-04-28 15:15:15.104373 [QAbstractEventDispatcher] awake; block time: 0.0209
2020-04-28 15:15:15.104403 [QAbstractEventDispatcher] aboutToBlock
2020-04-28 15:15:15.104553 [-] ErrorPageExtension in WebkitWebPage.extension
2020-04-28 15:15:15.108445 [render] [139617798218360] loadFinished: unknown error
2020-04-28 15:15:15.108495 [render] [139617798218360] loadFinished: disconnecting callback 0
2020-04-28 15:15:15.108541 [render] [139617798218360] [lua_runner] dispatch cmd_id=0
2020-04-28 15:15:15.108569 [render] [139617798218360] [lua_runner] arguments are for command 0, waiting for result of 0
2020-04-28 15:15:15.108602 [render] [139617798218360] [lua_runner] entering dispatch/loop body, args=(PyResult('return', None, 'network301'),)
2020-04-28 15:15:15.108632 [render] [139617798218360] [lua_runner] send PyResult('return', None, 'network301')
2020-04-28 15:15:15.108667 [render] [139617798218360] [lua_runner] send (lua) (b'return', None, b'network301')
2020-04-28 15:15:15.108724 [render] [139617798218360] [lua_runner] instructions used: 79
2020-04-28 15:15:15.108756 [render] [139617798218360] [lua_runner] caught LuaError LuaError('[string "function main(splash, args)\\r..."]:2: network301',)
2020-04-28 15:15:15.108852 [pool] [139617798218360] SLOT 0 finished with an error <splash.qtrender_lua.LuaRender object at 0x7efb4c407470>: [Failure instance: Traceback: <class 'splash.errors.ScriptError'>: {'source': '[string "function main(splash, args)\r..."]', 'line_number': 2, 'error': 'network301', 'type': 'LUA_ERROR', 'message': 'Lua error: [string "function main(splash, args)\r..."]:2: network301'}
/app/splash/engines/webkit/browser_tab.py:501:_on_content_ready
/app/splash/qtrender_lua.py:714:error
/app/splash/lua_runner.py:27:return_result
/app/splash/render_scripts.py:21:stop_on_error_wrapper
--- <exception caught here> ---
/app/splash/render_scripts.py:19:stop_on_error_wrapper
/app/splash/qtrender_lua.py:2343:dispatch
/app/splash/lua_runner.py:195:dispatch
]
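The `[request_middleware]` line in the log above is the key: the single rule `advertising` from the filter file matches the page URL itself (it contains `advertising-agents-...`), so the main request is dropped. Splash's request filters use Adblock Plus rule semantics (via the adblockparser library); for a bare rule with no special syntax, this behaves essentially like a substring match, which the following sketch approximates:

```python
url = ("https://www.accc.gov.au/media-release/"
       "advertising-agents-warned-of-risks-of-breaching-trade-practices-act")

# A plain Adblock rule ("advertising") with no anchors or options acts
# roughly like a substring match against the request URL, so the *main*
# page request matches the filter too, not just subresource requests.
rule = "advertising"
print(rule in url)  # True: the filter drops the main page request itself
```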
The docs at https://splash.readthedocs.io/en/stable/api.html#request-filters say
But it seems that they are applied to the main request as well.
Consider this Lua script:
and start Splash with filters:
then make a request to
http://books.toscrape.com/foo
Without filters, this returned 404 as expected. Now with filters:
this returned network301 instead of http404.
For a page which gives a 200, we get a successful response even with filters enabled:
If the URL does not match the filters, we get 404 as we should:
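As a side note for anyone reproducing this: the hand-encoded curl URLs above can be generated with Python's urllib, which avoids encoding typos. The endpoint and script are taken from the examples above; the filter name `test-filters` is reused from the first example, since the actual filter file for this second reproduction was not shown:

```python
from urllib.parse import urlencode

# Same minimal script used in the curl examples above
script = """function main(splash, args)
  assert(splash:go(args.url))
  return 1
end"""

params = {
    "url": "http://books.toscrape.com/foo",
    "lua_source": script,
    "filters": "test-filters",  # drop this key for the no-filter request
}

# Build the full /execute URL, matching the curl examples above
print("http://localhost:8052/execute?" + urlencode(params))
```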