scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

HTTP Error 400 (Bad Request) Type: ScriptError -> LUA_ERROR http521 #673

Open syncml opened 7 years ago

syncml commented 7 years ago

Hi

I want to crawl http://www.kuaidaili.com/free/, but the request fails with the error message "http521". If I crawl a normal URL such as http://www.baidu.com, I get the correct result. Please help me, thank you.

splash container: docker run -p 8050:8050 -v /etc/localtime:/etc/localtime -d scrapinghub/splash --disable-private-mode

# coding:utf-8
import requests

request = requests.get('http://localhost:8050/execute', params={
    'lua_source': '''
    function main(splash)
        splash.js_enabled = true
        splash.images_enabled = false
        --splash:autoload("http://apps.bdimg.com/libs/jquery/2.1.4/jquery.min.js")
        splash:set_custom_headers({
            ["Host"] = "www.kuaidaili.com",
            ["Upgrade-Insecure-Requests"] = 1,
            ["User-Agent"] = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
        })
        --splash:add_cookie("yd_cookie", "1c46520d-46f7-48987c23d425e5ddea8e61d69ba5e6310cc5")
        --splash:add_cookie("_ydclearance", "8e15ae66af00798be682ad1e-3f38-466f-8e7c-1b34a39558ce-1506142705")
        --splash:add_cookie("channelid", "0")
        --splash:add_cookie("sid", "1506134908429358")
        --splash:add_cookie("_ga", "GA1.2.747342934.1502688188")
        --splash:add_cookie("_gid", "GA1.2.1211502284.1506064471")
        --splash:add_cookie("Hm_lvt_7ed65b1cc4b810e9fd37959c9bb51b31", "1506064470,1506135531")
        --splash:add_cookie("Hm_lpvt_7ed65b1cc4b810e9fd37959c9bb51b31", "1506135551")
        assert(splash:go("http://www.kuaidaili.com"))
        assert(splash:wait(2))
        --splash:evaljs("var h = $(document).height()-$(window).height();$(document).scrollTop(h);")
        --assert(splash:wait(2))
        return splash:html()
    end
    '''
})

print(request.status_code)
print(request.text)

I got this error message:

{
    "description": "Error happened while executing Lua script",
    "type": "ScriptError",
    "error": 400,
    "info": {
        "line_number": 2,
        "type": "LUA_ERROR",
        "source": "[string \"function main(splash, args)\r...\"]",
        "error": "http521",
        "message": "Lua error: [string \"function main(splash, args)\r...\"]:2: http521"
    }
}
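
For readers hitting the same message: the "http521" text is the second return value of splash:go, which assert() turns into a Lua error. A minimal sketch of a script that tolerates HTTP status errors and still returns the rendered page is shown here. The names LUA_TOLERANT_GO and build_execute_params are hypothetical, and whether kuaidaili.com serves usable content after a 521 is an assumption, not something confirmed in this thread.

```python
# Hypothetical helper names; only the splash:* calls belong to Splash itself.
LUA_TOLERANT_GO = """
function main(splash, args)
    splash.images_enabled = false
    -- splash:go returns ok, reason instead of raising an error itself
    local ok, reason = splash:go(args.url)
    if not ok and not (reason and reason:sub(1, 4) == "http") then
        -- not an HTTP status error (for example a network failure): give up
        error(reason)
    end
    -- give any JavaScript on the page time to run
    splash:wait(tonumber(args.wait))
    return {html = splash:html(), go_reason = reason}
end
"""


def build_execute_params(url, wait_seconds=2.0):
    """Build the query parameters for Splash's /execute endpoint."""
    return {"lua_source": LUA_TOLERANT_GO, "url": url, "wait": wait_seconds}
```

The returned dict can be passed as params= to the same requests.get('http://localhost:8050/execute', ...) call used above. The response is then JSON with an "html" key, plus "go_reason" when the page load reported a problem.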

expired-brain commented 5 years ago

I got the same type of error. Sometimes the page loads correctly and sometimes I get error 400. Could this be caused by some kind of blocking on the website's side?

HTTP Error 400 (Bad Request) Type: ScriptError -> LUA_ERROR Error happened while executing Lua script

Lua error: [string "function main(splash, args) ..."]:2: render_error

{
    "info": {
        "source": "[string \"function main(splash, args)\r...\"]",
        "line_number": 2,
        "error": "render_error",
        "message": "Lua error: [string \"function main(splash, args)\r...\"]:2: render_error",
        "type": "LUA_ERROR"
    },
    "description": "Error happened while executing Lua script",
    "error": 400,
    "type": "ScriptError"
}

ramisvik commented 5 years ago

@expired-brain Any luck on this? I'm facing the same error.

manentai commented 5 years ago

Same here. I get:

Bad request to Splash: {
'type': 'ScriptError', 
'error': 400, 
'description': 'Error happened while executing Lua script', 
'info': {
    'type': 'LUA_ERROR', 
    'error': 'http400', 
    'source': '[string "function use_crawlera(splash)..."]', 
    'line_number': 52, 
    'message': 'Lua error: [string "function use_crawlera(splash)..."]:52: http400'
    }
}

and my line 52 is: assert(splash:go(splash.args.url))
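
The reports in this thread share the outer error 400 (ScriptError) but differ in info.error: "http521", "render_error" and "http400". As far as I recall from the Splash documentation, "http<code>" means the site itself answered with that status, while "render_error" means Splash could not render the main page. A small sketch for telling these apart on the client side follows; classify_go_error is a hypothetical helper, not part of Splash.

```python
import re


def classify_go_error(error_info):
    """Split the info.error string of a Splash ScriptError into (kind, code).

    error_info is the parsed "info" object from the error response.
    """
    reason = error_info.get("error", "")
    # "http521" or "network3" style reasons carry a numeric code
    match = re.fullmatch(r"(http|network)(\d+)", reason)
    if match:
        return match.group(1), int(match.group(2))
    # anything else (e.g. "render_error") has no code attached
    return (reason or "unknown"), None


print(classify_go_error({"type": "LUA_ERROR", "error": "http400"}))
```

Here a kind of "http" means the status came from the target site (or a proxy in front of it), so the fix is more likely in headers, cookies or request pacing than in the Lua script itself.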

expired-brain commented 4 years ago

> @expired-brain Any luck on this? I'm facing the same error.

Yes. In my case the cause was that the site uses some JavaScript-based detection. I fixed it by adding a random mouse click on the page and waiting on that page for 2-3 seconds before the next request.
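
A rough sketch of that workaround, not the exact script used: the click coordinates and the 2-3 second pause are chosen in Python and passed to the Lua script as arguments. The names LUA_CLICK_THEN_WAIT and build_click_params are hypothetical, the 1024x768 size is assumed to match Splash's default viewport, and splash:mouse_click is only available in reasonably recent Splash versions.

```python
import random

# Hypothetical script: load the page, click once, linger, then return the HTML.
LUA_CLICK_THEN_WAIT = """
function main(splash, args)
    assert(splash:go(args.url))
    -- click at the coordinates picked on the Python side
    splash:mouse_click(tonumber(args.click_x), tonumber(args.click_y))
    -- stay on the page for a while before returning
    splash:wait(tonumber(args.wait))
    return splash:html()
end
"""


def build_click_params(url, width=1024, height=768, rng=random):
    """Build /execute parameters with a random click point and a 2-3 s pause."""
    return {
        "lua_source": LUA_CLICK_THEN_WAIT,
        "url": url,
        "click_x": rng.randint(1, width - 1),
        "click_y": rng.randint(1, height - 1),
        "wait": round(rng.uniform(2.0, 3.0), 2),
    }
```

The resulting dict can be sent as params= to http://localhost:8050/execute. Note that a random click may land on a link and navigate away, so restricting the coordinates to an empty area of the page is probably safer.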