scrapy-plugins / scrapy-splash

Scrapy+Splash for JavaScript integration
BSD 3-Clause "New" or "Revised" License

AttributeError: 'HtmlResponse' object has no attribute 'data' #194

Closed. JavierRuano closed this issue 4 years ago.

JavierRuano commented 5 years ago

I send the request with:

yield scrapy_splash.SplashRequest("https://example.com", self.parse, endpoint='execute', args={'lua_source': script, 'wait': 60})

and the Lua script returns:

return {
    html = splash:html(),
    cookie = cookies
}
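For comparison, here is a minimal sketch of an execute-endpoint script that returns a table, plus a callback-side helper that reads it. It assumes scrapy-splash's middlewares are enabled, in which case the JSON reply is decoded and the callback receives a SplashJsonResponse whose `.data` holds the table. According to the scrapy-splash README, with magic responses on (the default) the `html` key is also exposed as `response.body`, which may be why parse only sees the body. `splash:get_cookies()` and `splash:url()` are Splash Lua API calls, and `args.url` follows the README examples. The names `LUA_SCRIPT` and `extract_result` are hypothetical.

```python
# Hypothetical Lua script for the 'execute' endpoint. Returning a Lua table makes
# Splash reply with JSON, which scrapy-splash exposes as response.data.
LUA_SCRIPT = """
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(args.wait))
    return {
        html = splash:html(),
        cookies = splash:get_cookies(),
        url = splash:url(),
    }
end
"""


def extract_result(response):
    """Return (html, cookies, final_url) from a Splash JSON response.

    Assumes SplashMiddleware is enabled, so the table returned by the Lua
    script arrives as a SplashJsonResponse with a .data dict.
    """
    data = getattr(response, "data", None)
    if data is None:
        # A plain HtmlResponse has no .data: the request probably bypassed Splash.
        raise ValueError("response has no .data; is SplashMiddleware enabled?")
    return data.get("html"), data.get("cookies", []), data.get("url")
```

In a spider this would be used as `yield SplashRequest(url, self.parse, endpoint='execute', args={'lua_source': LUA_SCRIPT, 'wait': 5})` and `html, cookies, final_url = extract_result(response)` inside `parse`. Returning `splash:url()` also shows where any redirect ended up.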

But in parse I can only read response.body. I tried render.json (instead of execute), iframes, and response.data['cookie'], which raises AttributeError: 'HtmlResponse' object has no attribute 'data'.
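One common cause of getting a plain HtmlResponse is that the scrapy-splash middlewares are not enabled: SplashRequest only attaches the Splash options to the request, and without SplashMiddleware the request goes straight to the website, so no `.data` is ever set. The settings below are the ones the scrapy-splash README documents; `SPLASH_URL` assumes Splash is listening on localhost:8050.

```python
# settings.py: Splash-related settings as listed in the scrapy-splash README.
# SPLASH_URL assumes Splash is reachable on localhost:8050 (e.g. via Docker).
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    # Handles cookies for Splash requests (order numbers from the README).
    "scrapy_splash.SplashCookiesMiddleware": 723,
    # Sends SplashRequest to Splash instead of directly to the target site.
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    # Avoids storing duplicate Splash arguments (e.g. the same lua_source) many times.
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```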

I have tried to save the data to the Windows filesystem from the Lua script, but the file doesn't appear. The Lua script follows a redirect; perhaps that explains the difference between response.body and the HTML output shown by the Splash UI at http://localhost:8050 (left pane).
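Writing files from Lua is unlikely to work here: by default Splash runs scripts in a Lua sandbox, and if Splash runs in Docker, any file would land in the container rather than on the Windows host. A simpler route is to return the data in the Lua table and write it from the spider callback, which runs on the host. The helper `save_result` below is hypothetical.

```python
import json
from pathlib import Path


def save_result(data, path):
    """Write the dict returned by the Lua script (response.data) to a JSON file.

    Hypothetical helper: call it from the spider callback, which runs on the
    host machine, instead of writing files from inside the Splash Lua sandbox.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    out.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return out
```

Inside `parse`, `save_result(response.data, "output/result.json")` would then store the HTML, cookies and final URL together.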

Another question is about the PNG output: the screenshot is a white page, but according to the HAR the website has changed. Could the JavaScript not be fully loaded, or is it the redirect again?
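A white screenshot often just means the page had not finished rendering, so waiting before `splash:png()` is a reasonable first thing to try. The scrapy-splash README notes that binary values such as a PNG returned in a Lua table arrive base64-encoded in `response.data`, to be decoded with `base64.b64decode(response.data['png'])`. The 2-second wait and the names `PNG_LUA` and `decode_png` are assumptions for illustration.

```python
import base64

# Hypothetical Lua script: wait before taking the screenshot so that
# JavaScript has a chance to render; the 2-second value is an assumption.
PNG_LUA = """
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(2))
    return {png = splash:png(), url = splash:url()}
end
"""

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # every PNG file starts with these 8 bytes


def decode_png(data):
    """Decode the base64-encoded PNG from response.data into raw bytes."""
    png_bytes = base64.b64decode(data["png"])
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("'png' value is not a PNG image")
    return png_bytes
```

Comparing the returned `url` with the requested one shows whether the screenshot was taken after a redirect.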

Any solution? I have seen something like this in

https://github.com/scrapinghub/splash/blob/master/splash/tests/test_response_tracking.py
resp = self.request_lua("""

I would like something similar from the response, using yield scrapy_splash.SplashRequest with the parse callback. Sorry if this is not a technical question.

Regards. Javier Ruano.

JavierRuano commented 5 years ago

Hi again, I finally found an answer, a partial one.

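import requests
from urllib.parse import quote

# Minimal Lua script: Splash calls main() and sends back whatever it returns
lua = '''
function main(splash)
    return 'hello'
end
'''

# URL-encode the script and pass it as the lua_source argument of /execute
url = 'http://localhost:8050/execute?lua_source=' + quote(lua)
response = requests.get(url)
print(response.text)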

Called this way, through http://localhost:8050/execute?lua_source=, the value returned by the script does come back.
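The same endpoint also accepts a JSON POST body, and when the script returns a table Splash replies with JSON, so the HTML, cookies and final URL can be fetched in one call. This is a stdlib-only sketch assuming Splash on localhost:8050; extra keys in the body (`url`, `wait`) become script arguments. The names `build_request` and `run` are hypothetical.

```python
import json
import urllib.request

SPLASH_EXECUTE = "http://localhost:8050/execute"  # assumes a local Splash instance

LUA = """
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(args.wait))
    return {html = splash:html(), cookies = splash:get_cookies(), url = splash:url()}
end
"""


def build_request(url, wait=2.0):
    """Build a POST request whose JSON body carries lua_source plus script arguments."""
    body = json.dumps({"lua_source": LUA, "url": url, "wait": wait}).encode("utf-8")
    return urllib.request.Request(
        SPLASH_EXECUTE,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run(url, wait=2.0):
    """Send the script to Splash and decode the JSON table it returns."""
    with urllib.request.urlopen(build_request(url, wait), timeout=90) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With Splash running, `result = run("https://example.com")` gives a dict with `html`, `cookies` and `url` keys.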

Source reference: https://germey.gitbooks.io/python3webspider/content/1.7.2-MitmProxy%E7%9A%84%E5%AE%89%E8%A3%85.html

Gallaecio commented 5 years ago

@JavierRuano Have you had any luck so far?

Ostapp commented 5 years ago

Any news?