scrapy-plugins / scrapy-splash

Scrapy+Splash for JavaScript integration
BSD 3-Clause "New" or "Revised" License

Splash does not return script value in JsonResponse #133

Closed: ghost closed this issue 4 years ago

ghost commented 7 years ago

I am trying to execute JavaScript code and get its return value back, so I added 'script': 1 to the Splash args, expecting the returned value to be stored in response.data['script']. But when I try to access the 'script' field of response.data, Python raises a KeyError. I checked in debug mode and there is no 'script' field in response.data. I also tried executing other scripts, and even tried not using the 'js_source' option at all, but got the same result. Spider:

import base64

import scrapy
from scrapy_splash import SplashRequest

class SimpleSpider(scrapy.Spider):
    name = "simple"

    def start_requests(self):
        urls = [
            'https://www.fonbet.ru/#!/live',
        ]
        for url in urls:
            yield SplashRequest(url, self.parse,
                                args={
                                    'script': 1,
                                    'html': 1,
                                    'png': 1,
                                    'render_all': 1,
                                    'width': 1000,
                                    'wait': 3.5,
                                    'js_source': 'document.title="My Title";return document.title;'
                                },
                                endpoint='render.json',  # optional; default is render.html
                                )

    def save_file(self, filename, content, write_flag='w+'):
        with open(filename, write_flag) as file:
            file.write(content)
            self.log('Saved file {}'.format(filename))

    def save_binary(self, filename, content):
        self.save_file(filename, content, 'w+b')

    def save_html(self, filename, content):
        self.save_file(filename, content, 'w+')

    def parse(self, response):
        base_file = response.url.split('.')[1]

        html = response.data['html']
        html_file = base_file + '_page.html'
        self.save_html(html_file, html)

        png = base64.b64decode(response.data['png'])
        png_file = base_file + '_page.png'
        self.save_binary(png_file, png)

        script = response.data['script']  # KeyError: 'script'
        script_file = base_file + '_script'
        self.save_file(script_file, script)
ghost commented 7 years ago

I reviewed the Scrapy-Splash code and didn't find anything related to handling response.data['script'], so I suspect this is a Splash problem. I tried making a direct request without Scrapy-Splash and got the same result: the JSON response does not contain a 'script' field. Code:

if __name__ == '__main__':
    import requests

    resp = requests.post('http://localhost:8050/render.json',
                         json={
                             'wait': 3.5,
                             'script': 1,
                             'js_source': 'document.title="My Title";return document.title;',
                             'url': 'https://www.fonbet.ru/#!/live',
                         })
    data = resp.json()  # data does not contain 'script' field
    print(data)
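
As a possible workaround, Splash's /execute endpoint can return a JavaScript value explicitly via splash:evaljs, which yields the value of the last expression in the snippet (no return statement needed). A minimal sketch, assuming a local Splash instance on port 8050; the Lua wrapper and the 'title'/'html' field names are illustrative:

if __name__ == '__main__':
    import requests

    # Lua wrapper around splash:evaljs(); the embedded snippet stays plain JavaScript.
    lua_script = """
    function main(splash, args)
        assert(splash:go(args.url))
        assert(splash:wait(3.5))
        local title = splash:evaljs('document.title = "My Title"; document.title')
        return {title = title, html = splash:html()}
    end
    """

    resp = requests.post('http://localhost:8050/execute',
                         json={
                             'lua_source': lua_script,
                             'url': 'https://www.fonbet.ru/#!/live',
                         })
    data = resp.json()
    print(data.get('title'))  # expected: 'My Title'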
Devkalion commented 7 years ago

I have a similar problem. I need to execute similar JS code, without Lua.
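
If a small Lua wrapper is acceptable after all, the usual way to get a JavaScript return value out of Splash is the execute endpoint with splash:evaljs. From scrapy-splash that looks roughly like the sketch below; the arg name 'js', the 'result' field and the spider name are illustrative, and Splash is assumed to be reachable with the default scrapy-splash settings:

import scrapy
from scrapy_splash import SplashRequest

# Thin Lua wrapper: splash:evaljs() returns the value of the last JS expression.
LUA_SOURCE = """
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(args.wait))
    return {result = splash:evaljs(args.js)}
end
"""

class EvalJsSpider(scrapy.Spider):
    name = "evaljs"

    def start_requests(self):
        yield SplashRequest(
            'https://www.fonbet.ru/#!/live',
            self.parse,
            endpoint='execute',
            args={
                'lua_source': LUA_SOURCE,
                'wait': 3.5,
                'js': 'document.title = "My Title"; document.title',
            },
        )

    def parse(self, response):
        # response.data is the JSON-decoded table returned by main()
        self.log('JS result: {}'.format(response.data['result']))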

Gallaecio commented 5 years ago

I tried to do a direct request without Scrapy-Splash and got the same result

Then please close this issue and report it on https://github.com/scrapinghub/splash