Could anybody please help me understand why Splash does not render this page at all? https://www.flipkart.com/apple-iphone-6s-space-grey-64-gb/p/itmebysg5kgxugfk?pid=MOBEBY3VTD7ZHZQA

Any help will be greatly appreciated.

Below is our sample spider. All I am trying to do is crawl a product URL and print the product title.

Note: The issue occurs with all product pages from this site, not just the page mentioned above. However, Splash is able to render the full HTML of the category page below, which is from the same site.

Category page link (which Splash renders fine): https://www.flipkart.com/mobiles/apple~brand/pr?sid=tyy,4io&otracker=product_breadCrumbs_Apple+Mobiles
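To compare the two pages outside of Scrapy, both URLs can be requested directly from Splash's render.html HTTP endpoint. The snippet below is only a minimal sketch of how those request URLs could be built. It assumes Splash is listening on http://localhost:8050/ (the SPLASH_URL from my settings), and `splash_render_url` is a hypothetical helper name, not part of Splash or scrapy-splash.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, which the spider below targets

SPLASH_URL = 'http://localhost:8050/'  # assumed: same value as in settings.py


def splash_render_url(page_url, wait=0.5, splash_url=SPLASH_URL):
    """Build a render.html request URL for Splash (hypothetical helper)."""
    # 'url' and 'wait' are arguments of Splash's render.html endpoint.
    query = urlencode([('url', page_url), ('wait', wait)])
    return splash_url.rstrip('/') + '/render.html?' + query


product_url = ('https://www.flipkart.com/apple-iphone-6s-space-grey-64-gb/p/'
               'itmebysg5kgxugfk?pid=MOBEBY3VTD7ZHZQA')
category_url = ('https://www.flipkart.com/mobiles/apple~brand/pr'
                '?sid=tyy,4io&otracker=product_breadCrumbs_Apple+Mobiles')

# Print one Splash URL per page so each can be opened in a browser or with curl.
for page in (product_url, category_url):
    print(splash_render_url(page))
```

Opening the two printed URLs side by side should show whether the product page already fails at the Splash level, independent of the spider and middleware settings.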

Spider Code:

    import scrapy
    import re
    import os.path

    # from flip.items import AppleItem
    # from apple.commonfunctions import format_review_count

    from scrapy.linkextractors import LinkExtractor
    from scrapy.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.spiders import CrawlSpider, Rule
    from urlparse import urljoin
    from scrapy.loader import ItemLoader
    from scrapy_splash import SplashRequest


    class FlipSpider(CrawlSpider):
        name = "flip"
        allowed_domains = []
        start_urls = [
            "https://www.flipkart.com/apple-iphone-6s-space-grey-64-gb/p/itmebysg5kgxugfk?pid=MOBEBY3VTD7ZHZQA"
        ]

        def __init__(self, timeStamp='', outputFolder='', *args, **kwargs):
            super(FlipSpider, self).__init__(*args, **kwargs)

        def start_requests(self):
            for url in self.start_urls:
                # yield SplashRequest(url, self.parse_start_url, endpoint='render.html', args={'wait': 0.5})
                yield scrapy.Request(url, self.parse_start_url,
                                     meta={'splash': {'endpoint': 'render.html', 'args': {'wait': 0.5}}})

        def parse_start_url(self, response):
            print "inside parse_detail_page"
            # extract_first() returns a single string; extract() returns a list,
            # which cannot be concatenated to a str.
            print "Product Title = " + response.xpath('//h1[contains(@class,"_3eAQiD")]/text()').extract_first(default='')

The Splash-related settings in my settings.py are below.

    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }

    SPIDER_MIDDLEWARES = {
        'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    }

    SPLASH_URL = 'http://localhost:8050/'

    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

scrapy-plugins / scrapy-splash

Splash not rendering this JavaScript page #84
