scalingexcellence / scrapybook

Scrapy Book Code
http://scrapybook.com/

The index in the xpath doesn't work #55

Closed leon0707 closed 5 years ago

leon0707 commented 5 years ago

On page 37:

scrapy shell https://www.gumtree.com/p/commercial-property-to-rent/south-kensington-to-let-serviced-office-space-in-sloane-avenue-sw3-south-kensington/1258815123

>>> response.xpath('//*[@itemprop="price"][1]').extract()
[u'<meta itemprop="price" content="1190.00pm">', u'<meta itemprop="price" content="1750.00pw">', u'<meta itemprop="price" content="346.00pw">', u'<meta itemprop="price" content="50.00pw">', u'<meta itemprop="price" content="625.00pm">', u'<meta itemprop="price" content="250.00pm">', u'<meta itemprop="price" content="300.00pm">', u'<meta itemprop="price" content="400.00pm">', u'<meta itemprop="price" content="500.00pm">', u'<meta itemprop="price" content="190.00pm">', u'<meta itemprop="price" content="502.00pm">']

The [1] in the XPath doesn't work: the expression returns every <meta itemprop="price"...> element. The first one is the price of the property; the rest are the prices of similar properties.

Copied from Chrome: /html/body/div[2]/div/div[3]/main/div[2]/header/span/meta[2]. If I try this XPath, it returns an empty list.

leon0707 commented 5 years ago

@lookfwd Appreciate the effort you put in this book.

I think the XPath for finding the price on a Gumtree page is incorrect. The correct one should be response.xpath('(//*[@itemprop="price"])[1]').extract()

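//*[@itemprop="price"][1] returns every element whose itemprop is "price" and which is the first such child of its parent, because [1] is evaluated separately for each parent. Wrapping the path in parentheses makes [1] apply to the combined result set instead.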

Explanation: https://stackoverflow.com/questions/3674569/how-to-select-specified-node-within-xpath-node-sets-by-index-with-selenium
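To make the difference concrete, here is a minimal sketch that runs both expressions against a small made-up document. It assumes lxml is installed (Scrapy selectors are built on it); the markup below is a hypothetical stand-in for the Gumtree page, not a copy of it, with one price in the listing header and one in each of two "similar ads" items.

```python
from lxml import html

# Hypothetical markup (an assumption, not the real Gumtree page):
# one price in the listing header, one in each "similar ads" item.
PAGE = """
<html><body>
  <header><span>
    <meta itemprop="name" content="Office space">
    <meta itemprop="price" content="1190.00pm">
  </span></header>
  <ul>
    <li><meta itemprop="price" content="1750.00pw"></li>
    <li><meta itemprop="price" content="346.00pw"></li>
  </ul>
</body></html>
"""

tree = html.fromstring(PAGE)

# Without parentheses, [1] is evaluated per parent: it keeps the first
# matching child under every parent, so all three prices come back.
per_parent = tree.xpath('//*[@itemprop="price"][1]/@content')

# With parentheses, the full node-set is built first and [1] then picks
# its first node in document order.
first_overall = tree.xpath('(//*[@itemprop="price"])[1]/@content')

print(per_parent)     # ['1190.00pm', '1750.00pw', '346.00pw']
print(first_overall)  # ['1190.00pm']
```

In the Scrapy shell the same two expressions can be passed to response.xpath(...) directly; only the parenthesized form should come back with a single element.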

scalingexcellence commented 5 years ago

Thanks @leon0707. Both are correct. I will update them in the next version of the book. Thanks a million!