verginer / bnb_scrapy_tutorial

A tutorial on how to write a scrapy spider to get data from Airbnb
http://www.verginer.eu/blog/web-scraping-airbnb/
MIT License

how to loop through multiple locations? #8

Closed angelotc closed 7 years ago

angelotc commented 7 years ago

Hi Luca,

Very nice scrapy tutorial!

Just a couple of questions: is there any way for me to loop through multiple locations so that I can get a file with more than the 300-result maximum of listings?

Also, how do I add bedrooms/bathrooms into the scraper?

verginer commented 7 years ago

Hi @angelotc, sure. You can loop through multiple locations either by running the query with a different QUERY parameter on different machines, or by creating a list of the locations you are interested in and adding some code to the parse function (i.e. at the top level) so that it loops through them.
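As a rough illustration of the second option, the sketch below builds one search URL per location and page from a list. The URL template, the `build_search_urls` helper, and the example locations are all hypothetical and are not taken from the tutorial; substitute whatever pattern your spider already uses with QUERY.

```python
from urllib.parse import quote

# Hypothetical URL pattern (an assumption, not the tutorial's actual URL);
# replace it with the one your spider builds from QUERY.
SEARCH_URL_TEMPLATE = "https://www.airbnb.com/s/{query}?page={page}"


def build_search_urls(locations, pages_per_location=1,
                      template=SEARCH_URL_TEMPLATE):
    """Return one search URL per (location, page) pair."""
    urls = []
    for location in locations:
        # Percent-encode spaces and other unsafe characters in the location.
        query = quote(location)
        for page in range(1, pages_per_location + 1):
            urls.append(template.format(query=query, page=page))
    return urls


# Example locations, chosen only for illustration.
LOCATIONS = ["Lucca--Italy", "Pisa--Italy"]
start_urls = build_search_urls(LOCATIONS, pages_per_location=2)
```

A Scrapy spider reads its initial requests from the `start_urls` attribute, so assigning the resulting list there is one way to have a single run cover every location.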

To add more information, you need to find it in the page HTML and write the appropriate XPath expression, whose result you then add to BnbtutorialItem. I know this is not actual code, but to find more data you really need to read the HTML.

Hope this helps

angelotc commented 7 years ago

Thank you for the tip! I am having a hard time scraping the bedrooms/bathrooms because the number of bedrooms/bathrooms is dynamic with every listing. This was my shot at it:

    bathrooms = response.xpath('//*[@id="details"]/div/div/div[4]/div[2]/div/div[1]/div[1]/div[3]/div/strong').extract()[0][27]

Any other tips?

verginer commented 7 years ago

Hi @angelotc, yes that is the way to go. As for adding the data to the final output you need to do 2 things:

  1. In items.py, add the entry bathrooms = scrapy.Field(), which creates a new field in the BnbtutorialItem object.
  2. In bnbspider.py, assign the value you have extracted, like so:
    item['bathrooms'] = xpath_result

One more tip: make sure you extract the value, and make sure that the spider doesn't crash if the value is not found.
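One way to follow that tip is sketched below in plain Python. The helpers `first_or_default` and `parse_count` are hypothetical names introduced here for illustration. They assume the input is the list of strings returned by `.extract()`, and they replace the fixed character index with a regular expression, since a fixed position breaks as soon as the markup or the number of digits changes.

```python
import re


def first_or_default(values, default=None):
    """Return the first extracted string, or `default` if nothing matched."""
    return values[0] if values else default


def parse_count(text, default=None):
    """Pull the first number out of an extracted fragment.

    Tags are stripped first so that digits inside attributes are ignored.
    """
    if text is None:
        return default
    visible_text = re.sub(r"<[^>]+>", "", text)
    match = re.search(r"\d+(?:\.\d+)?", visible_text)
    return float(match.group()) if match else default


# Stand-ins for what response.xpath(...).extract() might return (made up).
found = ['<strong data-reactid="27">1.5</strong>']
missing = []

bathrooms = parse_count(first_or_default(found))
bathrooms_missing = parse_count(first_or_default(missing))
```

With this shape, a listing without the element yields None for the field instead of raising an IndexError. Scrapy's selector lists also provide `extract_first()`, which returns None rather than raising when nothing matches.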

All the best with your project. 👍