verginer / bnb_scrapy_tutorial

A tutorial on how to write a scrapy spider to get data from Airbnb
http://www.verginer.eu/blog/web-scraping-airbnb/
MIT License
29 stars, 30 forks

Scrapy is not scraping json items #9

Open data1111 opened 6 years ago

data1111 commented 6 years ago

Hi there,

First of all, thanks for developing this code.

I'm having trouble with Scrapy and the JSON items. I got it to scrape the pages I wanted, but when I open the CSV file it only contains the URLs, not the other items. What do you suggest?

Cheers

Pant76 commented 6 years ago

Hi, same problem here!

ikedaandre commented 6 years ago

It appears that Airbnb no longer sends a JSON response with the necessary information. To make it work now, you will have to update the locators to extract the information from the HTML (using XPath or CSS selectors). You will also have to use Splash, since some of the elements are not loaded when the page is requested by Scrapy alone.
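A minimal sketch of what "updating the locators" means: instead of reading keys from a JSON payload, each CSV column gets its own selector that is applied to the page's HTML. To keep it runnable without Scrapy, it uses the standard library's ElementTree, which supports a subset of XPath; the markup, class names, and the LOCATORS mapping are all hypothetical and will not match Airbnb's real pages.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL markup: real Airbnb pages use a different (and frequently
# changing) structure. It is kept well-formed so ElementTree can parse it.
SAMPLE = """
<div id="listing">
  <span class="room-type">Entire apartment</span>
  <span class="nightly-price">85</span>
  <span class="rev-count">42</span>
</div>
"""

# HYPOTHETICAL locators: CSV column -> XPath expression. ElementTree only
# supports a subset of XPath, including .// descent and [@attr='value'].
LOCATORS = {
    "room_type": ".//span[@class='room-type']",
    "nightly_price": ".//span[@class='nightly-price']",
    "rev_count": ".//span[@class='rev-count']",
}


def extract(markup, url):
    """Build one item dict by applying each locator to the markup."""
    root = ET.fromstring(markup.strip())
    item = {"url": url}
    for field, path in LOCATORS.items():
        node = root.find(path)
        # Empty string (not a crash) when a locator stops matching, which is
        # how a site redesign would show up in the exported CSV.
        item[field] = node.text.strip() if node is not None and node.text else ""
    return item


print(extract(SAMPLE, "https://www.airbnb.com/rooms/993348?location=greece"))
```

Inside a Scrapy callback the same idea would use the response's own selectors, e.g. response.xpath("//span[@class='room-type']/text()").get() or response.css("span.room-type::text").get() in recent Scrapy versions, with selectors written against whatever markup the rendered page actually has.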

griffadamus commented 6 years ago

I'm having this same issue. I have no idea how to implement the update that ikedaandre suggests. Any help would be awesome.

evagian commented 6 years ago

Me too. This is the output:

instant_book,satisfaction_guest,rating_checkin,bed_type,person_capacity,accuracy_rating,rating_communication,room_type,hosting_id,url,amenities,rev_count,cancel_policy,rating_cleanliness,nightly_price,host_id,response_rate,price,response_time
,,,,,,,,,https://www.airbnb.com/rooms/993348?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/661755?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/2107937?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/659712?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/17428493?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/15064259?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/10983314?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/3455118?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/526402?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/2610077?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/283638?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/5283277?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/2670085?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/14349663?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/7027819?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/12783254?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/1192594?location=greece,,,,,,,,,
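Output like this is the shape a CSV export takes when each item carries nothing but its url: the nine columns before url and the nine after it are all written as empty cells. A small standard-library sketch that reproduces such a row, plus a hypothetical url_only check for spotting these items, assuming the items reaching the exporter were effectively just {"url": ...} because the JSON lookups no longer found anything:

```python
import csv
import io

# Column order copied from the CSV header in the output posted in this thread.
FIELDS = [
    "instant_book", "satisfaction_guest", "rating_checkin", "bed_type",
    "person_capacity", "accuracy_rating", "rating_communication", "room_type",
    "hosting_id", "url", "amenities", "rev_count", "cancel_policy",
    "rating_cleanliness", "nightly_price", "host_id", "response_rate", "price",
    "response_time",
]


def to_csv(items):
    """Write dict items as CSV; missing keys become empty cells (restval="")."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


def url_only(item):
    """HYPOTHETICAL helper: True if every field except url is missing or empty."""
    return all(not item.get(f) for f in FIELDS if f != "url")


# ASSUMPTION: the spider yielded items with only "url" set.
items = [{"url": "https://www.airbnb.com/rooms/993348?location=greece"}]
print(to_csv(items))
print(sum(url_only(i) for i in items), "of", len(items), "items carry only a url")
```

If url_only is true for every scraped item, the requests themselves are succeeding and the problem lies in the parsing step, which matches the explanation that the JSON data source went away.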

ikedaandre commented 6 years ago

What I would recommend is using Splash + Scrapy (if you google Splash with Scrapy, there should be enough documentation on how to set it up properly). After you set up Splash + Scrapy, use CSS selectors to get the data from the pages, since there's no longer a convenient .json to pull the data from.
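A sketch of the Scrapy settings the scrapy-splash README asks for, assuming a Splash instance listening locally on port 8050 (for example started with docker run -p 8050:8050 scrapinghub/splash). The address is an assumption, and the middleware orders follow that README, so check them against the version you install:

```python
# settings.py (fragment): scrapy-splash wiring as described in its README.
# ASSUMPTION: Splash runs locally on its default port 8050.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make duplicate filtering and the HTTP cache aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With this in place, the spider would yield scrapy_splash.SplashRequest(url, self.parse, args={"wait": 2}) instead of a plain scrapy.Request, so that parse receives the HTML after the page's JavaScript has run and CSS selectors can be applied to it. The 2-second wait is an arbitrary guess to tune per page.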

Hopefully these help with the setup:

https://github.com/scrapy-plugins/scrapy-splash

https://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash/

Cheers