verginer / bnb_scrapy_tutorial

A tutorial on how to write a scrapy spider to get data from Airbnb
http://www.verginer.eu/blog/web-scraping-airbnb/
MIT License
29 stars, 30 forks

pagination #4

Closed Michaelp86 closed 7 years ago

Michaelp86 commented 7 years ago

Hi Lucca,

A few days ago everything worked well, but today I can't scrape more than the first page. I think the problem comes from the way the last page is found. In your code (bnbspyder.py) there is: li[last()-1]/a/@href. The @class name looks different on the website, and 'page=' seems to have disappeared. I haven't found a way to fix it. If you (or someone else) have an idea, please share ;). Cheers,

Michael

toxydose commented 7 years ago

Hello everyone, I found that the parameter 'page=' has been changed to 'section_offset='. This parameter is not present on the first page; the second page has section_offset=1, and the 17th page has section_offset=16.
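To make the mapping described above concrete, here is a minimal sketch of how page numbers could be turned into URLs. It assumes exactly what was observed in this comment (no parameter on page 1, section_offset = n - 1 on page n). The function names and the example.com base URL are hypothetical placeholders, not part of the tutorial's code.

```python
from urllib.parse import urlencode


def search_page_url(base_url, page_number):
    """Return the URL of a 1-based results page.

    Assumption: page 1 carries no section_offset parameter and
    page n uses section_offset = n - 1, as observed in this thread.
    """
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    if page_number == 1:
        return base_url
    return base_url + "?" + urlencode({"section_offset": page_number - 1})


def all_page_urls(base_url, last_offset):
    """Return one URL per results page.

    last_offset is the highest section_offset found in the pagination
    links, so there are last_offset + 1 pages in total.
    """
    return [search_page_url(base_url, page) for page in range(1, last_offset + 2)]


if __name__ == "__main__":
    # Placeholder base URL, used for illustration only.
    for url in all_page_urls("https://example.com/s/homes", 2):
        print(url)
```

Note that the number of pages is one more than the last offset, which is an easy place for an off-by-one error.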

Michaelp86 commented 7 years ago

Hi, I saw that too. I replaced page= with section_offset=, and I get the correct section offset value. So in my code last_page_number is my offset, and my loop is pageNumber in range(0, last_page_number+1). But I think I have made a mistake somewhere, because I am missing 18 results (there are 18 ads on an Airbnb page). Here is the code, in case someone sees the mistake ;) :

    def parse(self, response):
        last_page_number = self.last_pagenumer_in_search(response)
        if last_page_number < 1:
            return
        else:
            page_urls = [response.url + "/homes?&section_offset=" + str(pageNumber)
                         for pageNumber in range(0, last_page_number+1)]
            for page_url in page_urls:
                yield scrapy.Request(page_url, callback=self.parse_listing_results_page)

verginer commented 7 years ago

The page_offset issue should be fixed now. Could you verify that it is?

Michaelp86 commented 7 years ago

Hi Lucca,

I have just checked it today. I got 227 of the 300 items on the site. For offsets 5, 8, and 12 it scraped no items. Do you think this could come from a delay in the page loading?
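One way to narrow this down is to record how many items each offset produced and list the offsets that came back empty, so they can be requested again. The sketch below is plain Python and is not taken from the tutorial; the function name and the sample counts are hypothetical. If the delay hypothesis holds, Scrapy's DOWNLOAD_DELAY setting may also be worth trying, though nothing in this thread confirms that it is the cause.

```python
def offsets_to_retry(item_counts, last_offset):
    """Return every offset in 0..last_offset that produced no items.

    item_counts maps a section_offset to the number of items scraped
    from that page; an offset missing from the mapping counts as empty.
    """
    return [
        offset
        for offset in range(last_offset + 1)
        if item_counts.get(offset, 0) == 0
    ]


if __name__ == "__main__":
    # Made-up counts for illustration: offset 2 returned nothing and
    # offset 4 was never recorded.
    counts = {0: 18, 1: 18, 2: 0, 3: 18}
    print(offsets_to_retry(counts, 4))  # [2, 4]
```

Requesting only the empty offsets a second time would show whether the gaps are reproducible or intermittent.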

Cheers,

Michael