toby-p / rightmove_webscraper.py

Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object
MIT License
252 stars 112 forks

Don't seem to be getting all the search results #8

Closed JoeDevlin closed 6 years ago

JoeDevlin commented 6 years ago

Thanks for this btw!

I have just tried to use it on this search - http://www.rightmove.co.uk/property-for-sale/find.html?searchType=SALE&locationIdentifier=POSTCODE%5E1266434&insId=1&radius=40.0&minPrice=&maxPrice=&minBedrooms=&maxBedrooms=&displayPropertyType=&maxDaysSinceAdded=&_includeSSTC=on&sortByPriceDescending=&primaryDisplayPropertyType=&secondaryDisplayPropertyType=&oldDisplayPropertyType=&oldPrimaryDisplayPropertyType=&newHome=&auction=false - which has over 120k results. However, when I write the results to CSV I only get about 1k of them. Is there any reason this might be?

toby-p commented 6 years ago

Hi @JoeDevlin, glad you're enjoying it! Unfortunately this is a limitation set by the website - if you look at the bottom of your first results page you'll see that it only returns 42 pages of results, and since there are 24 listings per page you only ever get back a maximum of 42 x 24 = 1008 results. I assume this is done to protect their proprietary data.

The only workaround I currently know of is to break your search down into smaller search areas, each returning fewer than 1008 results, then run the scraper once per area and save each set of results to CSV (or similar) so they can be combined afterwards. Alternatively you might try limiting the search to listings added in the past 24 hours, and then run it every day to collect the new listings.
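A minimal sketch of that idea, using only the standard library. It rewrites the query string of a search URL to produce several narrower searches and then merges the scraped listings without duplicates. Note the assumptions: it splits by price band (via the `minPrice` and `maxPrice` parameters visible in the URL above) rather than by area, since smaller areas would need their own location identifiers; `maxDaysSinceAdded=1` is assumed to mean "added in the past 24 hours"; and the helper names and the `"url"` key used to identify a listing are hypothetical, not part of this package.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Cap described above: 42 pages per search x 24 listings per page.
MAX_RESULTS = 42 * 24  # 1008


def with_params(url, **overrides):
    """Return a copy of `url` with the given query parameters replaced or added."""
    parts = urlsplit(url)
    # keep_blank_values preserves empty parameters such as "minPrice=".
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({key: str(value) for key, value in overrides.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))


def price_band_urls(url, edges):
    """Build one search URL per consecutive pair of price edges (hypothetical helper)."""
    return [
        with_params(url, minPrice=lo, maxPrice=hi)
        for lo, hi in zip(edges, edges[1:])
    ]


def merge_listings(batches, key="url"):
    """Combine lists of listing dicts, keeping the first listing seen for each `key`."""
    seen, merged = set(), []
    for batch in batches:
        for listing in batch:
            if listing[key] not in seen:
                seen.add(listing[key])
                merged.append(listing)
    return merged
```

Each URL from `price_band_urls(search_url, [0, 100000, 200000, 300000])` would be passed to the scraper in turn, and any band that still comes back with `MAX_RESULTS` listings is probably truncated and needs splitting further. Adjacent bands share an edge price, so the same listing can appear twice, which is why the merge step deduplicates. For the daily approach, `with_params(search_url, maxDaysSinceAdded=1)` gives the URL to run each day.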

JoeDevlin commented 6 years ago

Ahh, that makes sense - thanks @woblers!