tomslee / airbnb-data-collection

Data collection for Airbnb listings.
MIT License

IP address blocked and survey quitting instantly #35

Closed: RiceOwlXinTan closed this issue 6 years ago

RiceOwlXinTan commented 6 years ago

Hi Tom,

Thank you so much for your instructions and script. This will help my research a lot. I have followed all the steps in the README, including creating a database through pgAdmin, but I always run into problems when running the survey. Whichever city I choose, the survey ends as soon as I start it, and no data is stored in the database. This happens when I search by neighborhood and when I search by zip code.

/Users/xins/anaconda3/lib/python3.5/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")
INFO ==============================================================
INFO Survey 8, for atlanda
INFO Searching by zipcode
INFO Finishing survey 8, for atlanda

When I search by bounding box, I repeatedly get a message saying that the website has blocked me.

INFO ==============================================================
INFO Survey 8, for atlanda
INFO Searching by bounding box, max_zoom=12
INFO ----------------------------------------------------------------------
INFO Rectangle calculated: [33.887618, -84.289389, 33.647808, -84.551819]
INFO Searching rectangle: zoom factor = 0, node = []
WARNING HTTP status 400 from web site: IP address blocked.Waiting 1.0 minutes.
WARNING HTTP status 400 from web site: IP address blocked.Waiting 1.0 minutes.
WARNING HTTP status 400 from web site: IP address blocked.Waiting 1.0 minutes.

This warning message repeats as my survey goes on, and no data is stored. Could you let me know where I might have made a mistake or missed a step?
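For readers who hit the same loop, here is a minimal sketch of a wait-and-retry pattern with an upper bound on attempts, so that a persistent HTTP 400 stops the run instead of repeating indefinitely. It is not the code in airbnb.py: the function name `fetch_with_wait`, its parameters, and the `fetch` callable (a stand-in for the real HTTP request) are all assumptions made for illustration.

```python
import time


def fetch_with_wait(fetch, max_attempts=5, wait_minutes=1.0, sleep=time.sleep):
    """Call fetch() until it returns a status other than 400.

    fetch is a hypothetical zero-argument callable returning
    (status_code, body); it stands in for the real HTTP request.
    sleep is injectable so the waiting can be skipped in tests.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status != 400:
            return status, body
        print("HTTP status 400 (attempt %d of %d)" % (attempt, max_attempts))
        # Wait before the next try, but not after the final attempt.
        if attempt < max_attempts:
            sleep(wait_minutes * 60)
    # Giving up makes a persistent 400 visible instead of looping silently.
    raise RuntimeError("still receiving HTTP 400 after %d attempts" % max_attempts)
```

Capping the attempts is a design choice: a 400 that never clears may point to something other than an IP block, such as a problem with the request itself, and an error is easier to notice than an endless series of warnings.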

tomslee commented 6 years ago

Hi RiceOwlXinTan.

The zipcode method is not really working now. Could you please run "python airbnb.py -sb 8 -v" and post the output? The -v adds some logging information.

It looks like you are running with no proxies. I did have problems when running without proxies recently, but I believe I fixed them. If you could check your results with the most recent version that would be helpful.

RiceOwlXinTan commented 6 years ago

Hi Tom,

Thanks a lot! It turns out that I had missed the API key. I also bought some proxy IPs. Now the script works pretty well, and it helps my research a lot. I compared my recent results with the ones you posted a year ago, and the number of Airbnb listings in Houston has decreased by 30%. I am wondering whether some listings are missing because Airbnb changed their website format.

tomslee commented 6 years ago

Hi. I'm glad it runs, but I agree it is missing some listings. See the note near the top of the readme for a statement - I will add a more prominent one.

Unfortunately I have no idea why the survey now misses some items, but anecdotes suggest that some others have been having the same problems with other code. I suspect Airbnb makes some listings harder to search for, but I don't know the criteria.

I wish I had better news.

tomslee commented 6 years ago

Hi RiceOwlXinTan - I believe I've found the cause of the lower listing count. Searches on the Airbnb site no longer return listings that have all days in their calendar marked "unavailable", which turns out to be quite a lot of listings. I have not found a workaround to capture those listings that are still on the site, but which no longer show up in searches.
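One way to see how large the drop is between two surveys, and which listings vanished from search results, is to compare the listing IDs each survey collected. The sketch below assumes the IDs from each survey have already been read out of the database into Python collections; the function name `compare_surveys` and the sample IDs are made up for illustration and are not part of the project.

```python
def compare_surveys(old_ids, new_ids):
    """Compare the listing IDs found by an older and a newer survey.

    Returns the IDs that disappeared, the IDs that are new, and the
    percentage change in the total listing count.
    """
    old_ids, new_ids = set(old_ids), set(new_ids)
    if not old_ids:
        raise ValueError("the older survey has no listings to compare against")
    missing = old_ids - new_ids   # in the old survey, absent from the new one
    added = new_ids - old_ids     # found only by the new survey
    percent_change = 100.0 * (len(new_ids) - len(old_ids)) / len(old_ids)
    return {"missing": missing, "added": added, "percent_change": percent_change}


# Hypothetical listing IDs, not real survey data.
old_survey = range(1, 11)             # 10 listings
new_survey = [1, 2, 3, 4, 5, 6, 11]   # 7 listings
result = compare_surveys(old_survey, new_survey)
print(sorted(result["missing"]), result["percent_change"])
```

The IDs in `missing` are candidates for listings that are still on the site but no longer returned by searches; the comparison alone cannot tell those apart from listings that were actually removed.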