tomslee / airbnb-data-collection

Data collection for Airbnb listings.
MIT License
478 stars 183 forks

How to scrape a city? #21

Closed angelotc closed 7 years ago

angelotc commented 7 years ago

Hi Mr. Slee, I'm very interested in your and Mr. Cox's work!

I tried running this command:

python airbnb.py -asa "Tokyo"

and received the following:

ERROR:root:Top level exception handler: quitting.
Traceback (most recent call last):
  File "airbnb.py", line 440, in main
    ws_get_city_info(ab_config, args.addsearcharea, ab_config.FLAGS_ADD)
  File "airbnb.py", line 237, in ws_get_city_info
    cur.execute(sql_check, (citylist[0],))
psycopg2.ProgrammingError: relation "search_area" does not exist
LINE 3:                         from search_area

tomslee commented 7 years ago

Hi "Angelotc",

It looks like you have no table called "search_area", which is one of the tables you need. Can you run

`python schema_update.py`

which should create the tables you need? It is not a well-tested script, I am afraid.
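
For anyone hitting the same `relation "search_area" does not exist` error, one way to confirm that the table really is missing is to ask PostgreSQL directly. This is only a sketch and not part of the project: the helper name `table_exists`, the `public` schema, and the connection parameters in the usage comment are all assumptions.

```python
def table_exists(cur, table_name, schema="public"):
    """Return True if `schema.table_name` is visible to the current user.

    `cur` is any DB-API cursor that uses %s placeholders (psycopg2 does).
    The "public" schema default is an assumption about the setup.
    """
    cur.execute(
        "SELECT EXISTS ("
        "  SELECT 1 FROM information_schema.tables"
        "  WHERE table_schema = %s AND table_name = %s"
        ")",
        (schema, table_name),
    )
    # EXISTS always yields exactly one row holding a single boolean.
    return bool(cur.fetchone()[0])


# Hypothetical usage (database and user names are assumptions):
#
#   import psycopg2
#   conn = psycopg2.connect(dbname="airbnb", user="airbnb")
#   with conn.cursor() as cur:
#       print(table_exists(cur, "search_area"))
```

If this reports False, the schema has not been loaded into the database that airbnb.py is connecting to.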

angelotc commented 7 years ago

Hi Tom,

I have tried that code, but got this error:

$ python schema_update.py
INFO    Check: schema_version table already has version column
Traceback (most recent call last):
  File "schema_update.py", line 327, in <module>
    main()
  File "schema_update.py", line 322, in main
    fix_room_table()
  File "schema_update.py", line 226, in fix_room_table
    test_room_id = cur.fetchone()[0]
TypeError: 'NoneType' object is not subscriptable

cortesimone commented 7 years ago

Hi, I'm running into the same error, with PostgreSQL 9.6 and PostGIS 2.3 on Ubuntu 17.04:

ERROR:root:Error collecting city and neighborhood information
ERROR:root:Error getting city info from website
ERROR:root:Top level exception handler: quitting.
Traceback (most recent call last):
  File "airbnb.py", line 440, in main
    ws_get_city_info(ab_config, args.addsearcharea, ab_config.FLAGS_ADD)
  File "airbnb.py", line 237, in ws_get_city_info
    cur.execute(sql_check, (citylist[0],))
ProgrammingError: relation "search_area" does not exist
LINE 3:                         from search_area

python schema_update.py gives me this result:

root@airbnb:~/airbnb-data-collection# python schema_update.py
INFO    Check: schema_version table already has version column
INFO    Check: room table already has room_id column
INFO    Check: room table already has coworker_hosted column
INFO    Check: survey_progress_log_bb table already has survey_id column

Can you please help? Thanks!

cortesimone commented 7 years ago

@angelotc I had the same issue you are reporting, and I fixed it by first executing:

psql USER -h 127.0.0.1 -d DB < postgresql/schema.sql

and then schema_current.sql

angelotc commented 7 years ago

@cortesimone awesome that you got it to work. Did it scrape well?

cortesimone commented 7 years ago

It worked. I still have to find out why it scraped not just the chosen area (a city in Italy), but also Warsaw (Poland).

jmk201 commented 7 years ago

Hello, could somebody please help me with this? I get the same error:

$ python schema_update.py
INFO    Check: schema_version table already has version column
Traceback (most recent call last):
  File "schema_update.py", line 327, in <module>
    main()
  File "schema_update.py", line 322, in main
    fix_room_table()
  File "schema_update.py", line 226, in fix_room_table
    test_room_id = cur.fetchone()[0]
TypeError: 'NoneType' object is not subscriptable

@angelotc, were you able to get it to work?
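
A note on the TypeError in the traceback above: under the Python DB-API, `cur.fetchone()` returns `None` when the query produced no rows, so indexing it with `[0]` fails. The sketch below shows that behaviour and one way to guard against it. It uses the standard-library `sqlite3` module purely as a stand-in for psycopg2 so that it runs without a database server; the helper `first_value` and the empty `room` table are hypothetical, and this is not the actual code in schema_update.py.

```python
import sqlite3


def first_value(cur, default=None):
    """Return the first column of the next row, or `default` if there is none."""
    row = cur.fetchone()  # None once the result set is exhausted
    return default if row is None else row[0]


def demo():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE room (room_id INTEGER)")  # hypothetical table

    # Empty table: fetchone() gives None, so a bare [0] would raise TypeError.
    cur.execute("SELECT room_id FROM room LIMIT 1")
    empty = first_value(cur)

    # With one row present, the same query returns its value.
    cur.execute("INSERT INTO room VALUES (42)")
    cur.execute("SELECT room_id FROM room LIMIT 1")
    filled = first_value(cur)

    conn.close()
    return empty, filled


print(demo())  # (None, 42)
```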

tomslee commented 7 years ago

Hi @jmk201. schema_update.py is basically broken :(.

If you are starting from nothing, the schema is in the file postgresql/schema_current.sql. You need to run that file to create the database tables to start with (assuming both your user and database are named airbnb). For example, if you use psql:

psql --username airbnb airbnb < postgresql/schema_current.sql

Good luck.
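
For those who would rather load the schema from a Python script than from the shell, the psql command above can be wrapped with `subprocess`. This is a rough sketch under stated assumptions: the function names are made up for illustration, the user and database are both assumed to be named airbnb as in the comment above, and the `ON_ERROR_STOP=1` setting is an addition (not in the original command) so that psql stops and exits with a non-zero status at the first SQL error.

```python
import subprocess


def build_psql_command(schema_path, user="airbnb", dbname="airbnb"):
    """Build the argument list for loading a schema file with psql."""
    return [
        "psql",
        "--username", user,
        "--dbname", dbname,
        "--set", "ON_ERROR_STOP=1",  # assumption: abort on the first SQL error
        "--file", schema_path,
    ]


def load_schema(schema_path, user="airbnb", dbname="airbnb"):
    """Run psql on the schema file; raises CalledProcessError on failure."""
    subprocess.run(build_psql_command(schema_path, user, dbname), check=True)


# Hypothetical usage, run from the repository root:
#
#   load_schema("postgresql/schema_current.sql")
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` turns a failed load into an exception rather than a silent partial schema.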