petewarden / dstk

A collection of the best open data sets and open-source tools for data science
http://www.datasciencetoolkit.org/

Frequent internal errors when using the DSTK AMI #29

Closed yholkamp closed 10 years ago

yholkamp commented 11 years ago

Lately we've been seeing a lot of internal errors returned by the DSTK API for a wide range of queries, which makes me wonder whether we've misconfigured it somehow. Each of these exceptions boils down to the following:

23.0.0.109 - - [11/Aug/2013 16:41:53] "GET /maps/api/geocode/json?address=Mansfield,%20TX,%20US " 200 1104 0.0025
54.0.0.128 - - [11/Aug/2013 16:41:54] "GET /info " 200 19 0.0006
54.0.0.128 - - [11/Aug/2013 16:42:06] "GET /info " 200 19 0.0006
ERROR:  relation "postal_codes" does not exist
LINE 1: DECLARE myportal CURSOR FOR SELECT * FROM postal_codes WHERE...
                                                  ^
SystemExit - exit:
 /home/ubuntu/sources/dstk/geodict_lib.rb:776:in `exit'
 /home/ubuntu/sources/dstk/geodict_lib.rb:776:in `select_as_hashes'
 /home/ubuntu/sources/dstk/geodict_lib.rb:563:in `is_postal_code'
 /home/ubuntu/sources/dstk/geodict_lib.rb:83:in `send'
 /home/ubuntu/sources/dstk/geodict_lib.rb:83:in `find_locations_in_text'
 /home/ubuntu/sources/dstk/geodict_lib.rb:776:in `each_with_index'
 /home/ubuntu/sources/dstk/geodict_lib.rb:70:in `each'
 /home/ubuntu/sources/dstk/geodict_lib.rb:70:in `each_with_index'
 /home/ubuntu/sources/dstk/geodict_lib.rb:70:in `find_locations_in_text'
 /home/ubuntu/sources/dstk/geodict_lib.rb:60:in `each'
 /home/ubuntu/sources/dstk/geodict_lib.rb:60:in `find_locations_in_text'
 /home/ubuntu/sources/dstk/emulategoogle.rb:41:in `google_geocoder_api_call'
 ./dstk_server.rb:1322:in `GET /maps/api/geocode/:format'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:1125:in `call'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:1125:in `compile!'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:709:in `instance_eval'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:709:in `route_eval'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:693:in `route!'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:741:in `process_route'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:738:in `catch'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:738:in `process_route'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:692:in `route!'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:691:in `each'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:691:in `route!'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:826:in `dispatch!'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:619:in `call!'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:791:in `instance_eval'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:791:in `invoke'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:791:in `catch'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:791:in `invoke'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:619:in `call!'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:604:in `call'
 /var/lib/gems/1.8/gems/rack-1.2.1/lib/rack/methodoverride.rb:24:in `call'
 /var/lib/gems/1.8/gems/rack-1.2.1/lib/rack/commonlogger.rb:18:in `call'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:1237:in `call'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:1263:in `synchronize'
 /var/lib/gems/1.8/gems/sinatra-1.2.0/lib/sinatra/base.rb:1237:in `call'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/rack/request_handler.rb:96:in `process_request'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_request_handler.rb:516:in `accept_and_process_next_request'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_request_handler.rb:274:in `main_loop'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/rack/application_spawner.rb:206:in `start_request_handler'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/rack/application_spawner.rb:171:in `send'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/rack/application_spawner.rb:171:in `handle_spawn_application'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/utils.rb:470:in `safe_fork'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/rack/application_spawner.rb:166:in `handle_spawn_application'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_server.rb:357:in `__send__'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_server.rb:357:in `server_main_loop'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_server.rb:206:in `start_synchronously'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_server.rb:180:in `start'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/rack/application_spawner.rb:129:in `start'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/spawn_manager.rb:253:in `spawn_rack_application'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_server_collection.rb:132:in `lookup_or_add'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/spawn_manager.rb:246:in `spawn_rack_application'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_server_collection.rb:82:in `synchronize'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_server_collection.rb:79:in `synchronize'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/spawn_manager.rb:244:in `spawn_rack_application'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/spawn_manager.rb:137:in `spawn_application'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/spawn_manager.rb:275:in `handle_spawn_application'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_server.rb:357:in `__send__'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_server.rb:357:in `server_main_loop'
 /var/lib/gems/1.8/gems/passenger-3.0.19/lib/phusion_passenger/abstract_server.rb:206:in `start_synchronously'
 /var/lib/gems/1.8/gems/passenger-3.0.19/helper-scripts/passenger-spawn-server:99

Our installation has the latest revisions from Git on it. Is there perhaps some way to resolve these postal_codes errors?
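
To check whether postal_codes is the only table PostgreSQL reports as missing, the server log can be scanned for these messages. This is a minimal Ruby sketch that assumes the log format shown above; the helper name `missing_relations` is hypothetical and not part of DSTK.

```ruby
# Hypothetical helper, not part of DSTK: collect the names of tables that
# PostgreSQL reported as missing in a chunk of server log text.
MISSING_RELATION = /ERROR:\s+relation "([^"]+)" does not exist/

def missing_relations(log_text)
  # scan yields one single-element array per match; flatten and de-duplicate.
  log_text.scan(MISSING_RELATION).flatten.uniq
end

# Sample taken from the log excerpt above.
sample = <<'LOG'
54.0.0.128 - - [11/Aug/2013 16:42:06] "GET /info " 200 19 0.0006
ERROR:  relation "postal_codes" does not exist
LINE 1: DECLARE myportal CURSOR FOR SELECT * FROM postal_codes WHERE...
LOG

puts missing_relations(sample).inspect
```

If more than one name shows up, an empty postal_codes table alone would probably not be enough and a fuller re-run of the data load may be needed.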

petewarden commented 11 years ago

The postal code parsing is a relatively new addition to the library, so it sounds like you have the latest code but not the data it's looking for. The postal_codes table is normally created as part of the populate_database.rb script in load_postal_codes(), and the canonical way to create it would be to re-run that script. This is one of the longest parts of the whole setup process though, so there are a few shortcuts you can use instead.

If you don't care about handling non-US/UK postal codes in your application, you can just create an empty table to remove the errors, by running something like this from a psql interactive prompt on the geodict database (copied from the start of load_postal_codes()):

CREATE TABLE postal_codes (
    postal_code VARCHAR(64),
    region_code VARCHAR(64),
    country_code CHAR(2),
    lat FLOAT,
    lon FLOAT,
    last_word VARCHAR(32));
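
To confirm the table is visible to the server afterwards, a check along these lines could be run from Ruby. This is only a sketch: the helper name `table_exists?` is hypothetical, `conn` is assumed to be a `PG::Connection` from the pg gem, and the table is assumed to live in the default `public` schema.

```ruby
# Hypothetical helper, not part of DSTK. `conn` is assumed to be a
# PG::Connection from the pg gem, e.g. PG.connect(dbname: 'geodict');
# any object with a compatible exec_params method also works.
# Assumption: the table lives in the default 'public' schema.
TABLE_EXISTS_SQL = <<'SQL'
SELECT 1 FROM information_schema.tables
WHERE table_schema = 'public' AND table_name = $1
SQL

def table_exists?(conn, table_name)
  # A parameterized query avoids interpolating the table name into the SQL.
  conn.exec_params(TABLE_EXISTS_SQL, [table_name]).ntuples > 0
end
```

With the pg gem installed, `table_exists?(PG.connect(dbname: 'geodict'), 'postal_codes')` should return true once the table has been created, whether empty or populated.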

If you want to just load the postal codes, without rebuilding the other data types, grab the latest data from the dstkdata repository (which should end up in ~/sources/dstkdata), comment out all the load_*() calls except load_postal_codes() at the end of populate_database.rb, and then run the script.

Let me know if those help. If not, I'm happy to dig deeper into what's going wrong!

yholkamp commented 11 years ago

Thanks once again @petewarden. Running populate_database.rb again did indeed solve it. I did have to run a git pull in the dstkdata folder first, though, as the postal codes file wasn't there yet. I think that's where it went wrong earlier.