openaustralia / planningalerts

Find out and have your say about what's being built and knocked down in your area.
https://www.planningalerts.org.au

Scraping applications cron job failing #685

Closed: henare closed this issue 9 years ago

henare commented 9 years ago
 rake aborted!
ActiveRecord::StatementInvalid: Mysql2::Error: Data too long for column 'postcode' at row 1: INSERT INTO `applications` (`address`, `authority_id`, `comment_url`, `council_reference`, `date_received`, `date_scraped`, `description`, `info_url`, `lat`, `lng`, `postcode`, `state`, `suburb`) VALUES ('9 WOODLYN COURT (LOT 14)   SOUTH HOBART 7004 ', 147, 'mailto:hcc@hobartcity.com.au?Subject=Planning+Application+Enquiry%3A+15-00808', '15-00808', '2015-06-29', '2015-07-07 04:14:42', 'New house', 'https://apply.hobartcity.com.au/Pages/XC.Track/SearchApplication.aspx?id=61742', 41.6239716, -87.1937301, '46368', 'IN', 'Ogden Dunes')
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:303:in `query'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:303:in `block in execute'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/abstract_adapter.rb:378:in `block in log'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/abstract_adapter.rb:372:in `log'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:303:in `execute'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/mysql2_adapter.rb:228:in `execute'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/mysql2_adapter.rb:250:in `exec_insert'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/abstract/database_statements.rb:95:in `insert'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/abstract/query_cache.rb:14:in `insert'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/relation.rb:64:in `insert'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/persistence.rb:504:in `_create_record'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/attribute_methods/dirty.rb:87:in `_create_record'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/callbacks.rb:306:in `block in _create_record'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:113:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:113:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:552:in `block (2 levels) in compile'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:502:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:502:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:86:in `run_callbacks'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/callbacks.rb:306:in `_create_record'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/timestamp.rb:57:in `_create_record'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/persistence.rb:484:in `create_or_update'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/callbacks.rb:302:in `block in create_or_update'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:113:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:113:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:552:in `block (2 levels) in compile'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:502:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:502:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:86:in `run_callbacks'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/callbacks.rb:302:in `create_or_update'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/persistence.rb:103:in `save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/validations.rb:51:in `save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/attribute_methods/dirty.rb:21:in `save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:268:in `block (2 levels) in save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:329:in `block in with_transaction_returning_status'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/abstract/database_statements.rb:199:in `transaction'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:208:in `transaction'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:326:in `with_transaction_returning_status'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:268:in `block in save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:283:in `rollback_active_record_state!'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:267:in `save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/associations/has_many_association.rb:40:in `insert_record'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/autosave_association.rb:357:in `block in save_collection_association'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/autosave_association.rb:348:in `each'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/autosave_association.rb:348:in `save_collection_association'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/autosave_association.rb:186:in `block in add_autosave_association_callbacks'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/autosave_association.rb:157:in `instance_eval'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/autosave_association.rb:157:in `block in define_non_cyclic_method'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:429:in `block in make_lambda'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:224:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:224:in `block in halting_and_conditional'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:503:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:503:in `block in call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:503:in `each'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:503:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:86:in `run_callbacks'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/callbacks.rb:310:in `_update_record'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/timestamp.rb:70:in `_update_record'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/persistence.rb:484:in `create_or_update'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/callbacks.rb:302:in `block in create_or_update'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:113:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:113:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:552:in `block (2 levels) in compile'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:502:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:502:in `call'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/callbacks.rb:86:in `run_callbacks'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/callbacks.rb:302:in `create_or_update'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/persistence.rb:103:in `save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/validations.rb:51:in `save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/attribute_methods/dirty.rb:21:in `save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:268:in `block (2 levels) in save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:329:in `block in with_transaction_returning_status'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `block in transaction'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/abstract/database_statements.rb:209:in `within_new_transaction'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `transaction'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:208:in `transaction'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:326:in `with_transaction_returning_status'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:268:in `block in save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:283:in `rollback_active_record_state!'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/transactions.rb:267:in `save'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/persistence.rb:222:in `update_attribute'
/srv/www/www.planningalerts.org.au/app/releases/20150630230827/app/models/authority.rb:15:in `add'
/srv/www/www.planningalerts.org.au/app/releases/20150630230827/app/models/authority.rb:139:in `rescue in block in collect_applications_date_range'
/srv/www/www.planningalerts.org.au/app/releases/20150630230827/app/models/authority.rb:134:in `block in collect_applications_date_range'
/srv/www/www.planningalerts.org.au/app/releases/20150630230827/app/models/authority.rb:129:in `each'
/srv/www/www.planningalerts.org.au/app/releases/20150630230827/app/models/authority.rb:129:in `collect_applications_date_range'
/srv/www/www.planningalerts.org.au/app/releases/20150630230827/app/models/authority.rb:105:in `block in collect_applications'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/core_ext/benchmark.rb:12:in `block in ms'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activesupport-4.1.11/lib/active_support/core_ext/benchmark.rb:12:in `ms'
/srv/www/www.planningalerts.org.au/app/releases/20150630230827/app/models/authority.rb:104:in `collect_applications'
/srv/www/www.planningalerts.org.au/app/releases/20150630230827/app/models/application.rb:58:in `block in collect_applications'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/relation/delegation.rb:46:in `each'
/srv/www/www.planningalerts.org.au/app/shared/bundle/ruby/2.0.0/gems/activerecord-4.1.11/lib/active_record/relation/delegation.rb:46:in `each'
/srv/www/www.planningalerts.org.au/app/releases/20150630230827/app/models/application.rb:58:in `collect_applications'
/srv/www/www.planningalerts.org.au/app/releases/20150630230827/lib/tasks/planningalerts.rake:9:in `block (3 levels) in <top (required)>'
Tasks: TOP => planningalerts:applications:scrape_and_email => planningalerts:applications:scrape
(See full trace by running task with --trace)
E, [2015-07-07T14:14:42.851136 #11040] ERROR -- : Error Mysql2::Error: Data too long for column 'postcode' at row 1: INSERT INTO `applications` (`address`, `authority_id`, `comment_url`, `council_reference`, `date_received`, `date_scraped`, `description`, `info_url`, `lat`, `lng`, `postcode`, `state`, `suburb`) VALUES ('9 WOODLYN COURT (LOT 14)   SOUTH HOBART 7004 ', 147, 'mailto:hcc@hobartcity.com.au?Subject=Planning+Application+Enquiry%3A+15-00808', '15-00808', '2015-06-29', '2015-07-07 04:14:42', 'New house', 'https://apply.hobartcity.com.au/Pages/XC.Track/SearchApplication.aspx?id=61742', 41.6239716, -87.1937301, '46368', 'IN', 'Ogden Dunes') while trying to save application 15-00808 for Hobart City Council, TAS. So, skipping
henare commented 9 years ago

OK, there are a few layers of fail here.

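The address is geocoding to Ogden Dunes, IN 46368 in the US, even with our region biasing: http://maps.googleapis.com/maps/api/geocode/json?region=AU&address=9%20WOODLYN%20COURT%20%28LOT%2014%29%20%20%20SOUTH%20HOBART%207004 Because US ZIP codes have five digits, 46368 doesn't fit in our postcode column, and that is what makes the INSERT fail.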
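Region biasing only makes the geocoder prefer results in a region; it does not exclude matches elsewhere. One possible guard, sketched here under assumptions, is to sanity-check the geocoder's answer before accepting it. The sketch assumes the response has already been parsed into a Hash shaped like the Google Geocoding API's JSON (`results` entries with `address_components`, each carrying `types` and `short_name`). The method name `plausible_au_result?` and the four-digit postcode rule are hypothetical, not PlanningAlerts code, and the sample data is hand-written from the values in this issue rather than a captured API response.

```ruby
require "json"

# Hypothetical guard: accept a geocoder result only if it looks Australian.
# `result` is one element of the "results" array of a Google Geocoding API
# response, already parsed into a Hash.
def plausible_au_result?(result)
  components = result.fetch("address_components", [])

  # The country component should be Australia (short_name "AU").
  country = components.find { |c| c.fetch("types", []).include?("country") }
  return false unless country && country["short_name"] == "AU"

  # Australian postcodes are four digits; a five-digit US ZIP code such as
  # 46368 signals a bad match. A missing postcode is tolerated.
  postcode = components.find { |c| c.fetch("types", []).include?("postal_code") }
  postcode.nil? || postcode["short_name"].match?(/\A\d{4}\z/)
end

# Hand-written, trimmed-down result shaped like the bad match in this issue.
bad_match = JSON.parse(<<~RESPONSE)
  {
    "address_components": [
      { "long_name": "Ogden Dunes", "short_name": "Ogden Dunes", "types": ["locality", "political"] },
      { "long_name": "Indiana", "short_name": "IN", "types": ["administrative_area_level_1", "political"] },
      { "long_name": "United States", "short_name": "US", "types": ["country", "political"] },
      { "long_name": "46368", "short_name": "46368", "types": ["postal_code"] }
    ]
  }
RESPONSE

puts plausible_au_result?(bad_match) # prints false
```

A rejected result could then be treated the same way as a failed geocode. Restricting the request itself (for example with the API's `components=country:AU` filter) would be a complementary option.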

Should we even be saving applications that fail geocoding like this? It doesn't help citizens, so I don't think so.

Furthermore, as far as I know, the scraping process shouldn't crash like this because any errors during saving an application should be caught and simply reported on.
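Reading the stack trace from the bottom up suggests why the existing error handling didn't stop the crash. The exception is raised from `rescue in block in collect_applications_date_range` (authority.rb:139), which calls `add` (authority.rb:15). That method calls `update_attribute`, and ActiveRecord's autosave then walks the has_many collection (`save_collection_association` → `insert_record`) and tries to INSERT the same unsaved application again. The rescue branch therefore re-raises the very error it was handling. The plain-Ruby sketch below (no Rails; `DataTooLong`, `ScrapedApplication` and `save_each` are hypothetical stand-ins) shows the pattern being asked for: each save is isolated, and the rescue branch only records the failure without touching the database.

```ruby
# Hypothetical stand-in for the database rejecting a row; in the Rails app
# this would be ActiveRecord::StatementInvalid raised by Mysql2.
class DataTooLong < StandardError; end

# Hypothetical scraped record whose save! mimics a postcode column that
# only fits four characters.
ScrapedApplication = Struct.new(:council_reference, :postcode) do
  def save!
    if postcode.to_s.length > 4
      raise DataTooLong, "Data too long for column 'postcode'"
    end
    true
  end
end

# Saves each application on its own. A failure is appended to an in-memory
# log and the loop continues; nothing in the rescue branch writes to the
# database, so the handler cannot re-raise the error it is handling.
def save_each(applications, log)
  saved = 0
  applications.each do |app|
    begin
      app.save!
      saved += 1
    rescue StandardError => e
      log << "Error #{e.message} while trying to save application " \
             "#{app.council_reference}. So, skipping"
    end
  end
  saved
end

log = []
apps = [
  ScrapedApplication.new("example-ok", "7004"),
  ScrapedApplication.new("15-00808", "46368")
]
puts save_each(apps, log) # prints 1
puts log                  # one "So, skipping" line for 15-00808
```

The collected log could be written to the authority afterwards, once the failed application is no longer attached to it.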

mlandauer commented 9 years ago

I don't think we should silently throw away applications that fail to geocode properly. To me the saner thing to do is to save them in the database without a valid lat/lng. That way we have the potential to backtrack and fix geocoding errors. Otherwise, it's just silently failing and disappearing which seems very bad to me.
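The "save it anyway, fix the geocode later" approach can be sketched in plain Ruby. Everything here is hypothetical: `PlanningApplication`, `apply_geocode`, `backfill`, and the geocoder interface (a callable returning `[lat, lng]` or `nil`). The coordinates in the usage example are placeholders, not a real geocode of the address.

```ruby
# Hypothetical record: lat/lng stay nil when geocoding failed or was
# rejected, instead of holding coordinates from a wrong match.
PlanningApplication = Struct.new(:council_reference, :address, :lat, :lng) do
  def geocoded?
    !lat.nil? && !lng.nil?
  end
end

# Sets coordinates only when the (hypothetical) geocoder returns [lat, lng].
# The application is returned either way, so it can still be saved.
def apply_geocode(app, geocoder)
  coords = geocoder.call(app.address)
  app.lat, app.lng = coords if coords
  app
end

# Later backfill pass: retry only the applications still missing a location.
def backfill(applications, geocoder)
  applications.reject(&:geocoded?).each { |app| apply_geocode(app, geocoder) }
end

app = PlanningApplication.new("15-00808", "9 WOODLYN COURT (LOT 14) SOUTH HOBART 7004")

# First attempt: the geocoder's answer was rejected (e.g. it was in the US).
apply_geocode(app, ->(_address) { nil })
puts app.geocoded? # prints false, but the application is kept

# Later attempt with a geocoder that now finds the address.
backfill([app], ->(_address) { [-42.9, 147.3] }) # placeholder coordinates
puts app.geocoded? # prints true
```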

henare commented 9 years ago

> Otherwise, it's just silently failing and disappearing which seems very bad to me.

We currently disappear applications for all sorts of reasons, like not having a description. What are your thoughts on that?

In any case, all I'm going to try and fix as part of this issue is:

> Furthermore, as far as I know, the scraping process shouldn't crash like this because any errors during saving an application should be caught and simply reported on.

mlandauer commented 9 years ago

Not having a description is a more legitimate reason to disappear an application, because it's pretty useless without a description and the problem is definitely going to originate at the scraper level. With a bad geocode it could just be that Google doesn't know about a new address yet; everything else about the application is correct, valid and useful. The geocoded information helps the system find the applications that are interesting to you, but it's not necessary for an application to be valid. For example, if I show you an application for "26 Freelander Avenue, Katoomba" then that's useful to you whether it's geocoded or not. Does that make things clearer?
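The distinction drawn here, where a missing description makes an application invalid but a missing geocode doesn't, could be expressed as a validity check along these lines. This is a hypothetical sketch: the method name and the exact set of required fields are assumptions; only the description requirement comes from the discussion.

```ruby
# Hypothetical required fields; only :description is taken from the thread.
REQUIRED_FIELDS = %i[council_reference address description].freeze

# An application is valid when every required field is present and
# non-blank. lat/lng are deliberately not checked: they help people find
# the application, but it is still useful without them.
def valid_application?(attrs)
  REQUIRED_FIELDS.all? { |field| !attrs[field].to_s.strip.empty? }
end

ungeocoded = {
  council_reference: "example",
  address: "26 Freelander Avenue, Katoomba",
  description: "New house",
  lat: nil,
  lng: nil
}
no_description = ungeocoded.merge(description: "")

puts valid_application?(ungeocoded)     # prints true
puts valid_application?(no_description) # prints false
```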

henare commented 9 years ago

Yeah, I see more where you're coming from. I was thinking that because it's not geocoded you'd never see it, since it doesn't get emailed out, but I guess it would still appear in search.
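Why an ungeocoded application can drop out of email alerts yet still turn up in search can be illustrated with a small sketch. The thread doesn't show how alerts or search are implemented, so this assumes an alert is a point plus a radius (matched with the haversine great-circle distance) and that search is a plain substring match on the address. All method names and the coordinates are hypothetical placeholders.

```ruby
# Mean Earth radius in metres, used by the haversine formula.
EARTH_RADIUS_M = 6_371_000.0

# Great-circle distance between two lat/lng points, in metres (haversine).
def distance_m(lat1, lng1, lat2, lng2)
  to_rad = ->(deg) { deg * Math::PI / 180 }
  dlat = to_rad.(lat2 - lat1)
  dlng = to_rad.(lng2 - lng1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad.(lat1)) * Math.cos(to_rad.(lat2)) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# Hypothetical alert matching: needs coordinates, so ungeocoded
# applications are skipped.
def matches_for_alert(applications, alert_lat, alert_lng, radius_m)
  applications.select do |app|
    app[:lat] && app[:lng] &&
      distance_m(alert_lat, alert_lng, app[:lat], app[:lng]) <= radius_m
  end
end

# Hypothetical address search: text matching only, so ungeocoded
# applications still show up.
def search_by_address(applications, query)
  applications.select { |app| app[:address].downcase.include?(query.downcase) }
end

apps = [
  { address: "9 WOODLYN COURT (LOT 14) SOUTH HOBART 7004", lat: -42.9, lng: 147.3 }, # placeholder coords
  { address: "26 Freelander Avenue, Katoomba", lat: nil, lng: nil }
]
puts matches_for_alert(apps, -42.9, 147.3, 2_000).size # prints 1
puts search_by_address(apps, "katoomba").size          # prints 1
```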