planningalerts-scrapers / port_pirie_regional_council_sa_development_applications

Port Pirie Regional Council (South Australia) Development Applications
https://morph.io/MichaelBone/port_pirie_regional_council_sa_development_applications

Scrape errors #1

Closed jamezpolley closed 6 years ago

jamezpolley commented 6 years ago
Error Validation failed: Description can't be blank while trying to save application 354/008/18 for Port Pirie Regional Council, SA. So, skipping
Error Mysql2::Error: Data too long for column 'postcode' at row 1: INSERT INTO `applications` (`council_reference`, `address`, `description`, `info_url`, `comment_url`, `date_received`, `date_scraped`, `authority_id`, `lat`, `lng`, `suburb`, `state`, `postcode`) VALUES ('354/021/18', '9 FIFTH (NAP) STREET', 'NEW ATTACHED VERANDAHS AND REMOVAL OF EXISTING VERANDAH', 'http://www.pirie.sa.gov.au/webdata/resources/files/February%202018.pdf', 'mailto:council@pirie.sa.gov.au', '2018-02-01', '2018-08-09 05:41:56', 256, 40.73225830000001, -73.9963717, 'New York', 'NY', '10003') while trying to save application 354/021/18 for Port Pirie Regional Council, SA. So, skipping
Error Validation failed: Description can't be blank while trying to save application 354/E001/15 for Port Pirie Regional Council, SA. So, skipping
Error Validation failed: Address can't be blank while trying to save application 354/185/17 for Port Pirie Regional Council, SA. So, skipping
Error Validation failed: Address can't be blank while trying to save application 354/188/17 for Port Pirie Regional Council, SA. So, skipping
Error Validation failed: Address can't be blank while trying to save application 354/190/17 for Port Pirie Regional Council, SA. So, skipping
Error Validation failed: Address can't be blank while trying to save application 354/411F/08 for Port Pirie Regional Council, SA. So, skipping
109 new applications found for Port Pirie Regional Council, SA with date from 2018-07-26 to 2018-08-09
7 applications errored for Port Pirie Regional Council, SA with date from 2018-07-26 to 2018-08-09
Took 83 s to collect applications from Port Pirie Regional Council, SA

MichaelBone commented 6 years ago

As with the Mount Gambier web site, the raw data is a bit patchy, but I've improved the scraper so it better infers the state and postcode where previously only a suburb name was available. Those changes should now prevent most of the validation errors.

jamezpolley commented 6 years ago

The "Data too long for column 'postcode'" error seems to occur because that particular application lists no suburb or hundred, just "FIFTH (NAP) STREET".

The geocoding API seems to resolve that to Fifth Street in New York City, so we get a five-digit US ZIP code ('10003') where a four-digit Australian postcode is expected.

I'm not sure what to do here; it seems like providing more hints in the address (maybe a default postcode or suburb, or at least "south australia") might be useful? "FIFTH (NAP) STREET port pirie" in Google Maps gets me a point on Fifth Street, Port Pirie, which is a little better.

But this may also suggest we should change the way we use the geocoder: giving it a bounding box or something similar might get us better results.
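
The two ideas above (adding a locality hint before geocoding, and rejecting results that land outside South Australia) could be sketched roughly as follows. This is only an illustration in Python, not the scraper's or PlanningAlerts' actual code: the function names, the default hint `"Port Pirie SA"`, and the shape of the geocoder result (a state string and a postcode string) are all assumptions.

```python
import re

# Assumed default hint for this council; not taken from the scraper's source.
DEFAULT_HINT = "Port Pirie SA"


def with_locality_hint(address: str, hint: str = DEFAULT_HINT) -> str:
    """Append a locality hint unless the address already names the state."""
    cleaned = " ".join(address.split())  # collapse repeated whitespace
    if re.search(r"\b(SA|South Australia)\b", cleaned, re.IGNORECASE):
        return cleaned
    return f"{cleaned}, {hint}"


def is_plausible_sa_result(state: str, postcode: str) -> bool:
    """Accept only results in SA that carry a four-digit postcode."""
    return state == "SA" and re.fullmatch(r"\d{4}", postcode) is not None


# The address from the failing application gains a hint before geocoding.
hinted = with_locality_hint("9 FIFTH (NAP) STREET")

# The New York result from the log above would be rejected rather than saved.
rejected = not is_plausible_sa_result("NY", "10003")
```

A check like `is_plausible_sa_result` only catches bad matches after the fact; a bounding box or region bias passed to the geocoder itself would aim to prevent them in the first place.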

MichaelBone commented 6 years ago

I've further updated the scraper so it now uses the street name and hundred name information to derive the suburb name (if one hasn't been provided). It will now also omit any development application for which a valid suburb can't be determined (for example, if it has been omitted from the PDF containing the development application information).
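
A minimal sketch of that fallback logic, written in Python purely for illustration. The lookup tables, field names, and function names here are hypothetical and are not taken from the scraper; the real mapping data is not shown in this thread.

```python
from typing import Optional

# Illustrative lookup tables only; the entries are placeholders.
STREET_TO_SUBURB = {"FIFTH STREET": "PORT PIRIE"}
HUNDRED_TO_SUBURB = {"EXAMPLE HUNDRED": "EXAMPLE SUBURB"}


def _key(value: Optional[str]) -> str:
    """Normalise a name for lookup: upper case, single spaces, no padding."""
    return " ".join((value or "").upper().split())


def derive_suburb(suburb: Optional[str], street: Optional[str],
                  hundred: Optional[str]) -> Optional[str]:
    """Use the given suburb if present, else fall back to street, then hundred."""
    if _key(suburb):
        return _key(suburb)
    return STREET_TO_SUBURB.get(_key(street)) or HUNDRED_TO_SUBURB.get(_key(hundred))


def keep_applications(applications: list) -> list:
    """Fill in suburbs where possible and omit applications with none."""
    kept = []
    for app in applications:
        suburb = derive_suburb(app.get("suburb"), app.get("street"), app.get("hundred"))
        if suburb is None:
            continue  # no valid suburb can be determined, so skip this one
        kept.append({**app, "suburb": suburb})
    return kept
```

Omitting an application trades completeness for correctness: a missing record is easier to notice and fix than one geocoded to the wrong country.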

I think this update should resolve all of the problems described in this issue.

LoveMyData commented 6 years ago

Fixed with commit - https://github.com/planningalerts-scrapers/port_pirie_regional_council_sa_development_applications/commit/5debdc55456710d5a2232fa94bd088fc93cb08f5