Open johnrager opened 8 years ago
I'd like to add a thought on this that might give us and other customers just the flexibility we need: add another switch, "Ignore geocoding failures", to the GUI and SDK, governing whether the inability to geocode an address should be considered an "error" or not. If set "on", then just set the Location column for that row to null and continue. If set "off", then treat it as an error and let "Set aside errors" govern what to do next.
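To make the suggested behavior concrete, here is a rough sketch of how the two switches could interact. This is not DataSync code: `process_rows`, `GeocodingError`, the `geocode` callback, and the `address`/`location` keys are all hypothetical names made up for illustration, and the only behavior assumed is what is described in this thread.

```python
class GeocodingError(Exception):
    """Hypothetical error raised when an address cannot be geocoded."""


def process_rows(rows, geocode, ignore_geocoding_failures, set_aside_errors):
    """Illustrative only: apply the proposed switch to a batch of rows.

    rows    -- list of dicts, each with an "address" key (assumed shape)
    geocode -- callable that returns a location or raises GeocodingError
    Returns (loaded_rows, set_aside_rows).
    """
    loaded, set_aside = [], []
    for row in rows:
        try:
            location = geocode(row["address"])
        except GeocodingError:
            if ignore_geocoding_failures:
                # Proposed behavior: keep the row, leave Location null.
                location = None
            elif set_aside_errors:
                # Behavior reported in this thread: the row is excluded.
                set_aside.append(row)
                continue
            else:
                # Behavior reported in this thread: the whole job fails.
                raise
        loaded.append({**row, "location": location})
    return loaded, set_aside
```

With the new switch "on", every row is kept and only the Location value is null for addresses that fail, which matches what the original post says the web interface and SODA 2 do today. With it "off", nothing changes from current behavior, which should help with backwards compatibility.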
We do not use Socrata geocoding much so I do not necessarily have too much of a stake in this but I like that suggestion.
Where I do have a stake is to ask that Socrata be careful about any new features breaking existing processes or workflows. Sometimes, when flags have changed before, it has been in ways that were not fully backwards compatible.
We're looking into enhancing our automated refresh process to start taking advantage of DataSync’s SDK rather than the SODA 2 library we currently rely on. We’ve run into an issue that has pretty much stopped us in our tracks, related to how geocoding of address fields is handled in the DataSync SDK. The support issue thread is: https://support.socrata.com/hc/en-us/requests/14390.
From what we understand we have two options:
Because of the limitation we ran into with option 1, we’ve been pursuing option 2 but have run into a problem. It appears DataSync is much stricter with its geocoding and we’ve run into addresses that have actually caused the entire refresh process to fail. If we run the same data through either the web interface or through our existing SODA 2 refresh process, the entire refresh runs but some rows just don’t get geocoded. This is expected. If we run the file through DataSync, it fails completely as soon as it hits the first bad address.
We tried testing via DataSync with “Set aside errors” turned on. The process completed, but the problem rows were excluded from the dataset. This isn’t workable from our perspective. We can’t have rows missing just because an address didn’t geocode, and with the number of datasets we have, we can’t distribute problem reports to data owners asking them to correct addresses and resubmit. We need DataSync to handle geocoding just like the web interface and SODA 2 do.
We’d really like to make DataSync more of a part of our operation, but we don’t think we can unless we have a more workable way to handle geocoding. We’re pretty much dead in the water on this right now.