Open maxgrossman opened 7 years ago
Regarding the 2nd issue above, I took the metadata file and grouped each record by station id, then by coordinate, to find the unique locations within each station id group.
See below just a small sample of the output (the object key here is longitude).
{
"47.067689": [
...
],
"47.067691": [
...
]
},
{
"35.151952": [
...
],
"0.000316": [
...
]
}
Most location groups match the first object, where the coordinates are only a few hundred-thousandths or ten-thousandths of a degree apart (and as such the locations are only a few 10s/100s of meters from one another...)
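For a rough sense of scale, a coordinate offset in degrees can be converted to meters. This is a minimal sketch using the common approximation of about 111.32 km per degree of latitude, with longitude offsets scaled by the cosine of the latitude. The function name and its parameters are assumptions for illustration and are not taken from the metadata file or the project code.

```python
import math

# Approximate length of one degree of latitude, in meters.
METERS_PER_DEGREE = 111_320


def offset_in_meters(delta_deg, latitude_deg=0.0, is_longitude=False):
    """Approximate ground distance for a small offset given in degrees.

    Longitude offsets shrink with the cosine of the latitude; latitude
    offsets are treated as constant. Good enough for a sanity check only.
    """
    scale = math.cos(math.radians(latitude_deg)) if is_longitude else 1.0
    return abs(delta_deg) * METERS_PER_DEGREE * scale
```

Under this approximation, 0.0001 degrees of latitude comes out at roughly 11 m, so offsets of a few ten-thousandths of a degree stay within tens of meters.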
I'd think that in either case just selecting the 1st of the unique set of coordinates among the records would be a viable solution. If we want to root out certain outliers (like the last record above), maybe we do a spatial select for all of Europe first, then reverse geocode.
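A minimal sketch of that approach, assuming each metadata record is a dict with hypothetical `station_id`, `longitude`, and `latitude` keys. The bounding box for Europe is a rough placeholder chosen for illustration, not a value from the project:

```python
from collections import defaultdict

# Hypothetical rough bounding box for Europe: (lon_min, lat_min, lon_max, lat_max).
EUROPE_BBOX = (-31.5, 27.0, 69.5, 81.5)


def in_bbox(lon, lat, bbox=EUROPE_BBOX):
    """Return True if the point lies inside the bounding box."""
    lon_min, lat_min, lon_max, lat_max = bbox
    return lon_min <= lon <= lon_max and lat_min <= lat <= lat_max


def unique_locations(records):
    """Map each station id to its unique (lon, lat) pairs, in first-seen order."""
    grouped = defaultdict(list)
    for rec in records:
        coord = (rec["longitude"], rec["latitude"])
        if coord not in grouped[rec["station_id"]]:
            grouped[rec["station_id"]].append(coord)
    return dict(grouped)


def pick_location(coords):
    """Return the first coordinate inside the bounding box, or None if none is."""
    for lon, lat in coords:
        if in_bbox(lon, lat):
            return (lon, lat)
    return None
```

Stations for which `pick_location` returns `None` would still need a manual look before reverse geocoding.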
cc @olafveerman
Did some further digging.
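In total, there are 775 station IDs that have multiple coordinates. See the full list of station IDs with multiple coordinates (https://github.com/openaq/battuta/files/1258361/multiple-coords.txt). The majority of the coordinates differ little (<0.001), but there are some significant differences that may lead to different outcomes of the reverse geocoding:
21 stations have a cumulative difference > 0.01
65 stations have a cumulative difference > 0.001
See this CSV with the results (https://gist.github.com/olafveerman/15a526fffc2059a6f18a089a6c31b9f1).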
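The thread does not define how the cumulative difference is computed. As one possible reading, the sketch below assumes it is the longitude spread plus the latitude spread (max minus min, in degrees) across a station's coordinates. Both function names and the input shape (station id mapped to a list of `(lon, lat)` pairs) are hypothetical.

```python
def cumulative_difference(coords):
    """Assumed metric: longitude spread plus latitude spread, in degrees."""
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    return (max(lons) - min(lons)) + (max(lats) - min(lats))


def count_above(stations, threshold):
    """Count stations whose assumed cumulative difference exceeds the threshold."""
    return sum(
        1 for coords in stations.values()
        if cumulative_difference(coords) > threshold
    )
```

Calling `count_above(stations, 0.01)` and `count_above(stations, 0.001)` would give counts comparable to the ones above only if the assumed metric matches the one actually used.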
This issue is best resolved at the source. @jflasher @RocketD0g Would it be worth sending EEA the list with these issues?
I think it'd be great to report this back up to EEA. I can send that along with associated data if it's all ready to go?
On August 28, 2017 at 17:12:40, Olaf Veerman (notifications@github.com) wrote:
Did some further digging.
In total, there are 775 station IDs that have multiple coordinates. See the full list of station IDs with multiple coordinates (https://github.com/openaq/battuta/files/1258361/multiple-coords.txt). The majority of the coordinates differ little (<0.001), but there are some significant differences that may lead to different outcomes of the reverse geocoding:
21 stations have a cumulative difference > 0.01
65 stations have a cumulative difference > 0.001
See this CSV with the results (https://gist.github.com/olafveerman/15a526fffc2059a6f18a089a6c31b9f1).
This issue is best resolved at the source. @jflasher @RocketD0g Would it be worth sending EEA the list with these issues?
@jflasher Great. There are a couple of issues with network_timezone that @maxgrossman will report back on in: https://github.com/openaq/openaq-fetch/issues/298
Maybe you can bundle that up? Feel free to cc and defer to us if they have more questions.
@olafveerman and I have spent some time with the metadata file to see how it may be causing #4.
There are two culprits.
I'm going to work on the code to handle these issues and, while doing so, flag the station ids that have multiple locations and provide the list here.