openbenches / openbenches.org

OpenBenches.org - an open data repository for memorial benches
https://openbenches.org/
MIT License

Addresses don't always make sense. #95

Closed arizonagroovejet closed 1 year ago

arizonagroovejet commented 7 years ago

Benches are shown with an address, but the addresses are inconsistent in construction, can be inconsistent for benches that could reasonably be considered to be at the same address, and sometimes don't make any sense given where the bench is. The idea of displaying an address for a bench seems inherently problematic given the sorts of places benches are commonly found.

https://openbenches.org/bench/1765/ is on the opposite side of a river from Southern Lane.

https://openbenches.org/bench/1168/ is about half a kilometre away from Fairway Drive.

https://openbenches.org/bench/1611/ isn't anywhere near Island Drive.

The address for https://openbenches.org/bench/1831/ makes sense because it starts with War Memorial Park and the bench is indeed in that park. But a stone's throw away is https://openbenches.org/bench/1674/ and the address for that doesn't mention War Memorial Park; it just says Kenilworth Road, a very long road which the bench very much isn't on.

https://openbenches.org/bench/254/ has the address Hay Wood Lane, Warwick but is not on Hay Wood Lane and is nowhere near Warwick. (The estate the bench is on doesn't claim to be in Warwick: https://www.nationaltrust.org.uk/baddesley-clinton#How%20to%20get%20here)

I see how it's nice to give some sort of immediately comprehensible information about the location of the bench. Maybe it would help if the addresses were less specific. Omit the first part. E.g.

https://openbenches.org/bench/1611/ Ballynahaglish, Tralee, County Kerry, Ireland

https://openbenches.org/bench/1765/ Stratford-on-Avon CV37 6BA, United Kingdom. Though according to Google Maps that postcode is still the wrong side of the river and even further from the bench than Southern Lane.

Maybe a less specific address plus having the map zoomed out more to show location of the bench within the context of the country it is in. E.g.

(image: benchlocation)

People can easily zoom in if they want to see exactly where in Stratford-Upon-Avon the bench is.

(Countries vary greatly in size and shape, and a zoom level that works well for the UK probably wouldn't convey anything useful for a bench in China.)

Maybe I should not be so concerned with the accuracy of the address. :D

edent commented 7 years ago

It's a good point. We use OpenCage reverse geocoding for this. Full documentation at https://geocoder.opencagedata.com/api#formatted

The main problem is knowing how precise to be. 23 Acacia Avenue is as precise as Melchester War Memorial, but they have different semantic meanings.

The addresses are stored in the DB but can easily be regenerated with less precision.
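For anyone following along, a reverse-geocoding lookup can be sketched roughly as below. This is a minimal sketch, not the site's actual code: the helper names `reverseGeocodeUrl` and `formattedAddress` are hypothetical, and the request itself is left out so the example runs without an API key. The endpoint and the `q` and `key` parameters are the ones described in the OpenCage documentation linked above.

```php
<?php
// Hypothetical helper: builds the request URL for an OpenCage reverse-geocoding lookup.
function reverseGeocodeUrl(float $lat, float $lng, string $apiKey): string
{
    // http_build_query() takes care of URL-encoding the comma in "lat,lng".
    $query = http_build_query([
        'q'   => $lat . ',' . $lng,
        'key' => $apiKey,
    ]);

    return 'https://api.opencagedata.com/geocode/v1/json?' . $query;
}

// Hypothetical helper: pulls the "formatted" string out of a JSON response body.
// Returns null if the body is not valid JSON or contains no results.
function formattedAddress(string $json): ?string
{
    $data = json_decode($json);

    return $data->results[0]->formatted ?? null;
}

// Made-up response body, trimmed down to the one field used here.
$body = '{"results":[{"formatted":"Warwick CV32 4LG, United Kingdom"}]}';
echo formattedAddress($body), PHP_EOL;
```

Because the formatted string is derived entirely from the response, keeping the raw response around would make it possible to regenerate a less precise string later without calling the API again.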

arizonagroovejet commented 7 years ago

I’ve been thinking about this a bit. I have an OpenCage API key and I see how the addresses currently being used are the obvious thing to use from the OpenCage data. I think I have an idea on how to get a better, for some value of better, address out of the OpenCage data. I keep failing to find time to explore it more.

I think the address displayed should

I previously suggested that maybe addresses should be less specific but applying that to https://openbenches.org/bench/210/ results in saying it's in Lichfield which I've just said is wrong.

arizonagroovejet commented 6 years ago

Seeing you tweet this bench https://openbenches.org/bench/5879 asking "Anyone in Warwick…" reminded me of this issue. That bench is not in Warwick. It's in Royal Leamington Spa. I've not looked at the OpenCage location data for that specific bench but I have fairly recently looked at the data for some other benches in Leamington Spa and for all of them it has the location as being in the town of Warwick and the suburb of Royal Leamington Spa. Which I'd count as failing the "Not be considered egregiously wrong by someone with local knowledge" criterion I suggested above.

It also reminded me that I've been looking into this issue some more and think I've come up with a better, or at least significantly less bad, way of constructing an address from the OpenCage data than what OpenCage puts in the formatted field. Before I go to the effort of attempting to write it up, it seems worth asking: would you be interested in hearing about it?

arizonagroovejet commented 6 years ago

After posting the previous comment I noticed that Twitter says "Leamington Spa, England" https://twitter.com/openbenches/status/993939319575535617 which is a much better description of where the bench is than "Warwick CV32 4LG, United Kingdom".

Maybe it's worth looking at getting a location from Twitter. https://developer.twitter.com/en/docs/geo/places-near-location/overview

arizonagroovejet commented 6 years ago

Maybe it's worth looking at getting a location from Twitter.

Or not! You'd be unable to get location data when they've suspended your account because "oops, our automated systems :woman_shrugging:"

edent commented 6 years ago

Yes, certainly happy to hear about a better way to get addresses.

edent commented 6 years ago

Taking a look at Twitter.

Aside from the weird "England / UK" thing, it does mean you lose the occasional bit of useful info like "War Memorial Park".

arizonagroovejet commented 6 years ago

Info like "War Memorial Park" is not always present in the OpenCage data though. E.g. the data for 1830 and 1674 doesn't mention "War Memorial Park" and they're both very much within that park.

I've not looked at the OpenCage data for every single bench, but based on what I have looked at I've concluded that it's hopelessly inconsistent in which component values are present and how accurate those values are. In some areas the data is inconsistent in which component values are present for coordinates even closer together than 1831, 1830 and 1674. I have in my notes that the data for point A has a suburb field, while the data for point B ~20 meters away doesn't have a suburb field. (I don't have in my notes where point A or B are!) To be honest, the more I look at it the more I think of the phrase Garbage In, Garbage Out. That's maybe a bit harsh, but I really wonder where this data comes from that it's so inconsistent. A cursory investigation has made me suspect there is no better free/open source of location data, as the data OpenCage have is an aggregation of various sources. One other service I fed some coordinates to gave me the same location information as OpenCage.

Another conclusion I've come to is that the more specific the OpenCage data is about where a location is, the more likely it is to be wrong. So the method I've worked out for constructing addresses (I'm only really using the term address because the string is within a div element with an id value of "address") deliberately ignores very specific stuff like "War Memorial Park" because of the inconsistency described above, and produces something more like what you'd get from Twitter (assuming they haven't mysteriously suspended your account). Also, you can see from the map on the bench page that the bench is in War Memorial Park. One day, maybe even very soon, I will actually post that method.

arizonagroovejet commented 6 years ago

OK, here it is, for what it's worth. Which may be nothing.

I've concluded that it's impossible to achieve all three of the requirements I previously listed without manually crafting each location string. The value of the formatted field, currently displayed on the website, is presumably supposed to be an "if you want a nice location string use this" value, but in many cases it's just bad, in at least some cases because the rest of the data is also bad. See comments above about the inconsistent nature of the OpenCage data. Part of the problem may be how the OpenCage reverse geocoding works. https://geocoder.opencagedata.com/api#formatted says

"In the following example, a response in JSON format is requested to get the nearest address for coordinates -22.6792, 14.5272."

If it's attempting to give you the "nearest address" that might explain things like the location data containing a road name despite the bench being nowhere near a road.

More examples of how the location data is bad than you may be willing to read but not even close to exhaustive:

https://openbenches.org/bench/2 Location data says it's on Baker's Lane, but it's very clearly on Church Way.

https://openbenches.org/bench/371 Location data has suburb as Friendship Heights, but a quick look on Google Maps shows it's 4-5 km away and the zip code is similarly wrong.

https://openbenches.org/bench/947 Location data has a suburb field, but no town or city fields. The formatted field contains a road name even though the bench is not in any sense on a road and that road name isn't even the closest road to where the bench is. The location data does have a village field.

https://openbenches.org/bench/2317 Location data does not have a village field despite that location being in a village. The name of the village is in the suburb field.

https://openbenches.org/bench/2344 Location data has both a town and city field. The city is Gedling and the town is Nottinghamshire. There's a village and district called Gedling, but not a city. Nottinghamshire obviously isn't a town. There's no county field, I guess because what should be in that is in the town field. The suburb value is "Arnold and Carlton", which doesn't seem to exist.

https://openbenches.org/bench/2592 Location data has the city as Lichfield, I'm guessing because that location is just within the administrative area District of Lichfield, but saying that's in Lichfield makes no sense geographically. Tamworth, which is where the Hospital itself says it is, isn't even in the location data.

Location data has benches 1834 - 1838 and others in the same park as being in four different suburbs, two of which seem highly implausible from looking at a map. The location data for benches in that park is a mess of inconsistency at anything more specific than the city level.

https://openbenches.org/bench/5716 According to the location data it and others in the same village are in the fictional city of "Amber Valley". For some of them, not all, the location data has a suburb field which contains "Crich CP" which, if you remove the mysterious "CP" gives you the name of the village. There's no village field.

I think the general way to get a (mostly reasonably) good description of the bench's location using the OpenCage data is to be less specific, because the more specific the location data tries to be, the more liable it is to be wrong. Also the displayed location doesn't need to be too specific, because the exact location is shown on the map. If you look at the map on https://openbenches.org/bench/2 you can see which road the bench is on without interacting with the map, so there's not really any value in displaying that information as text above the map, especially given that it's wrong. You can't see, without interacting with the map, that the bench is in Oxford, so there's value in displaying that information above the map. For https://openbenches.org/bench/1611/ and many others there's no value in displaying a road name because the bench isn't on a road. A lot of the displayed locations contain a postcode. There doesn't seem to be a lot of value in that, as it's not like people will look at a postcode and think "I know where that is".

Whilst I think there simply isn't a perfect one-size-fits-all solution, I think I've devised something that on the whole produces a result that's generally less bad than what's in the formatted field. And that method is this:

// Build a deliberately coarse location string from the first OpenCage result.
$components = $locationData->results[0]->components;
$parts = [];

// Most local part: prefer village, fall back to suburb.
if (isset($components->village)) {
    $parts[] = $components->village;
} elseif (isset($components->suburb)) {
    $parts[] = $components->suburb;
}

// Settlement: prefer city, fall back to town.
if (isset($components->city)) {
    $parts[] = $components->city;
} elseif (isset($components->town)) {
    $parts[] = $components->town;
}

if (isset($components->state)) {
    $parts[] = $components->state;
}

if (isset($components->country)) {
    $parts[] = $components->country;
}

// Joining at the end avoids a leading ", " when village and suburb are both missing.
$location = implode(", ", $parts);

This is a file that contains the value of the formatted field from the OpenCage data as currently shown on the website* and what's generated by the above method. I stress I think it's generally less bad. I don't like some of the results it generates. Like for 5566 it gives "Royal Leamington Spa, Warwick, England, United Kingdom" which makes it seem like Royal Leamington Spa is in Warwick, which it simply isn't, but unlike the contents of the formatted field it does at least include the name of the town the bench is in and give it precedence.
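To make the Leamington result concrete, the method can be wrapped in a function and fed a hand-written components object. This is only a sketch: the function name `coarseLocation` is made up for illustration, and the component values are assumptions reconstructed from what is described in this thread (suburb Royal Leamington Spa, town Warwick), not data pulled from OpenCage.

```php
<?php
// Hypothetical wrapper around the method above; the function name is illustrative.
function coarseLocation(object $components): string
{
    $parts = [];

    // Most local part: village if present, otherwise suburb.
    $local = $components->village ?? $components->suburb ?? null;
    if ($local !== null) {
        $parts[] = $local;
    }

    // Settlement: city if present, otherwise town.
    $settlement = $components->city ?? $components->town ?? null;
    if ($settlement !== null) {
        $parts[] = $settlement;
    }

    foreach (['state', 'country'] as $field) {
        if (isset($components->$field)) {
            $parts[] = $components->$field;
        }
    }

    return implode(', ', $parts);
}

// Assumed component values, reconstructed from the description in this thread.
$leamington = (object) [
    'suburb'  => 'Royal Leamington Spa',
    'town'    => 'Warwick',
    'state'   => 'England',
    'country' => 'United Kingdom',
];

echo coarseLocation($leamington), PHP_EOL;
// Royal Leamington Spa, Warwick, England, United Kingdom
```

The output shows the weakness already noted: the suburb and town are simply listed side by side, so the string reads as if Royal Leamington Spa were part of Warwick.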

I eagerly await people pointing out where my method produces results they consider to be terrible. :D

(*) Except where the value of the formatted field for a location has apparently changed since a bench was uploaded. E.g. https://openbenches.org/bench/5566 says the photo was taken on 15th April and gives the location as "Warwick CV32 4EA, United Kingdom". I pulled the OpenCage data for that bench today and the formatted field says "CV32 14, Newbold Terrace, Warwick CV32 4EA, United Kingdom". There's a timestamp->created_http value in the data which is "Wed, 25 Apr 2018 19:05:15 GMT".

edent commented 3 years ago

I'm still looking at this :-) We had a request recently for "how many benches do you have in Scotland". We don't record that data.

What we get back from OpenCage is something like:

     "components": {
        "ISO_3166-1_alpha-2": "GB",
        "ISO_3166-1_alpha-3": "GBR",
        "_category": "road",
        "_type": "road",
        "continent": "Europe",
        "country": "United Kingdom",
        "country_code": "gb",
        "county": "Kent",
        "county_code": "KEN",
        "postcode": "CT15 4LL",
        "road": "unnamed road",
        "state": "England",
        "state_code": "ENG",
        "state_district": "South East England",
        "suburb": "Goodnestone",
        "town": "Dover",
        "village": "Goodnestone"
      },
      "confidence": 9,
      "formatted": "unnamed road, Goodnestone CT15 4LL, United Kingdom",

So I'm thinking of storing all those components in the DB, allowing us to do a better lookup. And then we can build our own precision display.
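As a rough illustration of what stored components would allow, here is a sketch using a plain array in place of the database. Everything in it is an assumption made for illustration: the bench IDs, the two Scottish rows, and the helper names `countWhere` and `describe` are all invented, and the first row copies only some fields from the sample response above.

```php
<?php
// Stand-in for stored OpenCage components, keyed by made-up bench IDs.
// Rows 102 and 103 are invented purely so the count has something to find.
$benches = [
    101 => ['village' => 'Goodnestone', 'town' => 'Dover', 'county' => 'Kent',
            'state' => 'England', 'country' => 'United Kingdom'],
    102 => ['city' => 'Edinburgh', 'state' => 'Scotland', 'country' => 'United Kingdom'],
    103 => ['town' => 'Oban', 'state' => 'Scotland', 'country' => 'United Kingdom'],
];

// Count benches whose stored component $field equals $value exactly.
function countWhere(array $benches, string $field, string $value): int
{
    return count(array_filter(
        $benches,
        fn(array $c): bool => ($c[$field] ?? null) === $value
    ));
}

// Build a display string from whichever of the requested fields are present, in order.
function describe(array $components, array $fields): string
{
    $parts = [];
    foreach ($fields as $field) {
        if (isset($components[$field])) {
            $parts[] = $components[$field];
        }
    }

    return implode(', ', $parts);
}

echo countWhere($benches, 'state', 'Scotland'), PHP_EOL;
// 2
echo describe($benches[101], ['village', 'county', 'country']), PHP_EOL;
// Goodnestone, Kent, United Kingdom
```

Choosing the field list at display time is what makes the precision adjustable: the same stored row can be shown as a village, a county, or just a country without another geocoding request.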

edent commented 1 year ago

I've regenerated all the addresses in the database. They're (hopefully) marginally more correct now. It's also now possible to find all benches within a bounding box, e.g. https://openbenches.org/location/UK,%20Weston-Super%20Mare
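How the site implements the bounding-box lookup isn't described here, so the following is only a sketch of the general idea. The function name `inBoundingBox`, the box coordinates, and the two benches are all made up for illustration.

```php
<?php
// True if the point lies inside the box (edges included).
// Does not handle boxes that cross the antimeridian (180 degrees longitude).
function inBoundingBox(float $lat, float $lng, array $box): bool
{
    return $lat >= $box['south'] && $lat <= $box['north']
        && $lng >= $box['west']  && $lng <= $box['east'];
}

// Illustrative values only: a rough box around Weston-super-Mare and two made-up benches.
$box = ['south' => 51.30, 'north' => 51.38, 'west' => -3.00, 'east' => -2.90];
$benches = [
    201 => ['lat' => 51.3460, 'lng' => -2.9770],
    202 => ['lat' => 51.4545, 'lng' => -2.5879],
];

$inside = array_filter(
    $benches,
    fn(array $b): bool => inBoundingBox($b['lat'], $b['lng'], $box)
);

print_r(array_keys($inside));
```

A lookup like this depends only on the bench's coordinates, so it works regardless of how wrong or inconsistent the address components for that bench are.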

Fundamentally, there's a disconnect between a meaningful-to-human address and a location. That can't be solved without manual intervention.