mysociety / mapit

A web service to map postcodes to administrative boundaries and more

Errors in May 2019 BoundaryLine release #341

Closed chris48s closed 5 years ago

chris48s commented 5 years ago

Hello. Just a quick heads-up.. It looks like the May 2019 BoundaryLine release had some exciting errors in it.


Firstly, Ordnance Survey put out a release with a small number of corrections; see https://www.ordnancesurvey.co.uk/docs/release-notes/boundary-line-may19v2.pdf The latest release available from the Ordnance Survey site differs from the one on http://parlvid.mysociety.org/os/ so it may be worth checking that one out, but it looks like you caught some of the errors at import time: https://github.com/mysociety/mapit/pull/338/files


Annoyingly though, the second release doesn't fix all of the problems. From the ONSPD May 2019 release notes https://www.arcgis.com/sharing/rest/content/items/4e39873ca62c432cbc3a71b55851c6fd/data (page 5)

The 2019 ward changes are included on the ONSPD from May 2019. Unfortunately, the draft release of OS Boundary-Line includes errors for a number of wards:

  - E05002732 – Stretham – should be E05011571
  - E05005276 – Garrison – should be E05012199
  - E05012199 – Sharoe Green – should be E05012208
  - E05005857 – Mundesley – should be E05011847
  - E05005863 – Roughton – should be E05011853
  - E05005865 – St Benet’s – should be E05011854
  - E05006336 – Seamer – should be E05012387
  - E05009795 – Warwick Saltisford – should be E05012630
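The corrections quoted above could be captured as a lookup table along these lines. This is only a sketch: `CORRECTIONS` and `corrected_code` are hypothetical names, and the ward names are spelled as in the release notes, which may not match the names in the Boundary-Line files exactly. The table is keyed on (code, name) rather than code alone because E05012199 appears both as the wrong code for Sharoe Green and as the right code for Garrison.

```python
# Corrections quoted from the ONSPD May 2019 release notes:
# (wrong GSS code, ward name) -> correct GSS code.
# Names are as written in the release notes (apostrophe in ASCII here);
# the real data may spell them differently.
CORRECTIONS = {
    ("E05002732", "Stretham"): "E05011571",
    ("E05005276", "Garrison"): "E05012199",
    ("E05012199", "Sharoe Green"): "E05012208",
    ("E05005857", "Mundesley"): "E05011847",
    ("E05005863", "Roughton"): "E05011853",
    ("E05005865", "St Benet's"): "E05011854",
    ("E05006336", "Seamer"): "E05012387",
    ("E05009795", "Warwick Saltisford"): "E05012630",
}


def corrected_code(code, name):
    """Return the corrected GSS code, or the input code if no correction applies."""
    return CORRECTIONS.get((code, name), code)
```

Keying on the pair means the lookup can be run twice without harm: once Garrison has been corrected to E05012199, a second pass leaves it alone instead of turning it into the Sharoe Green code.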

All of those issues appear to still be present in the second May 2019 BoundaryLine release. I've not checked mapit for all of these issues, but just to work the first one through as an example:

East Cambridgeshire: Stretham had a boundary change, but in the May 2019 BoundaryLine release, OS have attached the old GSS code E05002732 to the new boundary. Having imported the ropey data from OS, you've got a single record for Stretham which exists in generations 1-36 https://mapit.mysociety.org/area/2909.html with the generation 36 boundary attached to it, whereas you should have E05002732 in generations 1-35 (with the old boundary attached to it) and E05011571 in generation 36 with this boundary: https://mapit.mysociety.org/area/2909.html

The neighbouring ward (Haddenham) is correct - you've got E05002726 https://mapit.mysociety.org/area/2908.html in generations 1-35 with the old boundary and E05011567 https://mapit.mysociety.org/area/152278.html in generation 36 only.

This means that now if you plot the generation 1-35 boundaries for these 2 neighbouring wards on the same map, they (incorrectly) overlap because you've actually got the generation 1-35 boundary for Haddenham next to the generation 36 boundary for Stretham: https://mapit.mysociety.org/areas/2908,2909.map.html The correct generation 1-35 boundary for Stretham looks like this:

[Image: Screenshot at 2019-07-09 13-59-22]


Hopefully I've explained that in a way that makes sense :crossed_fingers: As I say, I've not worked through all of them, but there are probably similar fiddly issues for the other areas in that list :(

dracos commented 5 years ago

Thanks for this. Yes, looking at the IDs, I assume it will be the same for all of them except Sharoe Green (which used the ID for Garrison). At least it's only the history that's broken, not active :-/

We'll have to write a management script that takes the October 2018 Boundary-Line data files as input to fix this. I can see two possible ways to go:

  1. In keeping with what we do with council MapIt IDs when their boundaries change, we keep the existing MapIt IDs (the ones we've now overwritten, which hold the current boundary), change them to be gen 36 only with the correct ONS ID, and insert a new MapIt ID for gen 1-35 with the old boundary and the old ONS ID. This would mean those wards would have very different MapIt IDs from their siblings, and technically anyone storing the old ID for the ward will now be wrong, but otherwise I can't see any ill effects.
  2. We roll back those MapIt IDs to be gen 1-35 with the old boundaries, and create new MapIt IDs in gen 36 with the new ONS ID. These IDs would be 'closer' to their siblings, but conversely anyone who has already stored the new ID for the ward will technically now be wrong.
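To make the difference concrete, option 1 might look roughly like this for Stretham. This is a plain-Python sketch, not MapIt's actual models or the eventual management script: the `Area` class, its field names, `apply_option_1` and the boundary strings are all hypothetical stand-ins. The MapIt IDs are the ones listed for Stretham later in this thread.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """Hypothetical stand-in for a MapIt area record."""
    mapit_id: int
    ons_code: str
    boundary: str  # placeholder for a real geometry
    gen_low: int
    gen_high: int


def apply_option_1(area, old_boundary, new_ons_code, new_gen, next_id):
    """Option 1: the existing ID keeps the current boundary and becomes
    new_gen only; a new ID is created to carry the history."""
    historic = Area(
        mapit_id=next_id,
        ons_code=area.ons_code,      # old ONS ID stays with the old boundary
        boundary=old_boundary,
        gen_low=area.gen_low,
        gen_high=new_gen - 1,
    )
    # Narrow the existing record to the new generation and fix its code.
    area.ons_code = new_ons_code
    area.gen_low = new_gen
    area.gen_high = new_gen
    return historic


# Stretham as currently (wrongly) imported: one record spanning gens 1-36.
stretham = Area(2909, "E05002732", "boundary-may-2019", 1, 36)
historic = apply_option_1(
    stretham, "boundary-oct-2018", "E05011571", new_gen=36, next_id=153863
)
```

Option 2 would be the mirror image: the existing record gets the old boundary back and stops at gen 35, and the newly created record takes the new boundary, the new ONS ID and gen 36.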

I can't decide which is better, I guess there is not much in it either way. Sigh.

chris48s commented 5 years ago

Another update on this. As well as the errors identified in the ONSPD release notes I posted above, I spotted a few more of the same class of error:

I contacted OS support about these and they got back to me saying the correct codes for the new boundaries should be:

Corrections for those will appear in the October 2019 BoundaryLine too.

I don't usually delve into a BoundaryLine release in quite this level of detail, so I'm not sure if this is a particularly problematic release, or if these problems are quite common and often aren't noticed :grimacing:

dracos commented 5 years ago

You can look through this repo's control files to see the sorts of things we've had to deal with over the years, e.g. https://github.com/mysociety/mapit/blob/master/mapit_gb/controls/2016-05.py or https://github.com/mysociety/mapit/blob/master/mapit_gb/controls/2017-10.py Our fault for trying to maintain things across releases, I guess; we should have just imported them all every time and dealt with it that way. Ah well.

Thanks for noting these too; hopefully will get around to it at some point.

dracos commented 5 years ago

Hopefully looking at this now! One more to add to the above: Darenth had E0501239 when it should have been E05012396.

dracos commented 5 years ago

Hi @chris48s Sorry for the delay but hopefully https://github.com/mysociety/mapit/pull/346 might be a script that will fix this issue for those areas. I went with option 1 I waffled about above.

dracos commented 5 years ago

Just for completeness, here are all the above with their old/new MapIt IDs:

  - Stretham: 153863 / 2909
  - Garrison: 153864 / 5424
  - Sharoe Green: 153200 / 5424
  - Mundesley: 153869 / 5876
  - Roughton: 153867 / 5868
  - St Benet’s: 153868 / 5872
  - Seamer: 153870 / 6333
  - Warwick Saltisford: 153872 / 145481
  - Barton: 153866 / 144867
  - Chartham & Stone Street: 153865 / 144861
  - Kingston Bagpuize: 153871 / 145354
  - Darenth: 153642 (only needed ID fix)

dracos commented 5 years ago

Abbotsbury CPC had the wrong GSS code: E0400349 instead of E04003492.
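Truncated codes like this one (and the Darenth code earlier in the thread) are one character short, so a simple format check at import time might catch them. A minimal sketch, assuming GSS codes are always one uppercase letter followed by eight digits; `GSS_PATTERN` and `looks_like_gss` are hypothetical names, not part of MapIt:

```python
import re

# Assumption: a GSS code is nine characters, one uppercase letter
# followed by eight digits (e.g. E04003492).
GSS_PATTERN = re.compile(r"[A-Z][0-9]{8}")


def looks_like_gss(code):
    """Return True if the code has the expected nine-character shape."""
    return GSS_PATTERN.fullmatch(code) is not None


for code in ["E0400349", "E04003492", "E0501239", "E05012396"]:
    print(code, looks_like_gss(code))
```

This only checks the shape of the code. It would not catch the ward errors above, where a well-formed but outdated code was attached to a new boundary.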

dracos commented 5 years ago

Script to fix that at c58832117c0e9bd4df8f60e4845a088640a15eee