mysociety / mapit

A web service to map postcodes to administrative boundaries and more
Other
269 stars 88 forks source link

New UK import fails when looking for LGW parents #356

Closed torotil closed 3 years ago

torotil commented 4 years ago

I’m trying to create a new database with the current UK data. It fails when looking for parent areas for NI wards:

$ .venv/bin/python mapit/manage.py mapit_UK_find_parents -v2
/home/roman/mapit-build/mapit/project/settings.py:17: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(fp)
Processing Abbas and Templecombe [15205] (CPC)
Processing Abberley [14126] (CPC)
Processing Abberton [10340] (CPC)
Processing Abberton [15506] (CPC)
Processing Abbess Beauchamp and Berners Roding [10206] (CPC)
Processing Abbey [21619] (UTE)
Processing Abbey [8946] (UTW)
Processing Abbey [5545] (UTW)
Processing Abbey [5063] (UTW)
Processing Abbey [5588] (UTW)
Processing Abbey [8526] (UTW)
Processing Abbey [22504] (LGW)
Traceback (most recent call last):
  File "mapit/manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/home/roman/mapit-build/.venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/home/roman/mapit-build/.venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/roman/mapit-build/.venv/lib/python3.8/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/roman/mapit-build/.venv/lib/python3.8/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/home/roman/mapit-build/mapit/mapit/management/find_parents.py", line 51, in handle
    raise Exception("Area %s does not have a parent?" % (self.pp_area(area)))
Exception: Area Abbey [22504] (LGW) does not have a parent?

I’ve looked in the database and the parent area (Macedon) actually exists. Is there anything else I can do to debug this?

dracos commented 4 years ago

The find parents command uses polygon.polygon.point_on_surface to pick a point to then look up the parent area; if the source data isn't aligned, I guess it is possible that it fails to find a parent in some way. You'd have to work out what point it was using and see if that is right to check that, though, but I can't see why else it wouldn't work. Looking at https://mapit.mysociety.org/area/145970.html it does e.g. have a cut by Glenville Road that means the LGW isn't entirely within it, so it could theoretically happen, and perhaps didn't when I did it in our system? Not sure how to ask it to pick another point if it does happen, though. The script could at least carry on and give you a list to fix manually, or something, maybe.

torotil commented 4 years ago

This seems to happen for all and only LGW areas in my import. Looks like something’s wrong with the imported areas or shapefiles then.

dracos commented 4 years ago

Have you checked that the LGW and LGE display okay by themselves - ie the data seems okay for both sets? Does Abbey (22504) display as you would expect on the map?

torotil commented 4 years ago

The LGE areas (eg. Macedon) appear somewhere east of Africa. That’s perhaps not correct :-D I’m rechecking the --srid arguments …

torotil commented 4 years ago

Ok, something seems to be wrong with SRID 29902. Northern ireland is also there.

I think it is properly defined though:

# select auth_srid, proj4text from spatial_ref_sys where auth_srid IN ('900913', '27700', '29902');
 auth_srid |                                                    proj4text                                                     
-----------+------------------------------------------------------------------------------------------------------------------
     27700 | +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs 
     29902 | +proj=tmerc +lat_0=53.5 +lon_0=-8 +k=1.000035 +x_0=200000 +y_0=250000 +datum=ire65 +units=m +no_defs 
    900913 | +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs
torotil commented 4 years ago

It looks like longtitude and latitude are switched for these areas. Any idea how that could happen?

torotil commented 4 years ago

I also tried copying the entry for the spatial_ref_table from here https://spatialreference.org/ref/epsg/29902/postgis/ and from another instance that I created earlier. Both didn’t change anything. I’ve reimported the ONSI areas after every change, but the output was not affected. Funny fact: Even completely erasing the entry in that table doesn’t change anything (import still works, it’s displayed at the same position).

torotil commented 4 years ago

… and because images say more than words:

Northern ireland displayed east of Africa

Macedon displayed east of Africa

Abbey displayed properly

dracos commented 4 years ago

Umm, I'm not sure this is something I can debug remotely very easily. Start at the beginning - have you checked if your source data that you are trying to import is correct, has lat/lon swapped, what SRS it is in, and so on? Because perhaps it is correctly showing whatever is being imported...

dracos commented 4 years ago

(I should note it is possible, depending on precisely what is doing the conversion, that it is not the database doing co-ordinate conversion but a system library, in which case it would be using its own proj4 definitions, which will be in a file on disc, not the ones in the database. But I haven't had any proj4 issues for some time now, so that seems doubtful. But thought I'd mention it.)

torotil commented 4 years ago

I don’t think the data has changed. I’m using the same zip-bundle from http://parlvid.mysociety.org/os/osni-dec-2015.tar.gz as always. What did change is the system I’m running this on. I think in python loading the data is done using gdal. Its ogrinfo command gives this output:

$ ogrinfo OSNI_Open_Data_Largescale_Boundaries__NI_Outline.shp -so OSNI_Open_Data_Largescale_Boundaries__NI_Outline
INFO: Open of `OSNI_Open_Data_Largescale_Boundaries__NI_Outline.shp'
      using driver `ESRI Shapefile' successful.

Layer name: OSNI_Open_Data_Largescale_Boundaries__NI_Outline
Geometry: Polygon
Feature Count: 1
Extent: (188538.219606, 309895.757933) - (366410.719254, 453202.854469)
Layer SRS WKT:
PROJCRS["TM65 / Irish Grid",
    BASEGEOGCRS["TM65",
        DATUM["TM65",
            ELLIPSOID["Airy Modified 1849",6377340.189,299.3249646,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4299]],
    CONVERSION["Irish Grid",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",53.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-8,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1.000035,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",200000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",250000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["unknown"],
        AREA["Ireland - onshore"],
        BBOX[51.39,-10.56,55.43,-5.93]],
    ID["EPSG",29902]]
Data axis to CRS axis mapping: 1,2
ID: Integer64 (10.0)
NAME: String (80.0)
Area_SqKM: Real (24.15)
OBJECTID: Integer64 (10.0)

I think this already hints at the problem: BBOX has it’s coordinates as north/east while the axis order is east/north.

torotil commented 4 years ago

I tried opening the NI outline file in qgis as well and there it’s displayed perfectly fine. AFAIK qgis uses the same set of libraries (gdal, proj4) for opening the shapefile. This is on another system. It shows the same data in ogrinfo though. So that’s not a hint after all.

torotil commented 4 years ago

I’ve tried a few more things today:

torotil commented 4 years ago

It seems for now I’m able to resolve it by not importing the OSNI dataset and simply raising its generation manually. This will still be a problem for new imports. I suspect it is some change in postgis 3 that’s affecting how the old data is imported.

torotil commented 4 years ago

I think I’ve found what switches the coordinate axis. It seems to be related to this change in gdal 3.

dracos commented 4 years ago

Just following this up - so do you know if there is some argument or something we should add to the import code to deal with this? I've read the link you gave but didn't really understand how to deal with the problem.

torotil commented 4 years ago

I couldn’t quite make out what changes were necessary to make this work properly. All I found was a ticket in the django bug-tracker that asks for documentation about how to deal with this issue.

torotil commented 4 years ago

django/django#12878 might contain more info about what needs to be done for correctly importing the data with GDAL 3.0.

dracos commented 4 years ago

Seems like from that the only option would be to upgrade to Django 3.1 when supported so that the framework then supports it. I realise https://github.com/mysociety/mapit/pull/358 is still outstanding, sorry, will hopefully be able to take a look in the near future. Django 1.11 officially supports postgis 2.1-2.3 and GDAL 1.9-2.1, Django 2.2 supports postgis 2.1-2.5 and GDAL 1.11-2.3, and Django 3.1 supports postgis 2.2-3.0 and GDAL 2.0-3.1.

torotil commented 3 years ago

I’m trying again to do a new import at the moment. This time with Django 3.1. I’ll update this issue when I have a result.

torotil commented 3 years ago

I still get the error on the first import of a fresh install.

Parent for Abbey [23479] (CED) was None, is now Cambridgeshire County Council [1988] (CTY)
Parent for Abbey [1235] (CED) was None, is now Cambridgeshire County Council [1988] (CTY)
Traceback (most recent call last):
  File "mapit/manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/home/roman/mapit-build/.venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/home/roman/mapit-build/.venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/roman/mapit-build/.venv/lib/python3.8/site-packages/django/core/management/base.py", line 330, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/roman/mapit-build/.venv/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/home/roman/mapit-build/mapit/mapit/management/find_parents.py", line 50, in handle
    raise Exception("Area %s does not have a parent?" % (self.pp_area(area)))
Exception: Area Abbey [229] (CED) does not have a parent?

This is with a fresh install of mapit v3.0 (current master), django 3.1.4, GDAL 3.0.4

dracos commented 3 years ago

That's not a NI area it's erroring on?

dracos commented 3 years ago

I'm not sure how you have two Abbeys in Cambridgeshire either, there is only one. My guess is the Abbey it has died on is Buckinghamshire, because that is CED but now has a UTA parent, which I included a fix for in https://github.com/mysociety/mapit/commit/0bb0aa74357f67eec007f9927e1480eaaa44c2ad so if you could check you are definitely running with that fix, and any other information you can provide, but I don't think (from what you have quoted) this is anything to do with whether it can find/not find parents of NI areas or GDAL versions.

torotil commented 3 years ago

Ok, shall I open up a new issue then to keep this clean?

I’m on that commit exactly (this commt has the v3.0 tag). I’ve had an error when importing boundaries earlier because at least one feature didn’t have a name (feat['NAME'].value was None in mapit_UK_import_boundary_line.py). I’ve solved that by setting the name to an empty string in this case.

torotil commented 3 years ago

I'm not sure how you have two Abbeys in Cambridgeshire either, there is only one.

I have also two Abbeys in Buckinghamshire Council. There seems to something wrong with the imported areas again.

dracos commented 3 years ago

"Ok, shall I open up a new issue then to keep this clean?" - this issue is about failing to import NI areas correctly with mismatched versions of Django/GDAL, as we tracked down. What you're talking about now is nothing to do with that, so yes, that should be a new ticket if you think there's a bug. And hopefully this ticket can be closed assuming that using supported matching versions of Django/GDAL don't have an issue with the NI areas.

Which feature didn't have a name? We've successfully run the import here, though not on an empty database, but it still looks at every name, so I can't see how we wouldn't have noticed that if you're running it on the same data. I also can't see how you'd end up with everything twice, perhaps you have passed the data in twice on the command line or something like that?

torotil commented 3 years ago

Thanks again for your patience with helping me along with my issues. I’ve posted a proper issue for the missing name: #369 and will post another one later for the duplicate areas / parent mismatch (with full logs so you can verify that I didn’t make any mistakes with the commands).

torotil commented 3 years ago

Seems assigning parents is working now with a clean import and a workaround for the empty feature name. So we can definitely close this issue.