minus34 / gnaf-loader

A quick way to get started with Geoscape's open GNAF & Admin Boundaries
Apache License 2.0
185 stars 66 forks source link

Key (street_locality_pid)=(SAL3622104) is duplicated in Aug20 #46

Closed andrewharvey closed 4 years ago

andrewharvey commented 4 years ago

Probably a data issue, so I'll see if I can manually tweak the source data as a workaround, but just a heads up that

root        : INFO     Part 4 of 6 : Start create reference tables : 2020-08-18 16:45:08.383685
root        : INFO      - Step  1 of 14 : create reference tables : 0:00:00.188041
root        : INFO      - Step  2 of 14 : localities populated : 0:00:00.647852
root        : INFO      - Step  3 of 14 : locality aliases populated : 0:00:00.464266
root        : INFO      - Step  4 of 14 : locality neighbours populated : 0:00:00.112384
Traceback (most recent call last):
  File "/usr/local/gnaf-loader/load-gnaf.py", line 967, in <module>
    if main():
  File "/usr/local/gnaf-loader/load-gnaf.py", line 132, in main
    create_reference_tables(pg_cur, settings)
  File "/usr/local/gnaf-loader/load-gnaf.py", line 626, in create_reference_tables
    pg_cur.execute(psma.open_sql_file("03-05-reference-populate-streets.sql", settings))
psycopg2.IntegrityError: could not create unique index "streets_pk"
DETAIL:  Key (street_locality_pid)=(SAL3622104) is duplicated.
minus34 commented 4 years ago

This issue is caused by a locality with 2 sets of coordinates instead of one for its centroid.

This code highlights the issue. There's no primary key at this stage of the process on the localities table, hence the problem occurs downstream on the street locality table

SELECT loc.locality_pid, loc.locality_name, loc.primary_postcode AS postcode, st.state_abbreviation AS state, pnt.latitude, pnt.longitude, aut.name AS locality_class, loc.gnaf_reliability_code, st_setsrid(st_makepoint(pnt.longitude, pnt.latitude), 4283) AS geom FROM raw_gnaf_202008.locality AS loc INNER JOIN raw_gnaf_202008.state AS st ON loc.state_pid = st.state_pid INNER JOIN raw_gnaf_202008.locality_class_aut AS aut ON loc.locality_class_code = aut.code LEFT OUTER JOIN raw_gnaf_202008.locality_point AS pnt ON loc.locality_pid = pnt.locality_pid where loc.locality_pid = '500223830' ORDER BY st.state_abbreviation, loc.locality_name, loc.primary_postcode;

Fix is to log a support request with PSMA. I can do that.

Workaround is to do a group by on the code above (in the postgres-scripts/03-02-reference-populate-localities.sql file) and use average coords

minus34 commented 4 years ago

Workaround is now in the 202008 branch - starting a test run...

minus34 commented 4 years ago

Test run successful - merging branch

andrewharvey commented 4 years ago

Thanks Hugh.