minus34 / gnaf-loader

A quick way to get started with Geoscape's open GNAF & Admin Boundaries
Apache License 2.0
185 stars 66 forks source link

Missing table #10

Closed migurski closed 8 years ago

migurski commented 8 years ago

I just attempted to run the current version of this loader against the Feb 2016 GNAF files, and I’m seeing a lot of instances of this error late in the process:

SQL FAILED! : INSERT INTO gnaf.temp_state_upper_house_electorates_tags (gnaf_pid, gnaf_state, alias_principal, bdy_pid, bdy_name, bdy_state)
SELECT pnts.gnaf_pid,
       pnts.state,   
       'P',
       bdys.se_upper_pid,
       bdys.name,
       bdys.state
  FROM gnaf.address_principals AS pnts
  INNER JOIN admin_bdys.state_upper_house_electorates_analysis AS bdys
  ON ST_Within(pnts.geom, bdys.geom) WHERE bdys.gid > 1440 AND bdys.gid <= 2293;

…

: relation "admin_bdys.state_upper_house_electorates_analysis" does not exist
LINE 9:   INNER JOIN admin_bdys.state_upper_house_electorates_analys...

I’m curious if this is something you’ve encountered, or if it’s a bug? The script completes shortly after.

migurski commented 8 years ago

My schema looks like this, FYI:

Schema Name Type Owner Size
gnaf address_admin_boundaries table postgres 1121 MB
gnaf address_admin_boundaries_gid_seq sequence postgres 8192 bytes
gnaf address_alias_lookup table postgres 38 MB
gnaf address_aliases table postgres 130 MB
gnaf address_aliases_gid_seq sequence postgres 8192 bytes
gnaf address_principals table postgres 3318 MB
gnaf address_principals_gid_seq sequence postgres 8192 bytes
gnaf address_secondary_lookup table postgres 255 MB
gnaf addresses view gnaf2 0 bytes
gnaf localities table postgres 13 MB
gnaf localities_gid_seq sequence postgres 8192 bytes
gnaf locality_aliases table postgres 6024 kB
gnaf locality_neighbour_lookup table postgres 3856 kB
gnaf street_aliases table postgres 5520 kB
gnaf street_aliases_gid_seq sequence postgres 8192 bytes
gnaf streets table postgres 347 MB
gnaf streets_gid_seq sequence postgres 8192 bytes
public geography_columns view postgres 0 bytes
public geometry_columns view postgres 0 bytes
public raster_columns view postgres 0 bytes
public raster_overviews view postgres 0 bytes
public spatial_ref_sys table postgres 3216 kB
minus34 commented 8 years ago

Looks like the admin boundary creation process has failed, possibly due to a problem with the admin bdy folder setting (?).

Check the on screen log for "Part 2" of the process - if it's the folder setting, it'll say something like "No Admin Boundary files found"

If it's not can you share the full on screen log with me

migurski commented 8 years ago

Unfortunately, so many of these errors passed by that the original error was missing from the scrollback. Let me try this again, see if I can duplicate the problem.

migurski commented 8 years ago

I was able to duplicate the problem, but had the same issue with not being able to scroll back to the start. This is the code I’m working with, though: https://github.com/openaddresses/openaddresses/blob/5536d0fa2/scripts/au/gnaf.sh#L17-L31

andrewharvey commented 8 years ago

I was able to duplicate the problem, but had the same issue with not being able to scroll back to the start.

Can't it be logged to a file with ./script.sh > log.txt 2>&1 ?

migurski commented 8 years ago

Er, yes. Thanks for the reminder. Trying that now, should be done in a few hours.

migurski commented 8 years ago

The error begins at this point:

Part 4 of 4 : Start boundary tagging addresses : 2016-06-09 06:36:14.740135

Complete log at http://dpaste.com/16N9XG3#line-42

andrewhn commented 8 years ago

I'm having the same issue with the docker setup. Problem seems to be that the docker install installs GEOS 3.4.2, and so the st_subdivide_supported is not set to True. As such, a bunch of the _analysis tables are not created, hence the error.

migurski commented 8 years ago

Thanks; I could get it to install a different version, maybe using a newer Ubuntu? What version is required?

On Jun 9, 2016, at 5:24 PM, andrewhn notifications@github.com wrote:

I'm having the same issue with the docker setup. Problem seems to be that the docker install installs GEOS 3.4.2, and so the st_subdivide_supported is not set to True. As such, a bunch of the _analysis tables are not created, hence the error.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

andrewhn commented 8 years ago

Looking at the packages, Xenial should work ..

migurski commented 8 years ago

So, trying on Xenial I get this other error:

Part 2 of 4 : Start raw admin boundary load : 2016-06-10 05:55:54.107943
        - Step 1 of 3 : raw admin boundaries loaded : 0:01:01.252255
SQL FAILED! : --------------------------------------------------------------------------------------------
-- state electoral boundaries - choose bdys that will be current until at least 3 months from now
---------------------------------------------------------------------------------------------------

-- create lower house table
DROP TABLE IF EXISTS admin_bdys.state_lower_house_electorates CASCADE;
CREATE TABLE admin_bdys.state_lower_house_electorates AS
SELECT bdy.gid,
       tab.se_pid AS se_lower_pid,
       tab.name,
       tab.dt_gazetd,
       tab.eff_start, 
       tab.eff_end,
       aut.name_aut AS electorate_class,
       ste.st_abbrev AS state,
       bdy.geom
  FROM raw_admin_bdys.aus_state_electoral AS tab
  INNER JOIN raw_admin_bdys.aus_state_electoral_polygon AS bdy ON tab.se_pid = bdy.se_pid
  INNER JOIN raw_admin_bdys.aus_state AS ste ON tab.state_pid = ste.state_pid
  INNER JOIN raw_admin_bdys.aus_state_electoral_class_aut AS aut ON tab.secl_code = aut.code_aut
  WHERE (tab.eff_end > now() + interval '3 months'
    OR (tab.eff_start <= now() AND tab.eff_end IS NULL))
  AND tab.secl_code <> '3';

ALTER TABLE admin_bdys.state_lower_house_electorates ADD CONSTRAINT state_lower_house_electorates_pk PRIMARY KEY (gid);
CREATE INDEX state_lower_house_electorates_geom_idx ON admin_bdys.state_lower_house_electorates USING gist(geom);
ALTER TABLE admin_bdys.state_lower_house_electorates CLUSTER ON state_lower_house_electorates_geom_idx;

 : relation "raw_admin_bdys.aus_state_electoral" does not exist
LINE 17:   FROM raw_admin_bdys.aus_state_electoral AS tab
                ^

SQL FAILED! : -------------------------------------------------------------------------------
-- commonwealth electoral boundaries
--------------------------------------------------------------------------------------
migurski commented 8 years ago

(The previous version on Trusty actually works fine for OA purposes; unsure yet about this one since it’s still in progress)

migurski commented 8 years ago

Here is the complete log on Xenial; the process did not successfully write an output this time: http://dpaste.com/02DH3FV

minus34 commented 8 years ago

The first issue is due to the code having a bug in it that prevents it running correctly when the 'analysis' boundary tables haven't been created using the PostGIS 2.2 ST_Subdivide function. i.e when you have an older version of PostGIS installed.

The code's fixed and successfully tested with PostGIS 2.2. I don't have a PostGIS 2.1 test environment handy, if someone else could re-run part 4 of the process that'd be great (bearing in mind it takes several hours so PostGIS 2.2 really is recommended here)

minus34 commented 8 years ago

Re: the issue on Xenial. Can you share the list of tables in the raw_admin_bdys schema.

migurski commented 8 years ago

I am re-running it on Trusty with 2.1 now, will re-run on Xenial with 2.2 later.

migurski commented 8 years ago

Cancelled it under 2.1 after 7 hours—it definitely seems to be fixed, but ran much faster with the original bug! Re-running under Xenial now. Using FEB16 G-NAF data for both tests, if it makes a difference.

migurski commented 8 years ago

Seeing this error in Xenial, with a hard exit:

Part 3 of 4 : Start create reference tables : 2016-06-11 16:15:39.465431
    - Step  1 of 14 : create reference tables : 0:00:00.055313
    - Step  2 of 14 : localities populated : 0:00:00.788972
    - Step  3 of 14 : locality aliases populated : 0:00:00.943425
    - Step  4 of 14 : locality neighbours populated : 0:00:00.245822
    - Step  5 of 14 : streets populated : 0:00:17.309041
    - Step  6 of 14 : street aliases populated : 0:00:00.743530
    - Step  7 of 14 : addresses populated : 0:23:04.883648
    - Step  8 of 14 : principal alias lookup populated : 0:01:45.636932
    - Step  9 of 14 : primary secondary lookup populated : 0:02:07.245540
Traceback (most recent call last):
  File "/usr/local/gnaf-loader/load-gnaf.py", line 889, in <module>
    main()
  File "/usr/local/gnaf-loader/load-gnaf.py", line 233, in main
    create_reference_tables(pg_cur, settings)
  File "/usr/local/gnaf-loader/load-gnaf.py", line 592, in create_reference_tables
    pg_cur.execute(open_sql_file("03-10-reference-split-melbourne.sql", settings))
psycopg2.ProgrammingError: relation "admin_bdys.locality_bdys" does not exist
LINE 14:   FROM admin_bdys.locality_bdys AS bdy
migurski commented 8 years ago

Right now, the buggy version a4a8227 under Trusty works best for my use-case.

minus34 commented 8 years ago

Yes, the boundary tagging is slow with raw spaital tables. ST_Subdivide in PostGIS 2.2 cuts up the polygons to bring 5-10x performance improvements when using parallel processing. If you don't need bdy tags on GNAF for your work then yep ignore or comment Part 4 out.

minus34 commented 8 years ago

re: Xenial - there's something happening that looks like it's preventing the admin bdys from being created correctly. Haven't seen that behaviour in Ubuntu 14.04, Windows 7 & 10 and OS X 10.11

migurski commented 8 years ago

I’ve added a PR to make it possible to skip the fourth part without editing code.