minus34 / gnaf-loader

A quick way to get started with Geoscape's open GNAF & Admin Boundaries
Apache License 2.0

Error at File "gnaf-loader/load-gnaf.py", line 513, in prep_admin_bdys #29

Closed andrewharvey closed 6 years ago

andrewharvey commented 6 years ago

I'm trying to update openaddresses with the FEB18 release, but when I run it from a clean install of Debian 9 with the following setup:

apt update && apt install byobu git curl zip unzip parallel python-psycopg2 postgresql-9.6 postgresql-9.6-postgis-2.3 postgis

git clone https://github.com/minus34/gnaf-loader.git

TMP=/tmp/work
mkdir $TMP
mkdir $TMP/gnaf $TMP/gnaf-admin $TMP/tablespace
chown postgres:postgres $TMP/tablespace

sudo -u postgres psql -c "CREATE USER gnafun WITH SUPERUSER PASSWORD 'gnafpw'"
sudo -u postgres psql -c "CREATE TABLESPACE gnafts OWNER gnafun LOCATION '$TMP/tablespace'"
sudo -u postgres psql -c 'CREATE DATABASE gnafdb OWNER gnafun TABLESPACE gnafts'
sudo -u postgres psql -c 'CREATE EXTENSION postgis'

curl -s --retry 10 --location 'https://data.gov.au/dataset/bdcf5b09-89bc-47ec-9281-6b8e9ee147aa/resource/53c24b8e-4f55-4eed-a189-2fc0dcca6381/download/feb18adminboundsesrishapefileordbffile20180219141148.zip' -o $TMP/gnaf-admin.zip &
curl -s --retry 10 --location 'https://data.gov.au/dataset/19432f89-dc3a-4ef3-b943-5326ef1dbecc/resource/4b084096-65e4-4c8e-abbe-5e54ff85f42f/download/feb18gnafpipeseparatedvalue20180219141901.zip' -o $TMP/gnaf.zip &
wait
parallel "unzip -d $TMP/{} $TMP/{}.zip" ::: gnaf gnaf-admin

GNAF_DIR="$(find $TMP -type d | grep 'G-NAF' | grep 'Authority Code' | xargs -I {} dirname {} | head -n1)"
BOUNDARY_DIR="$(find $TMP -type d | grep 'Administrative Boundaries' | head -n1 | xargs -I {} dirname {})"

# /tmp/work/gnaf/FEB18_GNAF_PipeSeparatedValue_20180219141901/G-NAF/G-NAF FEBRUARY 2018
# /tmp/work/gnaf-admin/FEB18_AdminBounds_ESRIShapefileorDBFfile_20180219141148

# change local all all connection to md5 to accept password connections locally
service postgresql restart
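[Editor's note: the pg_hba.conf edit that comment glosses over can be sketched as below. This is a hedged example only - the path is Debian 9's default and the stock line's whitespace varies, so the edit is demonstrated on a scratch copy rather than the live file.]

```shell
# Sketch of changing the "local all all" auth method from peer to md5 so
# password connections work locally. Demonstrated on a scratch copy; the real
# file on Debian 9 is /etc/postgresql/9.6/main/pg_hba.conf (edit it as root,
# then restart postgresql as above).
PG_HBA=$(mktemp)
printf 'local   all             all                                     peer\n' > "$PG_HBA"
sed -i -E 's/^(local[[:space:]]+all[[:space:]]+all[[:space:]]+)peer/\1md5/' "$PG_HBA"
cat "$PG_HBA"
```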

python gnaf-loader/load-gnaf.py --pguser gnafun --pgdb gnafdb --pgpassword gnafpw --gnaf-schema gnaf --gnaf-tables-path "$GNAF_DIR" --admin-bdys-path "$BOUNDARY_DIR" --raw-unlogged --no-boundary-tag

I get the following output:

root        : INFO
root        : INFO     Start gnaf-loader
root        : INFO      - running Python 2.7.13 with Psycopg2 2.6.2
root        : INFO      - on Linux #1 SMP Debian 4.9.30-2+deb9u1 (2017-06-18)
root        : INFO      - using Postgres 9.6.6 on x86_64-pc-linux-gnu and PostGIS 2.3.1 (with GEOS 3.5.1-CAPI-1.9.1)
root        : INFO
root        : INFO     Part 1 of 4 : Start raw GNAF load : 2018-02-28 18:08:09.408937
root        : INFO      - Step 1 of 7 : tables dropped : 0:00:00.069420
root        : INFO      - Step 2 of 7 : database NOT vacuumed
root        : INFO      - Step 3 of 7 : UNLOGGED tables created : 0:00:00.019004
root        : INFO              - Loading state ACT
root        : INFO              - Loading state NSW
root        : INFO              - Loading state NT
root        : INFO              - Loading state OT
root        : INFO              - Loading state QLD
root        : INFO              - Loading state SA
root        : INFO              - Loading state TAS
root        : INFO              - Loading state VIC
root        : INFO              - Loading state WA
root        : INFO      - Step 4 of 7 : tables populated : 0:01:28.195092
root        : INFO      - Step 5 of 7 : indexes created: 0:03:00.082667
root        : INFO      - Step 6 of 7 : primary & foreign keys NOT created
root        : INFO      - Step 7 of 7 : tables analysed : 0:00:04.827586
root        : INFO     Part 1 of 4 : Raw GNAF loaded! : 0:04:33.199275
root        : INFO
root        : INFO     Part 2 of 4 : Start raw admin boundary load : 2018-02-28 18:12:42.608401
root        : INFO      - Step 1 of 3 : raw admin boundaries loaded : 0:06:51.062249
Traceback (most recent call last):
  File "gnaf-loader/load-gnaf.py", line 940, in <module>
    if main():
  File "gnaf-loader/load-gnaf.py", line 90, in main
    prep_admin_bdys(pg_cur, settings)
  File "gnaf-loader/load-gnaf.py", line 513, in prep_admin_bdys
    .format(settings['admin_bdys_schema'], settings['pg_user']))
psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block

Looking at the database the raw_ schema and tables are all there, but the non-raw schema doesn't exist and I can't work out why it's not being created. Any ideas?

This is the postgresql log:

2018-02-28 18:19:33.668 AEDT [5607] gnafun@gnafdb ERROR:  current transaction is aborted, commands ignored until end of transaction block
2018-02-28 18:19:33.668 AEDT [5607] gnafun@gnafdb STATEMENT:  SET CLIENT_ENCODING TO UTF8;
        SET STANDARD_CONFORMING_STRINGS TO ON;
        BEGIN;
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('1','AGRICULTURAL',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('2','COMMERCIAL',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('3','EDUCATION',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('4','HOSPITAL/MEDICAL',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('5','INDUSTRIAL',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('6','NOUSUALRESIDENCE',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('7','PARKLAND',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('8','RESIDENTIAL',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('9','SHIPPING',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('10','TRANSPORT',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('11','WATER',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('12','OTHER',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('13','ANTARCTICA',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('14','MIGRATORY',NULL);
        INSERT INTO "raw_admin_bdys_201802"."aus_mb_category_class_aut" ("code","name","descriptio") VALUES ('15','OFFSHORE',NULL);
        COMMIT;
        ANALYZE "raw_admin_bdys_201802"."aus_mb_category_class_aut";

2018-02-28 18:19:33.679 AEDT [5607] gnafun@gnafdb ERROR:  current transaction is aborted, commands ignored until end of transaction block
2018-02-28 18:19:33.679 AEDT [5607] gnafun@gnafdb STATEMENT:  CREATE SCHEMA IF NOT EXISTS admin_bdys_201802 AUTHORIZATION gnafun
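[Editor's note: "current transaction is aborted" is never the root cause - an earlier statement in the same transaction failed, and Postgres rejects every later statement until a rollback. The loader's actual error handling isn't shown in this thread, so the following is a generic, runnable sketch of capturing the first failing statement; it uses sqlite3 purely so it runs anywhere, whereas with psycopg2 you would also call conn.rollback() in the except branch.]

```python
import sqlite3

# Generic sketch: record the FIRST failing statement, since in Postgres all
# later statements in that transaction only report "transaction is aborted".
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
statements = [
    "CREATE TABLE aus_mb_category_class_aut (code TEXT, name TEXT)",
    "INSERT INTO no_such_table VALUES ('1')",  # the real root cause
    "INSERT INTO aus_mb_category_class_aut VALUES ('1', 'AGRICULTURAL')",
]
first_error = None
for sql in statements:
    try:
        cur.execute(sql)
    except sqlite3.Error as exc:
        if first_error is None:
            first_error = (sql, exc)
        # with psycopg2, also: conn.rollback()
print("root cause:", first_error)
```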
minus34 commented 6 years ago

I haven't encountered that before - possibly a PG memory issue?

What happens if you comment out lines 68 to 89 (i.e. the lines below), restart the database service and then try again from that point?

    # PART 1 - load gnaf from PSV files
    logger.info("")
    start_time = datetime.now()
    logger.info("Part 1 of 4 : Start raw GNAF load : {0}".format(start_time))
    drop_tables_and_vacuum_db(pg_cur, settings)
    create_raw_gnaf_tables(pg_cur, settings)
    populate_raw_gnaf(settings)
    index_raw_gnaf(settings)
    if settings['primary_foreign_keys']:
        create_primary_foreign_keys(settings)
    else:
        logger.info("\t- Step 6 of 7 : primary & foreign keys NOT created")
    analyse_raw_gnaf_tables(pg_cur, settings)
    # set postgres search path back to the default
    pg_cur.execute("SET search_path = public, pg_catalog")
    logger.info("Part 1 of 4 : Raw GNAF loaded! : {0}".format(datetime.now() - start_time))

    # PART 2 - load raw admin boundaries from Shapefiles
    logger.info("")
    start_time = datetime.now()
    logger.info("Part 2 of 4 : Start raw admin boundary load : {0}".format(start_time))
    load_raw_admin_boundaries(pg_cur, settings)

    prep_admin_bdys(pg_cur, settings)
    create_admin_bdys_for_analysis(settings)
    logger.info("Part 2 of 4 : Raw admin boundaries loaded! : {0}".format(datetime.now() - start_time))


andrewharvey commented 6 years ago

Thanks Hugh.

I did run into Postgres crashing due to running out of memory before this (the Postgres log showed it was killed); I fixed that by adding a larger swap, and PG no longer crashes.

Then I get the error above every time I run it, even on different machines.

However, if I do as you suggest and comment out those lines and start it again, it works. I still can't work out why it's failing initially...

PS: start_time = datetime.now() needs to be left uncommented.

I tried reducing max processors to 1 but it didn't change anything.

minus34 commented 6 years ago

Strange behaviour from PG. Not sure there's a code fix for this other than to restart the PG service inside the GNAF Loader process, which won't suit everyone.


andrewharvey commented 6 years ago

PostgreSQL 10.3 has a few things to say about the search_path, but I don't think that's the issue. I'll try to do more testing to see if I can nail it down to a specific OS / set of package versions.

Document how to configure installations and applications to guard against search-path-dependent trojan-horse attacks from other users (Noah Misch)

Using a search_path setting that includes any schemas writable by a hostile user enables that user to capture control of queries and then run arbitrary SQL code with the permissions of the attacked user. While it is possible to write queries that are proof against such hijacking, it is notationally tedious, and it's very easy to overlook holes. Therefore, we now recommend configurations in which no untrusted schemas appear in one's search path. Relevant documentation appears in Section 5.8.6 (for database administrators and users), Section 33.1 (for application authors), Section 37.15.1 (for extension authors), and CREATE FUNCTION (for authors of SECURITY DEFINER functions). (CVE-2018-1058)

Avoid use of insecure search_path settings in pg_dump and other client programs (Noah Misch, Tom Lane)

pg_dump, pg_upgrade, vacuumdb and other PostgreSQL-provided applications were themselves vulnerable to the type of hijacking described in the previous changelog entry; since these applications are commonly run by superusers, they present particularly attractive targets. To make them secure whether or not the installation as a whole has been secured, modify them to include only the pg_catalog schema in their search_path settings. Autovacuum worker processes now do the same, as well.

In cases where user-provided functions are indirectly executed by these programs — for example, user-provided functions in index expressions — the tighter search_path may result in errors, which will need to be corrected by adjusting those user-provided functions to not assume anything about what search path they are invoked under. That has always been good practice, but now it will be necessary for correct behavior. (CVE-2018-1058)
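[Editor's note: for reference, the mitigation those release notes recommend boils down to SQL along these lines - illustrative hardening only, not a fix for the loader error in this thread.]

```sql
-- Hardening per the CVE-2018-1058 guidance: stop arbitrary users creating
-- objects in the public schema, and keep the search path free of schemas
-- writable by others. The role name gnafun is just the one used in this thread.
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
ALTER ROLE gnafun SET search_path = "$user";
```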

skellert0 commented 6 years ago

Hi - just a short note to say I have almost the same (if not the same) issue. Alas, because I'm a total noob at Postgres, I can't really describe the problem as well as the earlier poster. However, I can paste the method by which I struck this issue. Happy to try and gather more information if you can guide me.

I'm using a MacBook Pro with 16 GB of RAM. I have successfully run earlier versions of this script, with earlier copies of the G-NAF and Administrative Boundaries trees, on the same laptop. The problem is with the Feb 2018 data and script.

Here's how I invoked the script and its output:

    GNAF_TABLES="/Users/stefan/Documents/src/GNAF/Feb_2018/FEB18_GNAF/G-NAF/G-NAF FEBRUARY 2018"
    ADMIN_BDYS="/Users/stefan/Documents/src/GNAF/Feb_2018"
    /usr/local/bin/python3 gnaf-loader-master/load-gnaf.py --pguser stefan --pgdb stefan --gnaf-tables-path="${GNAF_TABLES}" --admin-bdys-path="${ADMIN_BDYS}" --no-boundary-tag

    root : INFO
    root : INFO Start gnaf-loader
    root : INFO - running Python 3.6.4 with Psycopg2 2.7.3
    root : INFO - on Darwin Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64
    root : INFO - using Postgres 10.2 on x86_64-apple-darwin17.3.0 and PostGIS 2.4.3 (with GEOS 3.6.2-CAPI-1.10.2)
    root : INFO
    root : INFO Part 1 of 4 : Start raw GNAF load : 2018-03-02 16:00:57.187267
    root : INFO - Step 1 of 7 : tables dropped : 0:00:03.327904
    root : INFO - Step 2 of 7 : database NOT vacuumed
    root : INFO - Step 3 of 7 : tables created : 0:00:00.025727
    root : INFO - Loading state ACT
    root : INFO - Loading state NSW
    root : INFO - Loading state NT
    root : INFO - Loading state OT
    root : INFO - Loading state QLD
    root : INFO - Loading state SA
    root : INFO - Loading state TAS
    root : INFO - Loading state VIC
    root : INFO - Loading state WA
    root : INFO - Step 4 of 7 : tables populated : 0:01:43.319920
    root : INFO - Step 5 of 7 : indexes created: 0:06:36.051158
    root : INFO - Step 6 of 7 : primary & foreign keys NOT created
    root : INFO - Step 7 of 7 : tables analysed : 0:00:02.895539
    root : INFO Part 1 of 4 : Raw GNAF loaded! : 0:08:25.639960
    root : INFO
    root : INFO Part 2 of 4 : Start raw admin boundary load : 2018-03-02 16:09:22.827582
    root : INFO - Step 1 of 3 : raw admin boundaries loaded : 0:05:42.289343
    Traceback (most recent call last):
      File "gnaf-loader-master/load-gnaf.py", line 940, in <module>
        if main():
      File "gnaf-loader-master/load-gnaf.py", line 90, in main
        prep_admin_bdys(pg_cur, settings)
      File "gnaf-loader-master/load-gnaf.py", line 513, in prep_admin_bdys
        .format(settings['admin_bdys_schema'], settings['pg_user']))
    psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block

tomoliveri commented 6 years ago

I'm facing the same issue, Ubuntu 17.10 with Postgres 9.6.

Did you guys do the Postgres optimizations (I did), or are you running vanilla?

skellert0 commented 6 years ago

The only change I've made to the standard Postgres configuration is to set max_wal_size to 4 GB (on macOS).

minus34 commented 6 years ago

I re-ran on my tuned instance of PG 9.6.8 last night on MacOS and it worked fine. Will test PG 10.3 tonight.

skellert0 commented 6 years ago

Thanks. What do you mean by 'tuned'?

tomoliveri commented 6 years ago

The optimizations à la http://revenant.ca/www/postgis/workshop/tuning.html

minus34 commented 6 years ago

Yep - just note, @skellert0, that you can go a lot bigger on the memory settings. That guide is old; there hasn't been a refresh for some years.
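[Editor's note: for concreteness, here is a hedged example of what "bigger" memory settings might look like in postgresql.conf on a dedicated 16 GB loading machine. These values are illustrative assumptions, not settings anyone in this thread confirmed using.]

```ini
# postgresql.conf - illustrative loading-box values (assumptions; tune to your RAM/workload)
shared_buffers = 4GB                  # roughly a quarter of RAM
work_mem = 256MB                      # per sort/hash operation, so keep connection count low
maintenance_work_mem = 1GB            # speeds up index builds
max_wal_size = 4GB                    # skellert0's setting above
checkpoint_completion_target = 0.9
```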

skellert0 commented 6 years ago

Thanks for the link and the comment about the memory settings. I increased the sizes of the parameters mentioned in the link, and more generously. I've found the behaviour of the gnaf-loader script is the same as before. It will be interesting to see if @minus34 finds the script works tonight in that environment.

minus34 commented 6 years ago

I couldn't replicate it on my desktop (macOS High Sierra). PG 10.3 was fine. Also ran again on PG 9.6.8 using your PG user and db setup, @andrewharvey, without error. FYI - here's the start of my log, showing the version numbers used.

root        : INFO     Start gnaf-loader
root        : INFO      - running Python 3.6.4 with Psycopg2 2.7.4
root        : INFO      - on Darwin Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64
root        : INFO      - using Postgres 9.6.8 on x86_64-apple-darwin14.5.0 and PostGIS 2.3.6 (with GEOS 3.5.1-CAPI-1.9.1)
skellert0 commented 6 years ago

Thanks Hugh. I have deleted my database, reinitialised and run again. I get the same results as first up (i.e. it fails with the same log I included above). I notice that my Psycopg2 is one minor revision lower than yours, and my Postgres and PostGIS are one major revision higher - these are the macOS versions installed with the current 'brew upgrade'.

There's clearly a problem with something - e.g. the environment? But I am somewhat of a noob at both Postgres and SQL. Have you any suggestions on how to dig a little deeper and get some better hints about the nature of the problem? For example, is there a way to enable debugging output as gnaf-loader-master runs to help provide more details?

tomoliveri commented 6 years ago

I've tried:

Postgres 9.6
Postgres 10.1
Python 2.7
Python 3.6.3
Psycopg2 2.7.3
Psycopg2 2.7.4
PostGIS 2.4.2

Same error for all. I'm out of ideas.

minus34 commented 6 years ago

BTW - if you just want the thing to run to completion while we're trying to work this issue out - you can follow my instructions to Andrew above (noting his correction to my code).

Another option would be to create the admin_bdys_201802 schema before running load-gnaf.py. This should prevent the error from occurring.
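[Editor's note: that second workaround is a one-liner in psql before running the loader - a sketch only; the 201802 schema-name suffix matches the logs above and will differ for other releases.]

```sql
-- Pre-create the schema the loader fails to create inside the aborted
-- transaction; its CREATE SCHEMA IF NOT EXISTS then becomes a no-op.
CREATE SCHEMA IF NOT EXISTS admin_bdys_201802 AUTHORIZATION gnafun;
```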

tomoliveri commented 6 years ago

Cheers Hugh, I'm not sure why I decided to skip that part of our discussion! It's imported now.

An additional note: the process hangs for a long time when doing that step, just in case that helps you troubleshoot it at all.

minus34 commented 6 years ago

I don't think it's related - but it's worth mentioning that I'm using the new psycopg2-binary module, not psycopg2 (which is being deprecated due to its legacy build process).

skellert0 commented 6 years ago

Been away for a couple of days. @minus34, I took your advice and ran the modified script described in the post above. It certainly ran without crashing, but there were some complaints, per below. I've not worked out whether these materially affect the output though.

    /usr/local/bin/python3 gnaf-loader-master/load-gnaf.py --pguser stefan --pgdb stefan --gnaf-tables-path="${GNAF_TABLES}" --admin-bdys-path="${ADMIN_BDYS}" --no-boundary-tag

    root : INFO
    root : INFO Start gnaf-loader
    root : INFO - running Python 3.6.4 with Psycopg2 2.7.3
    root : INFO - on Darwin Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64
    root : INFO - using Postgres 10.2 on x86_64-apple-darwin17.3.0 and PostGIS 2.4.3 (with GEOS 3.6.2-CAPI-1.10.2)
    root : INFO - Step 2 of 3 : admin boundaries prepped : 0:00:17.070528
    root : INFO - Step 3 of 3 : admin boundaries for analysis created : 0:00:11.404417
    root : INFO Part 2 of 5 : Raw admin boundaries loaded! : 0:00:28.476516
    root : INFO
    root : INFO Part 3 of 5 : Start create reference tables : 2018-03-09 11:54:14.008804
    root : INFO - Step 1 of 14 : create reference tables : 0:00:00.043987
    root : INFO - Step 2 of 14 : localities populated : 0:00:00.481711
    root : INFO - Step 3 of 14 : locality aliases populated : 0:00:00.683614
    root : INFO - Step 4 of 14 : locality neighbours populated : 0:00:00.122082
    root : INFO - Step 5 of 14 : streets populated : 0:00:21.459558
    root : INFO - Step 6 of 14 : street aliases populated : 0:00:01.204371
    root : INFO - Step 7 of 14 : addresses populated : 0:03:12.721945
    root : INFO - Step 8 of 14 : principal alias lookup populated : 0:00:22.587675
    root : INFO - Step 9 of 14 : primary secondary lookup populated : 0:00:21.416747
    root : INFO - Step 10 of 14 : Melbourne split : 0:00:00.361311
    objc[3668]: Class FIFinderSyncExtensionHost is implemented in both /System/Library/PrivateFrameworks/FinderKit.framework/Versions/A/FinderKit (0x7fffa993ab68) and /System/Library/PrivateFrameworks/FileProvider.framework/OverrideBundles/FinderSyncCollaborationFileProviderOverride.bundle/Contents/MacOS/FinderSyncCollaborationFileProviderOverride (0x127ed6cd8). One of the two will be used. Which one is undefined.
    root : INFO - Step 11 of 14 : localities finalised : 0:08:06.042961
    root : INFO - Step 12 of 14 : addresses finalised : 0:00:55.021064
    root : INFO - Step 13 of 14 : postcode boundaries created : 0:00:30.598694
    root : INFO SQL FAILED! : ALTER TABLE ONLY gnaf_201802.address_alias_lookup ADD CONSTRAINT address_alias_lookup_fk1 FOREIGN KEY (alias_pid) REFERENCES gnaf_201802.address_aliases(gnaf_pid); : insert or update on table "address_alias_lookup" violates foreign key constraint "address_alias_lookup_fk1" DETAIL: Key (alias_pid)=(GAVIC424688461) is not present in table "address_aliases".
    root : INFO SQL FAILED! : ALTER TABLE ONLY gnaf_201802.address_alias_lookup ADD CONSTRAINT address_alias_lookup_fk2 FOREIGN KEY (principal_pid) REFERENCES gnaf_201802.address_principals(gnaf_pid); : insert or update on table "address_alias_lookup" violates foreign key constraint "address_alias_lookup_fk2" DETAIL: Key (principal_pid)=(GAVIC411816861) is not present in table "address_principals".
    root : INFO - Step 14 of 14 : create primary & foreign keys and indexes : 0:11:15.522722
    root : INFO Part 3 of 5 : Reference tables created! : 0:25:08.274259
    root : INFO
    root : WARNING Part 4 of 5 : Addresses NOT boundary tagged
    root : INFO
    root : INFO Part 5 of 5 : Start row counts : 2018-03-09 12:19:22.283456
    root : INFO - Step 1 of 2 : got row counts for gnaf_201802 schema : 0:00:52.823611
    root : INFO - Step 2 of 2 : got row counts for admin_bdys_201802 schema : 0:00:53.826644
    root : INFO
    root : INFO Part 5 of 5 : Got row counts : 0:00:53.827079
    root : INFO
    root : INFO Total time : : 0:26:30.949133
    root : INFO Finished successfully!
    root : INFO
    root : INFO -------------------------------------------------------------------------------

skellert0 commented 6 years ago

Hmmm. Further to the post above, I have made an unexpected discovery which may help shed some more light on the nature of the problem, @minus34. Although, as I'm still a noob, I can't recognise what light it sheds!

Instead of using the Feb 2018 Administrative Boundaries data set, I used the August 2017 Administrative Boundaries data set. The gnaf-loader-master script, downloaded afresh from GitHub this morning and unmodified, runs to completion with no errors or warnings, using the same software versions that failed with the Feb 2018 Admin Boundaries data set! This seems to prove there's something different between the Admin Boundaries data sets which is provoking the crash / warnings reported above. Hopefully this also helps point to a more robust solution.

    /usr/local/bin/python3 gnaf-loader-master/load-gnaf.py --pguser stefan --pgdb stefan --gnaf-tables-path="${GNAF_TABLES}" --admin-bdys-path="${ADMIN_BDYS}" --no-boundary-tag

    root : INFO
    root : INFO Start gnaf-loader
    root : INFO - running Python 3.6.4 with Psycopg2 2.7.3
    root : INFO - on Darwin Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64
    root : INFO - using Postgres 10.2 on x86_64-apple-darwin17.3.0 and PostGIS 2.4.3 (with GEOS 3.6.2-CAPI-1.10.2)
    root : INFO
    root : INFO Part 1 of 5 : Start raw GNAF load : 2018-03-09 12:37:32.759534
    root : INFO - Step 1 of 7 : tables dropped : 0:00:00.012114
    root : INFO - Step 2 of 7 : database NOT vacuumed
    root : INFO - Step 3 of 7 : tables created : 0:00:00.025250
    root : INFO - Loading state ACT
    root : INFO - Loading state NSW
    root : INFO - Loading state NT
    root : INFO - Loading state OT
    root : INFO - Loading state QLD
    root : INFO - Loading state SA
    root : INFO - Loading state TAS
    root : INFO - Loading state VIC
    root : INFO - Loading state WA
    root : INFO - Step 4 of 7 : tables populated : 0:01:40.616255
    root : INFO - Step 5 of 7 : indexes created: 0:06:56.920551
    root : INFO - Step 6 of 7 : primary & foreign keys NOT created
    root : INFO - Step 7 of 7 : tables analysed : 0:00:02.985676
    root : INFO Part 1 of 5 : Raw GNAF loaded! : 0:08:40.564524
    root : INFO
    root : INFO Part 2 of 5 : Start raw admin boundary load : 2018-03-09 12:46:13.324237
    root : INFO - Step 1 of 3 : raw admin boundaries loaded : 0:07:35.741771
    root : INFO - Step 2 of 3 : admin boundaries prepped : 0:00:32.208814
    root : INFO - Step 3 of 3 : admin boundaries for analysis created : 0:00:20.372890
    root : INFO Part 2 of 5 : Raw admin boundaries loaded! : 0:08:28.329560
    root : INFO
    root : INFO Part 3 of 5 : Start create reference tables : 2018-03-09 12:54:41.654123
    root : INFO - Step 1 of 14 : create reference tables : 0:00:00.032627
    root : INFO - Step 2 of 14 : localities populated : 0:00:00.441625
    root : INFO - Step 3 of 14 : locality aliases populated : 0:00:00.630461
    root : INFO - Step 4 of 14 : locality neighbours populated : 0:00:00.115713
    root : INFO - Step 5 of 14 : streets populated : 0:00:22.861609
    root : INFO - Step 6 of 14 : street aliases populated : 0:00:01.186764
    root : INFO - Step 7 of 14 : addresses populated : 0:03:04.902197
    root : INFO - Step 8 of 14 : principal alias lookup populated : 0:00:22.254218
    root : INFO - Step 9 of 14 : primary secondary lookup populated : 0:00:22.108405
    root : INFO - Step 10 of 14 : Melbourne split : 0:00:21.412727
    root : INFO - Step 11 of 14 : localities finalised : 0:07:29.556919
    root : INFO - Step 12 of 14 : addresses finalised : 0:00:57.130173
    root : INFO - Step 13 of 14 : postcode boundaries created : 0:00:57.109154
    root : INFO - Step 14 of 14 : create primary & foreign keys and indexes : 0:12:04.627185
    root : INFO Part 3 of 5 : Reference tables created! : 0:26:04.381789
    root : INFO
    root : WARNING Part 4 of 5 : Addresses NOT boundary tagged
    root : INFO
    root : INFO Part 5 of 5 : Start row counts : 2018-03-09 13:20:46.036309
    root : INFO - Step 1 of 2 : got row counts for gnaf_201802 schema : 0:00:52.304610
    root : INFO - Step 2 of 2 : got row counts for admin_bdys_201802 schema : 0:00:54.201355
    root : INFO
    root : INFO Part 5 of 5 : Got row counts : 0:00:54.201992
    root : INFO
    root : INFO Total time : : 0:44:07.542250
    root : INFO Finished successfully!
    root : INFO
    root : INFO -------------------------------------------------------------------------------

cndrla commented 6 years ago

Hi

I too have been using the script without a problem until this February 2018 release.

Same problem, falls over at line 513, definitely memory related on my machine:

2018-03-21 11:29:17 AEDT ERROR: out of memory
2018-03-21 11:29:17 AEDT DETAIL: Failed on request of size 2097152.

I'm still trying to track down why, along with tweaking PG memory config.

cndrla commented 6 years ago

I did get this to work.

I commented out each part (1-5) and only ran one part at a time.

Also, in the case of part 2, I ran these three functions individually in order by alternately commenting out the other two:

    load_raw_admin_boundaries(pg_cur, settings)
    prep_admin_bdys(pg_cur, settings)
    create_admin_bdys_for_analysis(settings)

In addition, I specified individual states via the --states command line parameter and processed each state individually for the load_raw_admin_boundaries function.

I also had to set the max_processes to 1.
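That workaround can be sketched as a shell loop. This is illustrative only: the paths are placeholders, and it assumes max_processes is exposed as a `--max-processes` command-line flag (if it isn't, edit the setting in the script instead):

```shell
# Sketch: run load-gnaf.py one state at a time, single process.
# Paths are placeholders; flag names are assumed from this thread.
for STATE in ACT NSW NT OT QLD SA TAS VIC WA; do
  python load-gnaf.py \
    --gnaf-tables-path "/path/to/G-NAF FEBRUARY 2018" \
    --admin-bdys-path "/path/to/AdminBounds" \
    --states "$STATE" \
    --max-processes 1
done
```

Running one state per invocation keeps peak memory down at the cost of a longer total run time.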

Only that combination allowed the overall process to complete without falling over, albeit with the following errors during part 3:

2018-03-22 14:59:10 AEDT ERROR:  insert or update on table "address_alias_lookup" violates foreign key constraint "address_alias_lookup_fk1"
2018-03-22 14:59:10 AEDT DETAIL:  Key (alias_pid)=(GAVIC425540485) is not present in table "address_aliases".
2018-03-22 14:59:10 AEDT STATEMENT:  ALTER TABLE ONLY gnaf.address_alias_lookup ADD CONSTRAINT address_alias_lookup_fk1 FOREIGN KEY (alias_pid) REFERENCES gnaf.address_aliases(gnaf_pid);
2018-03-22 14:59:18 AEDT ERROR:  insert or update on table "address_alias_lookup" violates foreign key constraint "address_alias_lookup_fk2"
2018-03-22 14:59:18 AEDT DETAIL:  Key (principal_pid)=(GAVIC425572821) is not present in table "address_principals".
2018-03-22 14:59:18 AEDT STATEMENT:  ALTER TABLE ONLY gnaf.address_alias_lookup ADD CONSTRAINT address_alias_lookup_fk2 FOREIGN KEY (principal_pid) REFERENCES gnaf.address_principals(gnaf_pid);
2018-03-22 14:59:20 AEDT ERROR:  insert or update on table "streets" violates foreign key constraint "streets_fk1"
2018-03-22 14:59:20 AEDT DETAIL:  Key (locality_pid)=(SA999999) is not present in table "localities".
2018-03-22 14:59:20 AEDT STATEMENT:  ALTER TABLE ONLY gnaf.streets ADD CONSTRAINT streets_fk1 FOREIGN KEY (locality_pid) REFERENCES gnaf.localities(locality_pid);

and these during part 5:

2018-03-22 15:10:28 AEDT STATEMENT:  INSERT INTO gnaf.qa SELECT 'address_alias_lookup', SUM(AUS), SUM(ACT), SUM(NSW), SUM(NT), SUM(OT), SUM(QLD), SUM(SA), SUM(TAS), SUM(VIC), SUM(WA) FROM (SELECT 1 AS AUS,CASE WHEN state = 'ACT' THEN 1 ELSE 0 END AS ACT,CASE WHEN state = 'NSW' THEN 1 ELSE 0 END AS NSW,CASE WHEN state = 'NT' THEN 1 ELSE 0 END AS NT,CASE WHEN state = 'OT' THEN 1 ELSE 0 END AS OT,CASE WHEN state = 'QLD' THEN 1 ELSE 0 END AS QLD,CASE WHEN state = 'SA' THEN 1 ELSE 0 END AS SA,CASE WHEN state = 'TAS' THEN 1 ELSE 0 END AS TAS,CASE WHEN state = 'VIC' THEN 1 ELSE 0 END AS VIC,CASE WHEN state = 'WA' THEN 1 ELSE 0 END AS WA FROM gnaf.address_alias_lookup) AS sqt
2018-03-22 15:11:35 AEDT ERROR:  column "state" does not exist at character 183
2018-03-22 15:11:35 AEDT STATEMENT:  INSERT INTO gnaf.qa SELECT 'address_secondary_lookup', SUM(AUS), SUM(ACT), SUM(NSW), SUM(NT), SUM(OT), SUM(QLD), SUM(SA), SUM(TAS), SUM(VIC), SUM(WA) FROM (SELECT 1 AS AUS,CASE WHEN state = 'ACT' THEN 1 ELSE 0 END AS ACT,CASE WHEN state = 'NSW' THEN 1 ELSE 0 END AS NSW,CASE WHEN state = 'NT' THEN 1 ELSE 0 END AS NT,CASE WHEN state = 'OT' THEN 1 ELSE 0 END AS OT,CASE WHEN state = 'QLD' THEN 1 ELSE 0 END AS QLD,CASE WHEN state = 'SA' THEN 1 ELSE 0 END AS SA,CASE WHEN state = 'TAS' THEN 1 ELSE 0 END AS TAS,CASE WHEN state = 'VIC' THEN 1 ELSE 0 END AS VIC,CASE WHEN state = 'WA' THEN 1 ELSE 0 END AS WA FROM gnaf.address_secondary_lookup) AS sqt
2018-03-22 15:13:48 AEDT ERROR:  column "state" does not exist at character 184
2018-03-22 15:13:48 AEDT STATEMENT:  INSERT INTO gnaf.qa SELECT 'locality_neighbour_lookup', SUM(AUS), SUM(ACT), SUM(NSW), SUM(NT), SUM(OT), SUM(QLD), SUM(SA), SUM(TAS), SUM(VIC), SUM(WA) FROM (SELECT 1 AS AUS,CASE WHEN state = 'ACT' THEN 1 ELSE 0 END AS ACT,CASE WHEN state = 'NSW' THEN 1 ELSE 0 END AS NSW,CASE WHEN state = 'NT' THEN 1 ELSE 0 END AS NT,CASE WHEN state = 'OT' THEN 1 ELSE 0 END AS OT,CASE WHEN state = 'QLD' THEN 1 ELSE 0 END AS QLD,CASE WHEN state = 'SA' THEN 1 ELSE 0 END AS SA,CASE WHEN state = 'TAS' THEN 1 ELSE 0 END AS TAS,CASE WHEN state = 'VIC' THEN 1 ELSE 0 END AS VIC,CASE WHEN state = 'WA' THEN 1 ELSE 0 END AS WA FROM gnaf.locality_neighbour_lookup) AS sqt
2018-03-22 15:13:48 AEDT ERROR:  column "state" does not exist at character 173
2018-03-22 15:13:48 AEDT STATEMENT:  INSERT INTO gnaf.qa SELECT 'street_aliases', SUM(AUS), SUM(ACT), SUM(NSW), SUM(NT), SUM(OT), SUM(QLD), SUM(SA), SUM(TAS), SUM(VIC), SUM(WA) FROM (SELECT 1 AS AUS,CASE WHEN state = 'ACT' THEN 1 ELSE 0 END AS ACT,CASE WHEN state = 'NSW' THEN 1 ELSE 0 END AS NSW,CASE WHEN state = 'NT' THEN 1 ELSE 0 END AS NT,CASE WHEN state = 'OT' THEN 1 ELSE 0 END AS OT,CASE WHEN state = 'QLD' THEN 1 ELSE 0 END AS QLD,CASE WHEN state = 'SA' THEN 1 ELSE 0 END AS SA,CASE WHEN state = 'TAS' THEN 1 ELSE 0 END AS TAS,CASE WHEN state = 'VIC' THEN 1 ELSE 0 END AS VIC,CASE WHEN state = 'WA' THEN 1 ELSE 0 END AS WA FROM gnaf.street_aliases) AS sqt
minus34 commented 6 years ago

As this is an ambiguous Postgres issue that's not 100% repeatable - I've moved all CREATE SCHEMA statements to the start. That should avoid this problem.
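The idea is roughly the following, sketched here as idempotent SQL (schema names taken from the command line shown later in this thread; the actual statements in the loader may differ):

```sql
-- Create every target schema up front, before any load step runs,
-- so no later step can fail on a missing schema.
CREATE SCHEMA IF NOT EXISTS raw_gnaf;
CREATE SCHEMA IF NOT EXISTS raw_admin_bdys;
CREATE SCHEMA IF NOT EXISTS gnaf;
CREATE SCHEMA IF NOT EXISTS admin_bdys;
```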

Please retest.

CMCDragonkai commented 6 years ago

I just ran it now with master branch of the gnaf-loader with Postgres 9.6.6 and psycopg2 2.7.1 and python 3.6, with Feb 2018 GNAF and Admin Boundaries.

SELECT postgis_full_version();

POSTGIS="2.4.3 r16312" PGSQL="96" GEOS="3.6.1-CAPI-1.10.1 r0" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.2.3, released 2017/11/20" LIBXML="2.9.7" LIBJSON="0.12.1" TOPOLOGY RASTER

I get this error:

python ./gnaf-loader/load-gnaf.py \
$>   --gnaf-tables-path "$(pwd)/data/FEB18_GNAF_PipeSeparatedValue_20180219141901/G-NAF/G-NAF FEBRUARY 2018" \
$>   --admin-bdys-path "$(pwd)/data/FEB18_AdminBounds_ESRIShapefileorDBFfile_20180219141148" \
$>   --psma-version '201802' \
$>   --raw-gnaf-schema 'raw_gnaf' \
$>   --raw-admin-schema 'raw_admin_bdys' \
$>   --gnaf-schema 'gnaf' \
$>   --admin-schema 'admin_bdys' \
$>   --raw-unlogged
root        : INFO     
root        : INFO     Start gnaf-loader
root        : INFO      - running Python 3.6.4 with Psycopg2 2.7.1
root        : INFO      - on Linux #1-NixOS SMP Wed Feb 28 09:21:39 UTC 2018
root        : INFO      - using Postgres 9.6.6 on x86_64-pc-linux-gnu and PostGIS 2.4.3 (with GEOS 3.6.1-CAPI-1.10.1)
root        : INFO     
root        : INFO     Part 1 of 6 : Create schemas : 2018-04-04 15:10:04.233197
root        : INFO     Part 1 of 6 : Schemas created! : 0:00:00.000637
root        : INFO     
root        : INFO     Part 2 of 6 : Start raw GNAF load : 2018-04-04 15:10:04.233968
root        : INFO      - Step 1 of 7 : tables dropped : 0:00:05.758780
root        : INFO      - Step 2 of 7 : database NOT vacuumed
root        : INFO      - Step 3 of 7 : UNLOGGED tables created : 0:00:00.048814
root        : INFO              - Loading state ACT
root        : INFO              - Loading state NSW
root        : INFO              - Loading state NT
root        : INFO              - Loading state OT
root        : INFO              - Loading state QLD
root        : INFO              - Loading state SA
root        : INFO              - Loading state TAS
root        : INFO              - Loading state VIC
root        : INFO              - Loading state WA
root        : INFO      - Step 4 of 7 : tables populated : 0:00:26.095410
root        : INFO      - Step 5 of 7 : indexes created: 0:01:11.258091
root        : INFO      - Step 6 of 7 : primary & foreign keys NOT created
root        : INFO      - Step 7 of 7 : tables analysed : 0:00:08.924889
root        : INFO     Part 2 of 6 : Raw GNAF loaded! : 0:01:52.087635
root        : INFO     
root        : INFO     Part 3 of 6 : Start raw admin boundary load : 2018-04-04 15:11:56.321741
root        : INFO      - Step 1 of 3 : raw admin boundaries loaded : 0:03:24.534184
root        : INFO      - Step 2 of 3 : admin boundaries prepped : 0:00:14.260149
root        : INFO      - Step 3 of 3 : admin boundaries for analysis created : 0:00:07.940194
root        : INFO     Part 3 of 6 : Raw admin boundaries loaded! : 0:03:46.735956
root        : INFO     
root        : INFO     Part 4 of 6 : Start create reference tables : 2018-04-04 15:15:43.057856
Traceback (most recent call last):
  File "./gnaf-loader/load-gnaf.py", line 951, in <module>
    if main():
  File "./gnaf-loader/load-gnaf.py", line 118, in main
    create_reference_tables(pg_cur, settings)
  File "./gnaf-loader/load-gnaf.py", line 586, in create_reference_tables
    pg_cur.execute("SET search_path = public, pg_catalog")
psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block

Checking the postgres logs, it's pretty long. But it has this at the very beginning:

ERROR:  canceling autovacuum task
CONTEXT:  automatic analyze of table "postgres.raw_gnaf.address_site_geocode"
ERROR:  canceling autovacuum task
CONTEXT:  automatic analyze of table "postgres.raw_gnaf.address_detail"
ERROR:  canceling autovacuum task
CONTEXT:  automatic analyze of table "postgres.raw_gnaf.address_site"
LOG:  checkpoints are occurring too frequently (22 seconds apart)
HINT:  Consider increasing the configuration parameter "max_wal_size".
ERROR:  canceling autovacuum task
CONTEXT:  automatic analyze of table "postgres.raw_admin_bdys.aus_state_electoral_polygon"
ERROR:  canceling autovacuum task
CONTEXT:  automatic analyze of table "postgres.raw_admin_bdys.aus_sosr_2011_polygon"
ERROR:  column "tas_state_" of relation "aus_state_electoral_polygon" does not exist at character 182
STATEMENT: ...

And then 2483612401 lines of SQL statements.

Maybe this is important?

ERROR:  column "tas_state_" of relation "aus_state_electoral_polygon" does not exist at character 182
CMCDragonkai commented 6 years ago

@skellert0 A noob question, but how do you get August 2017 dataset for Admin Boundaries? The data.gov.au doesn't seem to have links to older datasets?

skellert0 commented 6 years ago

@CMCDragonkai I'm a noob at this too. I got the August 2017 dataset back in August 2017 as a monolithic download. You could try contacting https://data.gov.au/user/annbeaumaris, who is mentioned in the activity stream for that dataset.

CMCDragonkai commented 6 years ago

That sucks that data.gov.au doesn't archive older versions.

andrewharvey commented 6 years ago

That sucks that data.gov.au doesn't archive older versions.

If you don't need the monolithic download, the individual files are usually still there so you could try downloading them all and putting in the expected structure for gnaf-loader.

PS. I wrote https://github.com/andrewharvey/psma-admin-bdys-data to help automate downloading the PSMA Admin Boundaries. (compared to https://github.com/iag-geo/psma-admin-bdys it doesn't do as much post processing)

minus34 commented 6 years ago

how do you get August 2017 dataset for Admin Boundaries?

You can download the Postgres dump files for GNAF and/or Admin Bdys here:

http://minus34.com/opendata/psma-201708/gnaf-201708.dmp
http://minus34.com/opendata/psma-201708/admin-bdys-201708.dmp

The dump files are available for versions going back to Feb 2017. Use the same URLs as above with a different YYYYMM.
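Fetching a given release could look like this (a sketch based on the URL pattern above; availability of any particular YYYYMM isn't guaranteed):

```shell
# Substitute the release you need (YYYYMM, Feb 2017 onwards).
V=201708
curl -O "http://minus34.com/opendata/psma-${V}/gnaf-${V}.dmp"
curl -O "http://minus34.com/opendata/psma-${V}/admin-bdys-${V}.dmp"
```

The files can then be restored with `pg_restore` into an empty database (assuming they are standard custom-format dumps).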

CMCDragonkai commented 6 years ago

@minus34 Thanks, I'll try that. However I do want to be able to update to the latest dataset later. What do you think is the problem with importing Feb 2018 dataset atm?

I just tried the docker instructions using Feb 2018 dataset. Same exact error as the one that I just posted.

minus34 commented 6 years ago

Re: the data - the 201802 dump files are up on the web site if you'd like to bypass this issue.

As for the cause - it looks like an obscure, hard-to-repeat Postgres issue. I've not been able to replicate it on PG 9.6 and 10.3 on Mac and Windows, although I had an interesting issue where ~50 raw admin bdy tables vanished from the schema (possibly related to your autovacuum issue?).

The only thing I can recommend is to grab the latest code, turn autovacuum off and try again...
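For reference, autovacuum can be turned off cluster-wide either by setting `autovacuum = off` in postgresql.conf, or at runtime as a superuser:

```sql
-- Requires superuser; takes effect after a config reload.
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();
```

Remember to re-enable it afterwards (`ALTER SYSTEM RESET autovacuum;` then reload), since autovacuum matters for long-term table health.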

cndrla commented 6 years ago

I've just tried with the latest code, autovacuum off, and still getting the error:

root        : INFO
root        : INFO     Start gnaf-loader
root        : INFO      - running Python 2.7.11 with Psycopg2 2.6.2
root        : INFO      - on Windows 6.3.9600
root        : INFO      - using Postgres 9.6.3 and PostGIS 2.3.2 (with GEOS 3.6.1-CAPI-1.10.1)
root        : INFO
root        : INFO     Part 1 of 6 : Create schemas : 2018-04-06 09:37:51.446000
root        : INFO     Part 1 of 6 : Schemas created! : 0:00:00
root        : INFO
root        : INFO     Part 2 of 6 : Start raw GNAF load : 2018-04-06 09:37:51.446000
root        : INFO      - Step 1 of 7 : tables dropped : 0:00:01.218000
root        : INFO      - Step 2 of 7 : database NOT vacuumed
root        : INFO      - Step 3 of 7 : tables created : 0:00:00.141000
root        : INFO              - Loading state ACT
root        : INFO              - Loading state NSW
root        : INFO              - Loading state NT
root        : INFO              - Loading state OT
root        : INFO              - Loading state QLD
root        : INFO              - Loading state SA
root        : INFO              - Loading state TAS
root        : INFO              - Loading state VIC
root        : INFO              - Loading state WA
root        : INFO      - Step 4 of 7 : tables populated : 0:02:41.396000
root        : INFO      - Step 5 of 7 : indexes created: 0:02:22.413000
root        : INFO      - Step 6 of 7 : primary & foreign keys created : 0:00:00.328000
root        : INFO      - Step 7 of 7 : tables analysed : 0:00:17.813000
root        : INFO     Part 2 of 6 : Raw GNAF loaded! : 0:05:23.309000
root        : INFO
root        : INFO     Part 3 of 6 : Start raw admin boundary load : 2018-04-06 09:43:14.755000
root        : INFO      - Step 1 of 3 : raw admin boundaries loaded : 0:10:58.584000
root        : INFO      - Step 2 of 3 : admin boundaries prepped : 0:00:40.125000
root        : INFO      - Step 3 of 3 : admin boundaries for analysis created : 0:00:25.609000
root        : INFO     Part 3 of 6 : Raw admin boundaries loaded! : 0:12:04.428000
root        : INFO
root        : INFO     Part 4 of 6 : Start create reference tables : 2018-04-06 09:55:19.214000
Traceback (most recent call last):
  File "load-gnaf.py", line 951, in <module>
    if main():
  File "load-gnaf.py", line 118, in main
    create_reference_tables(pg_cur, settings)
  File "load-gnaf.py", line 586, in create_reference_tables
    pg_cur.execute("SET search_path = public, pg_catalog")
psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block
CMCDragonkai commented 6 years ago

@minus34 Maybe try on Linux. I tried both option 1 on NixOS with Postgres 9.6.6 and option 2 using your Docker container with docker-compose. Both result in the same error. Given that Docker is meant to create reproducible environments, why would it fail when importing Feb 2018?

unverified-contact commented 6 years ago

Ran today, it also dies when I try it.

The script arrives at a fatal error at the point shown below, but there are SQL failures reported well before that, so I've included the entire output for reference here:

https://pastebin.com/Cm1FEP4K https://pastebin.com/raw/Cm1FEP4K

root        : INFO      - Step 3 of 3 : admin boundaries for analysis created : 0:00:00.344880
root        : INFO     Part 3 of 6 : Raw admin boundaries loaded! : 0:00:02.635921
root        : INFO     
root        : INFO     Part 4 of 6 : Start create reference tables : 2018-05-15 15:30:34.317254
root        : INFO      - Step  1 of 14 : create reference tables : 0:00:00.055335
root        : INFO      - Step  2 of 14 : localities populated : 0:00:00.491115
root        : INFO      - Step  3 of 14 : locality aliases populated : 0:00:00.709998
root        : INFO      - Step  4 of 14 : locality neighbours populated : 0:00:00.183920
root        : INFO      - Step  5 of 14 : streets populated : 0:00:17.675546
root        : INFO      - Step  6 of 14 : street aliases populated : 0:00:00.721336
root        : INFO      - Step  7 of 14 : addresses populated : 0:20:23.390106
root        : INFO      - Step  8 of 14 : principal alias lookup populated : 0:01:30.150836
root        : INFO      - Step  9 of 14 : primary secondary lookup populated : 0:00:35.206367
Traceback (most recent call last):
  File "./gnaf-loader/load-gnaf.py", line 951, in <module>
    if main():
  File "./gnaf-loader/load-gnaf.py", line 118, in main
    create_reference_tables(pg_cur, settings)
  File "./gnaf-loader/load-gnaf.py", line 640, in create_reference_tables
    pg_cur.execute(psma.open_sql_file("03-10-reference-split-melbourne.sql", settings))
psycopg2.ProgrammingError: relation "admin_bdys.locality_bdys" does not exist
LINE 14:   FROM admin_bdys.locality_bdys AS bdy
duelran commented 6 years ago

@CMCDragonkai I had the same issue as you. The issue is actually with the source data. The TAS_STATE_ELECTORAL_POLYGON_shp.shp file has a bunch of extra attributes in the attribute table that the script is trying to insert into a table without those column names. I thought about changing the script, but decided to take the easier path and just edited the attribute table in QGIS to delete these columns.

CMCDragonkai commented 6 years ago

@duelran That's it? Just change the source data and remove those columns? Which columns did you remove?

minus34 commented 6 years ago

Thanks to @duelran for finding the cause of all of this - some bad data in the last PSMA Admin Bdys release. This multi-faceted issue has been fixed.

Here's what was causing it:

  1. When certain queries fail in Postgres, they can leave the database in a state where the next query will generate an error even though there's nothing wrong with it: see this StackOverflow issue.
  2. The TAS_STATE_ELECTORAL_POLYGON_shp extra fields caused all subsequent admin bdy imports to fail, PLUS the next query after the admin bdy load. This was due to the same Postgres connection being used and the issue above. The admin bdy load now uses a separate PG connection per file load.
  3. Why was this hard to debug? I was testing using the Admin Bdys downloaded from the PSMA website (as my company is a PSMA customer) - that version of the data does not have the TAS polygon problems. It should be identical to the one on data.gov.au, but wasn't. Hence I couldn't replicate previously.
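The aborted-transaction behaviour in point 1 is Postgres-specific, but the design of the fix in point 2 (a fresh connection and transaction per file, so one bad file can't poison the rest of the run) can be sketched portably. The example below uses sqlite3 purely for illustration, with made-up "shapefile" batches standing in for the real loads:

```python
import os
import sqlite3
import tempfile

# Shared on-disk database so every connection sees the same table.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
with sqlite3.connect(path) as conn:
    conn.execute("CREATE TABLE bdys (pid TEXT PRIMARY KEY)")
conn.close()

# Three hypothetical "shapefiles"; the middle one violates a constraint,
# standing in for the bad TAS_STATE_ELECTORAL_POLYGON columns.
batches = {
    "act.shp": ["ACT1", "ACT2"],
    "tas.shp": ["TAS1", "TAS1"],  # duplicate pid -> this load fails
    "nsw.shp": ["NSW1"],
}

loaded, failed = [], []
for name, pids in batches.items():
    conn = sqlite3.connect(path)  # fresh connection per file (the fix)
    try:
        with conn:  # one transaction; rolled back wholesale on error
            conn.executemany(
                "INSERT INTO bdys VALUES (?)", [(p,) for p in pids]
            )
        loaded.append(name)
    except sqlite3.IntegrityError:
        failed.append(name)  # only this file's rows are lost
    finally:
        conn.close()

print(loaded)  # ['act.shp', 'nsw.shp']
print(failed)  # ['tas.shp']
```

With a single shared connection and transaction, the failure in the second batch would have blocked everything after it; per-file connections turn it into an isolated, reportable warning instead.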

Please grab the latest code and test. You should get this error only (due to the data):

root : WARNING Importing TAS_STATE_ELECTORAL_POLYGON_shp.shp - Couldn't run Shapefile SQL shp2pgsql result was: column "tasstate" of relation "aus_state_electoral_polygon" does not exist LINE 4: ...n" ("se_ply_pid","dt_create","dt_retire","se_pid","tas_state...

Last thing - can someone provide feedback to data.gov.au about the data error. I'm going to bed :-)

CMCDragonkai commented 6 years ago

@minus34 Awesome. Does that mean the code now works with the data that's on data.gov.au? Also, I'm a bit confused: are you saying that data.gov.au has a dataset with extra columns, or one with missing columns?

minus34 commented 6 years ago

The code will now continue to run regardless of Admin Bdy data errors, but it won't fix the errors.

data.gov.au has the invalid version with the extra columns. To avoid this, I've posted the correct version here:

andrewharvey commented 6 years ago

Is it correct to assume that since the GNAF doesn't care about State Electoral boundaries, with current master code, the issue with the data won't affect the gnaf table(s)?

minus34 commented 6 years ago

Yes, but only if you don't want State Electoral bdys in the GNAF Boundary Tag tables.

duelran commented 6 years ago

@minus34 thanks very much for the permanent fix. I've sent data.gov.au feedback about the issue.

@CMCDragonkai while things have moved on a lot since you asked the question, the specific attributes that needed to be deleted from the data.gov.au version of the TAS_STATE_ELECTORAL_POLYGON_shp.shp shapefile were:

  1. TASSTATE
  2. TAS_STAT_1
  3. TAS_STAT_2
  4. TAS_STAT_3
  5. TAS_STAT_4
  6. TAS_STAT_5
  7. TAS_STAT_6
  8. TAS_STAT_7

With the script version prior to the fix the rest of the load worked successfully with no warnings raised.
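For anyone who'd rather script the cleanup than edit the attribute table in QGIS, GDAL's `ogr2ogr` can rewrite the shapefile keeping only named fields. This is a hypothetical sketch: the `-select` field list below shows only the columns visible in the error messages above and must be completed with the remaining columns the loader's table actually expects:

```shell
# Rewrite the shapefile, keeping only the expected attribute fields
# (list is incomplete/illustrative - match it to the loader's table).
ogr2ogr -select "se_ply_pid,dt_create,dt_retire,se_pid" \
  TAS_STATE_ELECTORAL_POLYGON_fixed.shp \
  TAS_STATE_ELECTORAL_POLYGON_shp.shp
```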