scolab-dot-eu / pygdaltools

GNU Affero General Public License v3.0
16 stars 4 forks source link

Using OGR_TRUNCATE in Ogr2ogr.config_options #5

Open Dunc1995 opened 3 years ago

Dunc1995 commented 3 years ago

Description

When extending the Ogr2ogr class, I attempted to set some options as follows:

        self.config_options = {
            'OGR_TRUNCATE': 'YES',
            'PG_USE_COPY': 'YES'
        }
        self.layer_creation_options = {
            'GEOMETRY_NAME': 'geom'
        }
        self.set_output_mode(data_source_mode=self.MODE_DS_CREATE_OR_UPDATE, layer_mode=self.MODE_LAYER_OVERWRITE)

This then executes the following block of code in ogr2ogrcmd.py (line 262):

            if self.out_file_type ==  "PostgreSQL" and config_options.get("OGR_TRUNCATE") != "NO":
                # prefer truncate for PostgresSQL driver
                args.extend(['-append'])
                config_options["OGR_TRUNCATE"] = "YES"

By setting OGR_TRUNCATE != 'NO', this adds the -append flag, which then prevents me from adding layer creation options. This isn't how ogr2ogr normally behaves in the CLI, so I was wondering if this behaviour is intended?

dispiste commented 3 years ago

When you use --config OGR_TRUNCATE YES in the command line, this implies an -append and prevents using layer creation options. It doesn't matter whether you explicitly set the -append option or not, ogr2ogr will complain if you provide any layer creation option.

On the other hand, you can use -overwrite and use layer creation options, but then the OGR_TRUNCATE YES will be ignored, the table will be dropped and then created (and you will lost any existing additional indexes, foreign keys, triggers, etc).

Ogr2ogr documentation for OGR_TRUNCATE option states:

Typical use case: “ogr2ogr -append PG:dbname=foo abc.shp –config OGR_TRUNCATE YES”.

which probably illustrates that OGR_TRUNCATE must be used together with -append, although it is not conclusive indeed.

Anyway, I have just tested the ogr2ogr CLI using GDAL 2.2.3 and the behaviour is consistent with the one explained here. Which version are you using? What CLI tests have you performed?

Dunc1995 commented 3 years ago

TLDR - I think there are some redundant settings in my ogr2ogr command!

What you've explained makes sense, although I think in my case the -overwrite flag implicitly overrides OGR_TRUNCATE YES when using ogr2ogr directly. I'm using GDAL 3.0.4 at the moment.

As an example, the string below is currently what gets appended to my ogr2ogr command; if I try to replicate this string exactly using pygdaltools, the geometry column remains unchanged, whereas the GEOMETERY_NAME=geom option changes the name of my geometry column when running the command directly.


' "{file_name}" -nln "<schema>.<table>" -a_srs "EPSG:27700" -nlt MULTILINESTRING -lco GEOMETRY_NAME=geom -sql "<some sql>" -progress --config PG_USE_COPY YES --CONFIG OGR_TRUNCATE YES -overwrite'
dispiste commented 3 years ago

You can produce similar params using the following pygdaltools code:

import gdaltools

# from postgis to postgis
ogr = gdaltools.ogr2ogr()
conn = gdaltools.PgConnectionString(host="localhost", port=5432, dbname="mydb", schema="myschema", user="myuser", password="mypass")
ogr.set_input(conn, table_name="roads", srs="EPSG:4326")
ogr.set_output(conn, table_name="roads2")
ogr.set_output_mode(layer_mode=ogr.MODE_LAYER_OVERWRITE, data_source_mode=ogr.MODE_DS_CREATE_OR_UPDATE)
ogr.config_options = {"PG_USE_COPY": "YES", "OGR_TRUNCATE": "NO"}
ogr.layer_creation_options = {"GEOMETRY_NAME": "geom"}
ogr.set_sql("some sql")
ogr.geom_type = "MULTILINESTRING"
ogr.execute()

Note that OGR_TRUNCATE=NO must be used to get an actual -overwrite command, because pygdaltools assumes -append and OGR_TRUNCATE=YES when using layer_mode=ogr.MODE_LAYER_OVERWRITE and postgis output driver. Maybe this default is a bit too much opinionated (since it prevents using layer creation options), but I decided it is a reasonable default for an overwrite operation.

You can use the following code to see the produced command, at any time before or after execute():

# command with hidden passwords
print(" ".join(ogr.safe_args))
# real command
print(" ".join(ogr.args))