qgis / QGIS

QGIS is a free, open source, cross platform (lin/win/mac) geographical information system (GIS)
https://qgis.org
GNU General Public License v2.0
10.08k stars 2.93k forks source link

Proposal for a new Versioning functionality in DB Manager #56201

Open fryktoria opened 5 months ago

fryktoria commented 5 months ago

Feature description

DB Manager offers a Versioning tool that helps to keep track of changes in a table. The current functionality uses an ingenious method where the original table is modified in order to function as the history table and an updatable view is created, through which the current data are exposed. It is supposed that all operations through QGIS are performed on this view.

Unfortunately, this method does not work well when the original table has additional functionality attached to it, such as sequences, triggers, constraints etc. This functionality cannot be exposed 100% when the view is handled by QGIS, causing additional problems. Solutions such as the one implemented to fix issue 25888, cannot be considered as generic solutions and in my opinion, are the cause of irrational operation. For example, the fix to issue 25888 allowed the QGIS user to manipulate id_hist which is the new primary key of the original table and is associated with a sequence, creating a potential problem both in having duplicate keys and also in the operation of the sequence. The versioning functionality should be invisible to the end user.

This proposal considers a different way for versioning, which works as follows:

  1. We leave the original table as is. The QGIS user operates on the original table, therefore all constraints, sequences etc are maintained.

  2. We create a history table with the exact same structure of the original, adding id_hist as PK, time_start, time_end and user_role.

The examples in the Additional context section present the SQL statements to create the above functionality in PostGIS for a table "logged_lines".

A drawback of this solution is of course the requirement of more storage capacity than the current method. But I guess that in our modern times, storage is not a problem.

Additional context

-- The definition of the example tables

CREATE TABLE public."logged_lines" (
    "id" serial4 NOT NULL,
    "name" varchar NULL,
    "geom" public.geometry(linestring, 2100) NULL,
    CONSTRAINT logged_lines_pkey PRIMARY KEY (id)
);

-- The SQL statements to create the new functionality

CREATE TABLE "public"."logged_lines_history" (
  "id" integer,
  "name" varchar, 
  "geom" public.geometry(linestring, 2100) NULL,
  "id_hist" serial4 NOT NULL, 
  "time_start" timestamp default '-infinity', 
  "time_end" timestamp NULL, 
  "user_role" varchar NULL
);

ALTER TABLE "public"."logged_lines_history"  ADD PRIMARY KEY ("id_hist");

CREATE OR REPLACE FUNCTION "public"."logged_lines_at_time"(timestamp)
RETURNS SETOF "public"."logged_lines_history" AS
$$
SELECT * FROM "public"."logged_lines_history" WHERE
  ( SELECT CASE WHEN "time_end" IS NULL THEN ("time_start" <= $1) ELSE ("time_start" <= $1 AND "time_end" > $1) END );
$$
LANGUAGE 'sql';

CREATE OR REPLACE FUNCTION "public"."logged_lines_insert"()
RETURNS trigger AS
$$
BEGIN

  INSERT INTO "public"."logged_lines_history" ("id","name","geom", "time_start", "time_end", "user_role") VALUES (NEW."id",NEW."name", NEW."geom", current_timestamp, NULL, current_user);

  RETURN NEW;
END;
$$
LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION "public"."logged_lines_update"()
RETURNS TRIGGER AS
$$
BEGIN   

    UPDATE "public"."logged_lines_history" SET "time_end" = current_timestamp WHERE "id" = NEW."id" AND "time_end" IS NULL;
    INSERT INTO "public"."logged_lines_history" ("id","name", "geom", "time_start", "time_end", "user_role") VALUES (NEW."id",NEW."name", NEW."geom", current_timestamp, NULL, current_user);   

  RETURN NEW;
END;
$$
LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION "public"."logged_lines_delete"()
RETURNS trigger AS
$$
BEGIN

  UPDATE "public"."logged_lines_history" SET "time_end" = current_timestamp WHERE "id" = OLD."id" AND "time_end" IS NULL;

  RETURN OLD;
END;
$$
LANGUAGE 'plpgsql';

CREATE OR REPLACE TRIGGER "logged_lines_insert" BEFORE INSERT ON "public"."logged_lines"
FOR EACH ROW EXECUTE PROCEDURE "public"."logged_lines_insert"();

CREATE OR REPLACE TRIGGER "logged_lines_update" BEFORE UPDATE ON "public"."logged_lines"
FOR EACH ROW EXECUTE PROCEDURE "public"."logged_lines_update"();

CREATE OR REPLACE TRIGGER "logged_lines_delete" BEFORE DELETE ON "public"."logged_lines"
FOR EACH ROW EXECUTE PROCEDURE "public"."logged_lines_delete"();
Brent-Edwards commented 5 months ago

For reference purposes, here a couple of postgres implementations that may be of interest:

https://github.com/pgaudit/pgaudit https://wiki.postgresql.org/wiki/Audit_trigger_91plus

fryktoria commented 5 months ago

For reference purposes, here a couple of postgres implementations that may be of interest:

https://github.com/pgaudit/pgaudit https://wiki.postgresql.org/wiki/Audit_trigger_91plus

Thanks for the contribution @Brent-Edwards. There are several implementations for most databases. I have recently found Kart project which may be suited for large projects and it claims to be specialized in the particulars of geospatial data. I have not tried it yet.

Yet, the purpose of my proposal is not to create from scratch or to integrate a large code base into QGIS. My intention is just to make simple modifications to the existing Versioning functionality in order to operate "as expected", without frustrating the users with errors from failure to commit, or from creating out of sequence primary keys. I also believe that any decent database administrator can setup a versioning mechanism without resorting to QGIS. But the target of such functionality is engineers with a limited knowledge of databases who would appreciate a quick and easy implementation of versioning, without employing a database expert.

Taking the opportunity to comment further, please note that I questioned myself if I was about to report the issue as Bug or as Feature request. Since the current code operates properly under specific constraints of usage, i.e. the user is diligent to submit non-duplicated keys etc., it does not qualify as a Bug. Also, there may be people out there whose operations are relying on the existing implementation which remains the same for years, since the early versions of QGIS. Those people they will face problems when they upgrade because the proposal is not compatible to the existing implementation. It is up to the QGIS maintainers to assess and decide the way forward.

autra commented 5 months ago

My intention is just to make simple modifications to the existing Versioning

I'm totally sold to the audit table mechanism, as opposed to the inplace versioning currently implemented in the db manager, but such a migration is unfortunately not a simple modifications...

Actually, as a user (and one-time contributor to this very feature of inplace versioning), I'd vote to remove it completely 😆 and instead redirect to plugins. The rationale for this:

(Btw, there is a nice visualisation plugin to use on top of pgaudit if you are interested: pg-history-viewer )

saberraz commented 5 months ago

Versioning geospatial data is not very straight forward. As mentioned in the earlier posts, there are already some third party services.

There is a geodiff library (https://github.com/MerginMaps/geodiff) which allows you to version your GeoPackage (or any SQLite database) and PostGIS database. We use this library as the basis of Mergin Maps. You can give it a try and see how it works: https://merginmaps.com/docs/dev/mergince/

fryktoria commented 5 months ago

I agree with all regarding the different forms that versioning may take. Types of versioning that I have encountered:

  1. Main focus on who did what and when.
  2. Project focused for the production of a deliverable (kart is a perfect example)
  3. Operations focused (e.g. to check the current status of a fiber optic network, see changes made between time1 and time 2 but still having the complete GIS functionality for all views, definitely multi-user etc.). After digging into kart, it seems not suitable for this.

My personal interest is on the "Operations focused" which is more close to the existing functionalty. This is the reason that I have started using it as a starting point several years ago, making lots of adjustments in the database as time progressed. Perhaps @autra is right that it should be removed from the core functionality, allowing plugins to fit into the particular needs of every user. Yet, if the functionality remains, it should work "properly", at least in most situations. From my point of view, I consider it very easy to modify the existing code in order to run the SQL statements I proposed in the Additional context section and I would be willing to do it.

My question to you, as Contributors, is if a pull request that would change the functionality in the way that I described in the post, will have any possibility to be accepted, or if the migration issues referred to by @autra or any other issues you may be aware of will not allow it.