michaelsogos / pg-diff

PostgreSQL schema and data comparing tool
MIT License
167 stars 17 forks source link

Data compare doesn't detect updates of NULL values #45

Open olivierdalang opened 2 years ago

olivierdalang commented 2 years ago

Hi !

I'm comparing two schema, where the newer schema has new columns with data.

The data compare feature correctly creates the INSERT statement (populating the new columns with the values), but does not include the new columns in UPDATE statements.

Example :

-- SCHEMA A
CREATE TABLE public.test (
    id serial NOT NULL,
    "name" varchar NULL,
    CONSTRAINT test_pk PRIMARY KEY (id)
);
INSERT INTO test VALUES (1, 'john');
-- SCHEMA B
CREATE TABLE public.test (
    id serial NOT NULL,
    "name" varchar NULL,
    "surname" varchar NULL,
    CONSTRAINT test_pk PRIMARY KEY (id)
);
INSERT INTO test VALUES (1, 'johnny', 'wayne');
INSERT INTO test VALUES (2, 'alfred', 'hitchcock');

Using config

{
  "test": {
    "targetClient": {
      ...
      "database": "testa",
      ...
    },
    "sourceClient": {
      ...
      "database": "testb",
      ...
    },
    "compareOptions": {
      ...
      "dataCompare": {
        "enable": true,
        "tables": [
          {"tableSchema":"public", "tableName":"test", "tableKeyFields":["id"]}
        ]
      }
    }
  }
}

I get the script

-- ...
ALTER TABLE IF EXISTS "public"."test" ADD COLUMN IF NOT EXISTS "surname" varchar NULL  ;
-- ...
INSERT INTO "public"."test" ("id", "name", "surname")  VALUES (2, 'alfred', 'hitchcock');
UPDATE "public"."test" SET "name" = 'johnny' WHERE "id" = 1;
-- ...

See that "john" is correctly updated to "johnny", but the "wayne" is not populated at all, while "hitchcock" is correctly populated for the newly inserted value.

olivierdalang commented 2 years ago

Hmm actually it doesn't seem linked to newly added columns, but just to NULL values as if I run my script in two steps (first update the schema, then create the diff) I run in the same issue...

Example :

-- SCHEMA A
CREATE TABLE public.test (
    id serial NOT NULL,
    "name" varchar NULL,
    CONSTRAINT test_pk PRIMARY KEY (id)
);
INSERT INTO test VALUES (1, NULL);
-- SCHEMA B
CREATE TABLE public.test (
    id serial NOT NULL,
    "name" varchar NULL,
    "surname" varchar NULL,
    CONSTRAINT test_pk PRIMARY KEY (id)
);
INSERT INTO test VALUES (1, 'johnny');
INSERT INTO test VALUES (2, 'alfred');

I get the script

-- ...
INSERT INTO "public"."test" ("id", "name")  VALUES (2, 'alfred');
-- ...

missing the NULL -> ' johnny' difference.

I guess there must be a comparison issue in the code, something like WHERE olddata <> newdata, which will not correctly handle null values (to be replaced with WHERE olddata is distinct from newdata

olivierdalang commented 2 years ago

Think I found the bug here : https://github.com/michaelsogos/pg-diff-api/pull/9