nixys / nxs-data-anonymizer

A tool for anonymizing PostgreSQL and MySQL database dumps
Apache License 2.0

Incorrect SQL for nullable columns #7

Closed · akademic closed 1 year ago

akademic commented 1 year ago

If I have a dump:

-- Create the table
CREATE TABLE `my_table` (
      `id` INT PRIMARY KEY,
      `first_name` VARCHAR(50),
      `last_name` VARCHAR(50) DEFAULT NULL
);

-- Insert two records
INSERT INTO `my_table` (id, first_name, last_name) VALUES (1,'John',NULL);

and transform it with the config:

filters:
  my_table:
    columns:
      last_name:
        value: "{{- randAlphaNum 20 -}}"

I will get

-- Insert two records
INSERT INTO `my_table` (id, first_name, last_name) VALUES (1, 'John',abcdfefgh);

It would be great if the anonymizer handled such cases.

akademic commented 1 year ago

Added a possible solution: #10

borisershov commented 1 year ago

It seems we need to add types for the columns...

akademic commented 1 year ago

Yep, and it will also be useful for security enforcement.
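
Purely as an illustration of the idea being discussed (this is not the tool's actual config schema), a per-column type declaration might look something like this:

filters:
  my_table:
    columns:
      last_name:
        type: string                      # hypothetical key: declared column type
        value: "{{- randAlphaNum 20 -}}"  # would be quoted in the output because the type is string

With a declared type, the anonymizer could both quote string values correctly and reject replacement values that do not match the column type.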

akademic commented 1 year ago

And... we need tests to avoid regressions

sqlExample + filterConfig + expectedResult
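
For what it's worth, a minimal sketch of what one such case might look like as a YAML fixture (the field names simply mirror the shorthand above and are not an existing format in the repo); using a fixed value instead of randAlphaNum keeps the expected output deterministic:

# Hypothetical regression-test fixture; names are illustrative only
name: nullable-varchar-stays-quoted
sqlExample: |
  INSERT INTO `my_table` (id, first_name, last_name) VALUES (1,'John',NULL);
filterConfig:
  filters:
    my_table:
      columns:
        last_name:
          value: "REDACTED"
expectedResult: |
  INSERT INTO `my_table` (id, first_name, last_name) VALUES (1,'John','REDACTED');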

borisershov commented 1 year ago

It seems we need to add types for the columns...

I think it will be released in the next few days.

borisershov commented 1 year ago

Hi! The branch feat/mysql-null-values should fix your issue (it will be merged into main after testing).

With these changes you are able to use the functions null and isNull to handle nullable values correctly. For example...

You have a dump:

-- Create the table
CREATE TABLE `my_table` (
      `id` INT PRIMARY KEY,
      `first_name` VARCHAR(50),
      `last_name` VARCHAR(50) DEFAULT NULL,
      `full_name` VARCHAR(100) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) STORED
);

-- Insert two records
INSERT INTO `my_table` (id, first_name, last_name) VALUES (1,'John',NULL),(2,'Bob','Smith');

And you need to do the following: if last_name is NULL, set it to a random string; otherwise set the field to NULL. You may use the following config for this:

filters:
  my_table:
    columns:
      last_name:
        value: "{{ if (isNull .Values.last_name) }}{{- randAlphaNum 20 -}}{{ else }}{{ null }}{{ end }}"

Dump after transformation:

-- Create the table
CREATE TABLE `my_table` (
      `id` INT PRIMARY KEY,
      `first_name` VARCHAR(50),
      `last_name` VARCHAR(50) DEFAULT NULL,
      `full_name` VARCHAR(100) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) STORED
);

-- Insert two records
INSERT INTO `my_table` (id, first_name, last_name) VALUES (1,'John','CICPKt3YZDb99MCcERaV'),(2,'Bob',NULL);

Please check the changes against your cases.
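
As an aside, for the probably more common requirement (keep NULL values as NULL and anonymize only non-NULL values), the same helpers should allow a config along these lines; this is a sketch based on the isNull and null functions described above, not a tested snippet:

filters:
  my_table:
    columns:
      last_name:
        # keep NULL values as NULL, anonymize everything else
        value: "{{ if (isNull .Values.last_name) }}{{ null }}{{ else }}{{- randAlphaNum 20 -}}{{ end }}"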

akademic commented 1 year ago

This PR introduced a bug for JSON fields:

-- Create the table
CREATE TABLE `my_table` (
      `id` INT PRIMARY KEY,
      `json` json NOT NULL
);

INSERT INTO `my_table` VALUES (1,'{\"some\":\"data\"}');

becomes (the JSON value has lost its quotes):

-- Create the table
CREATE TABLE `my_table` (
  `id` INT PRIMARY KEY,
  `json` json NOT NULL
);

INSERT INTO `my_table` VALUES (1,{\"some\":\"data\"});

borisershov commented 1 year ago

Thanks for your case! Fixed in a new commit within the same branch. Please check the changes.

akademic commented 1 year ago

Now I can import the anonymized dump without errors. Hope it is correct. 👍

borisershov commented 1 year ago

Excellent! The new version v1.2.0 has been released, so I'm closing the issue. Thanks for your collaboration 🤝