kibitan / pgmasking

Command line tool for generating anonymized database for PostgreSQL (WIP)
MIT License
6 stars 0 forks source link
command-line-tool database gdpr golang postgres postgresql rdbms

pgMasKING🤴

build Acceptance Test Status Maintainability codecov

This project is currently Work in Progress

The command line tool for anonymizing PostgreSQL database records by parsing a SQL dump file and build a new SQL dump file with masking sensitive/credential data.

for MySQL: MasKING

Installation

(TBC)

Requirement for development

Supported RDBMS

Usage

  1. Setup configuration for anonymizing target tables/columns to pgmasking.yml

      # table_name:
      #   column_name: masked_value
    
      users:
        string: anonymized string
        email: anonymized+%{n}@example.com # %{n} will be replaced with sequential number
        integer: 12345
        float: 123.45
        boolean: true
        null_column: null
        date: 2018-08-24
        time: 2018-08-24 15:54:06
        binary_or_blob: !binary | # Binary Data Language-Independent Type for YAML™ Version 1.1: http://yaml.org/type/binary.html
          R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5
          OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+
          +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC
          AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs=

    A value will be implicitly converted to a compatible type. If you prefer to explicitly convert, you could use a tag as defined in YAML Version 1.1

      not-date: !!str 2002-04-28

    NOTE: pgMasKING doesn't check actual schema's type from the dump. If you put incompatible value, it will cause an error during restoring to the database.

  2. Dump database with anonymizing

    (TBC) pgMasKING work with pg_dump command. It doesn't (or only) work with --column-inserts/--attribute-inserts/--rows-per-insert=n(version 12~) options.

      pg_dump DATABASE_NAME | pgmasking > anonymized_dump.sql

    or

      pg_dump DATABASE_NAME --column-inserts --rows-per-insert=100 | pgmasking > anonymized_dump.sql
  3. Restore from the anonymized dump file

      psql ANONYMIZED_DATABASE_NAME < anonymized_dump.sql

    Tip: If you don't need to have an anonymized dump file, you can directly insert it from the stream. It can be faster because it has less IO interaction.

        pg_dump DATABASE_NAME | pgmasking | psql ANONYMIZED_DATABASE_NAME

options

(WIP)

$ pgmasking -h
Usage: pgmasking [options]
    -c, --config=FILE_PATH           specify config file. default: pgmasking.yml
    -v, --version                    version

Use case of anonymized (production) database

Read here

Development

git clone git@github.com:kibitan/pgmasking.git

boot

  go run .

or

  go build .
  ./pgmasking

or

  go install .
  pgmasking

Run test

  go test -race -v ./...

Lint

  golint ./... && go vet ./...

Document

(TBC)

  go get golang.org/x/tools/cmd/godoc
  go doc pgmasking

  godoc // boot http server
  open http://localhost:6060

acceptance test

./acceptance/run_test.sh

available option via environment variable:

with docker-compose
docker-compose/acceptance_test.sh postgres12

The docker-compose file names for other database versions, specify that file.

Development with Docker

docker build . -t pgmasking
echo "sample stdout" | docker run -i pgmasking
docker run pgmasking -v

run test

docker build . --target builder -t pgmasking-builder
docker run pgmasking-builder go test -v ./..

# with mounting disk
docker run --mount src=`pwd`,target=/go/src/app,type=bind pgmasking-builder go test -v ./..

Profiling

(TBC)

Benchmark

(TBC)

Design Concept

Read here

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/kibitan/pgmasking. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the pgMasKING project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.