rap2hpoutre / pg-anonymizer

Dump anonymized PostgreSQL database with a NodeJS CLI
https://raph.site
MIT License

Allow for passthrough of extra args to pg_dump #26

Closed alexhall closed 2 years ago

alexhall commented 2 years ago

I started off looking for a way to restrict the anonymized database dump to a particular schema, which pg_dump already supports via the -n flag. While we could conceivably define an equivalent pg-anonymizer flag and pass it through, the pg_dump tool already supports a rich variety of CLI flags that help control what objects get dumped and we wouldn't want to have to define separate flags for each one that somebody might want to use.

A more flexible approach, as implemented here, could be to switch from a single, named positional arg (a database connection string) to instead use variable-length arguments which we can then pass through to pg_dump (after sanitizing to ensure that we aren't breaking pg-anonymizer by altering the dump format or writing to a file rather than stdout).
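To make the sanitizing step concrete, here is a minimal sketch of what such a check could look like. The helper name `sanitizePgDumpArgs` and the exact list of rejected flags are assumptions for illustration, not code from this PR; the only flags it relies on are pg_dump's documented `-F`/`--format` and `-f`/`--file`. It does not handle bundled short options or abbreviated long options.

```typescript
// Hypothetical helper, not taken from the pg-anonymizer source.
// Rejects pg_dump flags that would change the dump format or send
// output to a file instead of stdout, and returns the rest untouched.
const FORBIDDEN_SHORT: string[] = ["-F", "-f"]; // e.g. "-Fc", "-f out.sql"
const FORBIDDEN_LONG: string[] = ["--format", "--file"]; // e.g. "--format=c"

function sanitizePgDumpArgs(args: string[]): string[] {
  for (const arg of args) {
    // Short form: the value may be attached ("-Fc") or separate ("-F c").
    const badShort = FORBIDDEN_SHORT.some((flag) => arg.startsWith(flag));
    // Long form: the value may follow "=" or be the next argument.
    const badLong = FORBIDDEN_LONG.some(
      (flag) => arg === flag || arg.startsWith(flag + "=")
    );
    if (badShort || badLong) {
      throw new Error(`pg_dump argument not allowed here: ${arg}`);
    }
  }
  return args;
}
```

With a check like this, `sanitizePgDumpArgs(["-n", "myschema", "mydb"])` returns its input unchanged, while `sanitizePgDumpArgs(["-Fc", "mydb"])` throws.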

By setting the strict = false flag, we instruct the oclif arg parser to gather up any unrecognized flags as positional arguments, which are then exposed in argv. You can also force the issue by using the special -- argument to separate flags from arguments, a la GNU getopt. So the following commands are equivalent, and both result in passing -n myschema mydb to pg_dump:

$ pg-anonymizer -n myschema -l first_name:faker.name.firstName mydb
$ pg-anonymizer -l first_name:faker.name.firstName -- -n myschema mydb
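The equivalence of the two commands can be illustrated with a toy splitter. This is a simplified stand-in that mimics the behavior described above, not oclif's actual parser; the function name `splitArgs` is hypothetical, and the set of known flags is assumed to be just the `-l` and `-o` flags that appear in this thread.

```typescript
// Toy illustration only: this is NOT oclif's parser.
// Flags pg-anonymizer itself understands; each takes one value (assumed set).
const KNOWN_FLAGS: string[] = ["-l", "-o"];

interface ParsedArgs {
  flags: { [name: string]: string };
  passthrough: string[];
}

function splitArgs(argv: string[]): ParsedArgs {
  const flags: { [name: string]: string } = {};
  const passthrough: string[] = [];
  let flagsEnded = false;
  for (let i = 0; i < argv.length; i++) {
    const token = argv[i];
    if (!flagsEnded && token === "--") {
      // Everything after "--" is passed through verbatim.
      flagsEnded = true;
    } else if (!flagsEnded && KNOWN_FLAGS.indexOf(token) !== -1 && i + 1 < argv.length) {
      // A recognized flag consumes the next token as its value.
      flags[token] = argv[i + 1];
      i++;
    } else {
      // Unrecognized flags and positional args are collected for pg_dump.
      passthrough.push(token);
    }
  }
  return { flags, passthrough };
}

const first = splitArgs(["-n", "myschema", "-l", "first_name:faker.name.firstName", "mydb"]);
const second = splitArgs(["-l", "first_name:faker.name.firstName", "--", "-n", "myschema", "mydb"]);
console.log(first.passthrough.join(" "));  // -n myschema mydb
console.log(second.passthrough.join(" ")); // -n myschema mydb
```

In both cases the `-l` value is kept by the tool itself, and the remaining tokens are what would be handed to pg_dump.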

I believe this change to be fully backwards compatible with the existing CLI-parsing behavior, but in the absence of a test suite I'm not sure what the best way of demonstrating that might be.

alexhall commented 2 years ago

This is a potential fix for https://github.com/rap2hpoutre/pg-anonymizer/issues/23

alexhall commented 2 years ago

Hi! Just wanted to see if there are any questions I can answer or anything else I can do to help get this PR reviewed.

rap2hpoutre commented 2 years ago

Hi @alexhall! Thank you for your contribution and sorry for the delay. Your addition seems legit and I guess it's a good move.

> I believe this change to be fully backwards compatible with the existing CLI-parsing behavior, but in the absence of a test suite I'm not sure what the best way of demonstrating that might be.

Oops sorry about that!

The only thing I want to be sure is: will this command (the default command) still work? (TBH I don't remember how to test a CLI without publishing it so I prefer asking)

npx pg-anonymizer postgres://user:secret@localhost:1234/mydb -o dump.sql

alexhall commented 2 years ago

Hi @rap2hpoutre, thanks for getting back to me!

> The only thing I want to be sure is: will this command (the default command) still work? (TBH I don't remember how to test a CLI without publishing it so I prefer asking)
>
> npx pg-anonymizer postgres://user:secret@localhost:1234/mydb -o dump.sql

Yes, I can confirm that this still works. Here's a simple test run to demonstrate:

~/src/pg-anonymizer (add-pgdump-args) > createdb testdb
~/src/pg-anonymizer (add-pgdump-args) > psql testdb -c "create table people(name text, email text); insert into people (name, email) values ('John Doe', 'jdoe@example.com');"
INSERT 0 1
~/src/pg-anonymizer (add-pgdump-args) > bin/run postgres://alex:**********@localhost/testdb -o dump.sql
Launching pg_dump
Command pg_dump started, running anonymization.
Output file: dump.sql
Anonymizing table public.people
Columns to anonymize: name, email
~/src/pg-anonymizer (add-pgdump-args) > cat dump.sql 
--
-- PostgreSQL database dump
--

-- Dumped from database version 12.10 (Ubuntu 12.10-1.pgdg20.04+1)
-- Dumped by pg_dump version 12.10 (Ubuntu 12.10-1.pgdg20.04+1)

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

SET default_tablespace = '';

SET default_table_access_method = heap;

--
-- Name: people; Type: TABLE; Schema: public; Owner: alex
--

CREATE TABLE public.people (
    name text,
    email text
);

ALTER TABLE public.people OWNER TO alex;

--
-- Data for Name: people; Type: TABLE DATA; Schema: public; Owner: alex
--

COPY public.people (name, email) FROM stdin;
Samuel Bogisich Marcelo84@yahoo.com
\.

--
-- PostgreSQL database dump complete
--

GeekOnCoffee commented 2 years ago

This would be a huge win for our use!

rap2hpoutre commented 2 years ago

Thank you for your contribution (and sorry for the late answer!)

github-actions[bot] commented 2 years ago

:tada: This PR is included in version 0.6.0 :tada:

Your semantic-release bot :package::rocket: