rheinwerk-verlag / pganonymize

A commandline tool for anonymizing PostgreSQL databases
http://pganonymize.readthedocs.io/

Remove column specification for cursor.copy_from call #25

Closed nurikk closed 3 years ago

nurikk commented 3 years ago

Hi! This commit causes an error on my database (AWS RDS, default config, PostgreSQL 12.6).

If I revert this commit, it will definitely cause the problems discussed here.

So the obvious solution is to remove this parameter altogether :)

copy_from doc

columns – iterable with name of the columns to import. The length and types should match the content of the file to read. If not specified, it is assumed that the entire table matches the file structure.

We can be sure that the data matches the table structure, because we're using DictCursor, which uses an OrderedDict under the hood.
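
As a rough illustration of the point above, the sketch below serializes rows that are already in table column order into a delimited buffer and hands it to `cursor.copy_from` without the `columns` argument. The helper names `rows_to_copy_buffer` and `copy_rows`, the tab delimiter, and the lack of value escaping are assumptions made for illustration, not pganonymize's actual code.

```python
import io

COPY_DELIMITER = "\t"  # assumed delimiter, for illustration only
NULL_MARKER = "\\N"    # same NULL marker as in the traceback's copy_from call


def rows_to_copy_buffer(rows, sep=COPY_DELIMITER, null=NULL_MARKER):
    """Serialize rows (sequences in table column order) into a file-like buffer.

    Note: values are not escaped here, so this sketch assumes they contain
    no delimiter, newline, or backslash characters.
    """
    buffer = io.StringIO()
    for row in rows:
        fields = [null if value is None else str(value) for value in row]
        buffer.write(sep.join(fields) + "\n")
    buffer.seek(0)
    return buffer


def copy_rows(cursor, table, rows):
    """Load rows into `table`, relying on the buffer matching the full table layout."""
    buffer = rows_to_copy_buffer(rows)
    # No `columns` argument: per the copy_from docs, the entire table
    # is then assumed to match the file structure.
    cursor.copy_from(buffer, table, sep=COPY_DELIMITER, null=NULL_MARKER)
```

With a psycopg2 cursor this would be called as `copy_rows(cursor, "tmp_test123", rows)`. It only works if every row carries a value for every table column, in the table's own order.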

nurikk commented 3 years ago

INFO: Found table definition "test123"
Anonymizing |████████████████████████████████| 1/1
Traceback (most recent call last):
  File "/Users/nur/code/anonymiser/.env/bin/pganonymize", line 8, in <module>
    sys.exit(main())
  File "/Users/nur/code/anonymiser/.env/lib/python3.9/site-packages/pganonymizer/__main__.py", line 10, in main
    main()
  File "/Users/nur/code/anonymiser/.env/lib/python3.9/site-packages/pganonymizer/cli.py", line 71, in main
    anonymize_tables(connection, schema.get('tables', []), verbose=args.verbose)
  File "/Users/nur/code/anonymiser/.env/lib/python3.9/site-packages/pganonymizer/utils.py", line 43, in anonymize_tables
    import_data(connection, column_dict, table_name, table_columns, primary_key, data)
  File "/Users/nur/code/anonymiser/.env/lib/python3.9/site-packages/pganonymizer/utils.py", line 154, in import_data
    copy_from(connection, data, temp_table, table_columns)
  File "/Users/nur/code/anonymiser/.env/lib/python3.9/site-packages/pganonymizer/utils.py", line 132, in copy_from
    cursor.copy_from(new_data, table, sep=COPY_DB_DELIMITER, null='\\N', columns=quoted_cols)
psycopg2.errors.UndefinedColumn: column ""id"" of relation "tmp_test123" does not exist

tables:
 - test123:
    primary_key: id
    chunk_size: 5000
    fields:
      - FOO_BAR:
          provider:
            name: set
            value: 42

> \d test123
               Table "public.test123"
 Column  |  Type   | Collation | Nullable | Default
---------+---------+-----------+----------+---------
 id      | integer |           |          |
 FOO_BAR | integer |           |          |
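
The doubled quotes in the `UndefinedColumn` message above are consistent with the column name being quoted twice: once by pganonymize (`quoted_cols` in the traceback) and once more further down the stack. The sketch below reproduces that effect with hand-written helpers that follow PostgreSQL's identifier quoting rule. `quote_ident` and `unquote_ident` are hypothetical names, and the assumption that the driver quotes column names itself is not confirmed in this thread.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier: wrap it in double quotes and
    double any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def unquote_ident(quoted):
    """Inverse of quote_ident: the column name the server would look up."""
    return quoted[1:-1].replace('""', '"')


once = quote_ident("id")    # refers to the column named: id
twice = quote_ident(once)   # refers to a column whose name includes the quotes

# Quoting once round-trips to the real column name; quoting twice does not.
looked_up_once = unquote_ident(once)
looked_up_twice = unquote_ident(twice)
```

Under that assumption, the server would look up a column whose name literally includes the quote characters, and no such column exists in `tmp_test123`.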

hkage commented 3 years ago

Hi, thanks again for your contributions and the effort you put into this project. I really appreciate your work.

Good idea, and it works like a charm. I also tested it against a full copy of a production database and a test database containing uppercase columns.