Closed: nurikk closed this issue 3 years ago
Hi,
I tested your changes against a ~1.6 GB database with tables of up to ~370k rows (per table). The anonymization breaks at some point with an OutOfMemory
error. Have you discovered similar issues?
I'm running the anonymization with Python 3.9 in a virtualenv on Ubuntu Linux 21.04 with 24 GB of RAM.
postgres=# SELECT version();
version
------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 10.7 (Ubuntu 10.7-0ubuntu0.18.10.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 8.2.0-7ubuntu1) 8.2.0, 64-bit
(1 row)
Anonymizing |█ | 10000/292059
Traceback (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/henning/Projekte/postgresql-anonymizer/pganonymizer/__main__.py", line 18, in <module>
main()
File "/home/henning/Projekte/postgresql-anonymizer/pganonymizer/__main__.py", line 10, in main
main()
File "/home/henning/Projekte/postgresql-anonymizer/pganonymizer/cli.py", line 71, in main
anonymize_tables(connection, schema.get('tables', []), verbose=args.verbose)
File "/home/henning/Projekte/postgresql-anonymizer/pganonymizer/utils.py", line 40, in anonymize_tables
build_and_then_import_data(connection, table_name, primary_key, columns, excludes,
File "/home/henning/Projekte/postgresql-anonymizer/pganonymizer/utils.py", line 83, in build_and_then_import_data
import_data(connection, columns, table, primary_key, data)
File "/home/henning/Projekte/postgresql-anonymizer/pganonymizer/utils.py", line 158, in import_data
cursor.execute('DROP TABLE %s;' % temp_table)
psycopg2.errors.OutOfMemory: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
No, I'm not getting such issues. Without the proposed changes, I assume everything works fine?
Yes, as soon as I switch to the development branch the anonymization works fine and completes after a couple of minutes.
I haven't taken a deeper look into PostgreSQL's max_locks_per_transaction setting, but it seems like the number of locks taken during the anonymization exceeds PostgreSQL's default. I tried to split the cursor operations into multiple transactions, but that resulted in errors when deleting the temporary tables (at https://github.com/rheinwerk-verlag/postgresql-anonymizer/blob/development/pganonymizer/utils.py#L157).
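For anyone hitting the same HINT: a minimal sketch (not part of the project) of how to inspect the relevant settings with psycopg2, the driver the tool already uses. The connection parameters are placeholders for your environment.

```python
# Sketch only: check the settings the "out of shared memory" HINT points at.
import psycopg2

# Placeholder connection parameters; adjust to your environment.
connection = psycopg2.connect(dbname="postgres", user="postgres", host="localhost")
with connection.cursor() as cursor:
    # PostgreSQL's lock table holds roughly
    # max_locks_per_transaction * (max_connections + max_prepared_transactions)
    # locks, so creating and dropping many temporary tables inside one
    # transaction can exhaust it.
    for setting in ("max_locks_per_transaction",
                    "max_connections",
                    "max_prepared_transactions"):
        cursor.execute("SHOW %s;" % setting)
        print(setting, "=", cursor.fetchone()[0])
connection.close()
```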
I'll fix this issue
Thanks. I can support you with additional debug / system information if that helps you.
Hm, I can't reproduce it on my test db
Can you try to test against this PR? It already includes the changes from this PR and might resolve the psycopg2.errors.OutOfMemory: out of shared memory issue.
This PR fixes the issue with processing huge tables. It calls import_data inside a loop that calls fetchmany and clears data and the StringIO buffers after each iteration.
Depends on https://github.com/rheinwerk-verlag/postgresql-anonymizer/pull/25
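A rough sketch of that batching pattern, assuming a psycopg2 connection; function and variable names below are illustrative, not the PR's actual code:

```python
# Sketch of the fetchmany batching pattern described above (names are placeholders).
from io import StringIO

BATCH_SIZE = 10000  # assumed batch size, not taken from the PR

def anonymize_in_batches(connection, select_sql, import_batch):
    """Fetch rows in chunks and import each chunk, so neither the row list
    nor the StringIO buffer grows with the full table size."""
    with connection.cursor(name="anonymizer_cursor") as cursor:  # server-side cursor
        cursor.execute(select_sql)
        while True:
            rows = cursor.fetchmany(BATCH_SIZE)
            if not rows:
                break
            buffer = StringIO()
            for row in rows:
                buffer.write("\t".join(str(value) for value in row) + "\n")
            buffer.seek(0)
            import_batch(buffer)  # e.g. COPY the chunk into the temporary table
            buffer.close()        # release the buffer before the next iteration
            del rows              # drop the chunk's row list as well
    connection.commit()
```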