The problem came from the interrupted zodbconvert. After recreating the db, it worked. It was also quite a bit faster with max_locks_per_transaction = 512.
Any configuration recommendations for Postgres with RelStorage?
There is no released version of RelStorage that uses advisory locks, and that's the only thing that max_locks_per_transaction controls, so I can't see why it would need to be increased. The number of available advisory locks is calculated as max_locks_per_transaction * max_connections, which by default is 64 * 100 = 6,400.
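To sanity-check that on a live server, here is a quick psql sketch (current_setting() and SHOW are standard PostgreSQL; the comments show the defaults quoted above):

-- Show the two settings that bound the advisory-lock capacity
SHOW max_locks_per_transaction;  -- 64 by default
SHOW max_connections;            -- 100 by default
-- Total capacity = max_locks_per_transaction * max_connections
SELECT current_setting('max_locks_per_transaction')::int
     * current_setting('max_connections')::int AS advisory_lock_capacity;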
An upcoming version of RelStorage might use advisory locks, in which case some large sites might want to increase those values. I cover that and other things in updated configuration notes.
The only time I recall encountering 'out of shared memory' errors was because of an operating-system configuration issue. For example, when using a Docker image, you typically need something like --shm-size=256MB.
Thanks for the explanation :-) So the type of locking is rather important, and the upcoming improvements should give a speedup with zodbconvert?
In my experience, zodbconvert is very fast. We recently converted large production databases from MySQL to PostgreSQL using zodbconvert and it actually went much faster than expected. And that was on a largely untuned target server. If that's not what you're experiencing, I don't know what to tell you, except to look at the general deployment conditions of your server and do the general RDBMS troubleshooting/tuning steps.
The server admin made some optimisations to the Postgres configuration:
max_locks_per_transaction = 512
max_connections = 100
work_mem = 8MB
shared_buffers = 4GB
wal_level = replica
max_wal_senders = 0
wal_compression = on
max_wal_size = 2GB
synchronous_commit = off
And I ran zodbconvert again:
$ ./bin/zodbconvert --clear --debug restore-to-relstorage.cfg
2021-06-10 17:43:31,439 [relstorage._util] DEBUG Using value 0.31 from environ 'RS_PERF_LOG_TRACE_MIN'=0.31 (default=0.31)
2021-06-10 17:43:31,439 [relstorage._util] DEBUG Using value 1.24 from environ 'RS_PERF_LOG_DEBUG_MIN'=1.24 (default=1.24)
2021-06-10 17:43:31,439 [relstorage._util] DEBUG Using value 3.03 from environ 'RS_PERF_LOG_INFO_MIN'=3.03 (default=3.03)
2021-06-10 17:43:31,440 [relstorage._util] DEBUG Using value 9.24 from environ 'RS_PERF_LOG_WARNING_MIN'=9.24 (default=9.24)
2021-06-10 17:43:31,440 [relstorage._util] DEBUG Using value 20.1 from environ 'RS_PERF_LOG_ERROR_MIN'=20.1 (default=20.1)
2021-06-10 17:43:31,440 [relstorage._util] DEBUG Using value WARN from environ 'RS_PERF_LOG_DETAILS_LEVEL'='WARN' (default='WARN')
2021-06-10 17:43:31,440 [relstorage._util] DEBUG Using value on from environ 'RS_PERF_LOG_ENABLE'='on' (default='on')
2021-06-10 17:43:31,575 [relstorage._util] DEBUG Using value None from environ 'RS_CACHE_POLL_TIMEOUT'=None (default=None)
2021-06-10 17:43:31,577 [relstorage._util] DEBUG Using value None from environ 'RS_MAX_POOLED_STORE_CONNECTIONS'=None (default=None)
2021-06-10 17:43:31,604 [ZODB.BaseStorage] DEBUG create storage ./vartransfer/var/filestorage/Data.fs
2021-06-10 17:43:31,619 [ZODB.blob] DEBUG (6075) Blob directory `./vartransfer/var/blobstorage` has layout marker set. Selected `bushy` layout.
2021-06-10 17:43:31,637 [relstorage._util] DEBUG Using value 4096 from environ 'RS_PACK_CURSOR_ARRAYSIZE'=4096 (default=4096)
2021-06-10 17:43:31,638 [relstorage._util] DEBUG Using value 4096 from environ 'RS_PACK_STORE_BATCH_SIZE'=4096 (default=4096)
2021-06-10 17:43:31,638 [relstorage._util] DEBUG Using value 120 from environ 'RS_PACK_COMMIT_FREQUENCY'=120 (default=120)
2021-06-10 17:43:31,638 [relstorage._util] DEBUG Using value 1024 from environ 'RS_PACK_DOWNLOAD_BATCH_SIZE'=1024 (default=1024)
2021-06-10 17:43:31,638 [relstorage._util] DEBUG Using value 1000 from environ 'RS_PACK_HP_DELETE_BATCH_SIZE'=1000 (default=1000)
2021-06-10 17:43:31,666 [relstorage.adapters.drivers] DEBUG Attempting to load driver named 'gevent psycopg2' from 'psycopg2' failed; if no driver was specified, or the driver was set to 'auto', there may be more drivers to attempt.
Traceback (most recent call last):
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/drivers.py", line 147, in __init__
    self.driver_module = mod = self.get_driver_module()
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/postgresql/drivers/psycopg2.py", line 271, in get_driver_module
    __import__('gevent')
ModuleNotFoundError: No module named 'gevent'
2021-06-10 17:43:31,702 [relstorage.adapters.drivers] DEBUG Using driver <relstorage.adapters.postgresql.drivers.psycopg2.Psycopg2Driver object at 0x7f63448e9fd0> for requested name 'auto'
2021-06-10 17:43:31,702 [relstorage.adapters.schema] DEBUG Reading stored procedures from /home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/postgresql/procs and /home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/postgresql/procs/hp
2021-06-10 17:43:31,785 [relstorage.adapters.postgresql.drivers] DEBUG Message from the RDBMS: NOTICE: relation "zoid_seq" already exists, skipping
2021-06-10 17:43:31,801 [zodbconvert] INFO Storages opened successfully.
2021-06-10 17:43:31,801 [zodbconvert] INFO Clearing old data...
2021-06-10 17:43:31,810 [relstorage.adapters.schema] DEBUG Before zapping existing tables ({'pg_largeobject_metadata', 'pg_aggregate', 'pg_constraint', 'pg_cast', 'pg_attribute', 'sql_features', 'pg_auth_members', 'pg_db_role_setting', 'pg_extension', 'pack_state_tid', 'pg_amproc', 'pg_inherits', 'pg_shdescription', 'pack_object', 'pg_seclabel', 'pg_type', 'pg_attrdef', 'pg_partitioned_table', 'pg_statistic_ext_data', 'pg_subscription_rel', 'pg_foreign_data_wrapper', 'pg_namespace', 'pg_ts_template', 'sql_implementation_info', 'pg_conversion', 'pg_ts_config_map', 'blob_chunk', 'pg_default_acl', 'pg_statistic', 'pg_policy', 'pg_depend', 'pg_ts_parser', 'pg_sequence', 'pg_publication_rel', 'pg_statistic_ext', 'pg_largeobject', 'pg_foreign_server', 'pg_opfamily', 'pg_ts_config', 'pg_subscription', 'pg_am', 'transaction', 'pg_user_mapping', 'pg_proc', 'sql_sizing', 'pg_shseclabel', 'pg_operator', 'pg_publication', 'pg_description', 'pg_collation', 'sql_parts', 'pg_ts_dict', 'pg_class', 'object_refs_added', 'pg_authid', 'current_object', 'pg_amop', 'pg_shdepend', 'pg_range', 'pg_trigger', 'pg_language', 'pg_opclass', 'pg_tablespace', 'pg_event_trigger', 'pg_init_privs', 'commit_row_lock', 'pg_index', 'pg_database', 'pack_state', 'pg_replication_origin', 'pg_transform', 'pg_foreign_table', 'pg_rewrite', 'object_ref', 'pg_enum', 'object_state'}) with <cursor object at 0x7f6344977d60; closed: 0>; slow: False
Traceback (most recent call last):
  File "./bin/zodbconvert", line 336, in <module>
    sys.exit(relstorage.zodbconvert.main())
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/zodbconvert.py", line 161, in main
    destination.zap_all()
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/storage/__init__.py", line 350, in zap_all
    self._adapter.schema.zap_all(**kwargs)
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/schema.py", line 857, in zap_all
    self.connmanager.open_and_call(zap_all)
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/connmanager.py", line 292, in open_and_call
    res = callback(conn, cursor)
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/schema.py", line 832, in zap_all
    self._before_zap_all_tables(cursor, existent, slow)
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/postgresql/schema.py", line 360, in _before_zap_all_tables
    cursor.execute("""
psycopg2.errors.OutOfMemory: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
It seems (at least to me) that the problem is with --clear on a db with existing data, rather than with the actual Postgres config?
The machine after a restart and a bit of clicking around on the Plone site:
Tasks: 90 total, 5 running, 85 sleeping, 0 stopped, 0 zombie
%Cpu(s): 16.4 us, 0.0 sy, 0.0 ni, 83.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16384.0 total, 5694.7 free, 10505.8 used, 183.5 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 5878.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
380 postgres 20 0 4407108 133056 131168 S 0.0 0.8 0:00.13 /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/13+
438 postgres 20 0 4407108 5800 3908 S 0.0 0.0 0:00.00 postgres: 13/main: checkpointer
439 postgres 20 0 4407108 14388 12500 S 0.0 0.1 0:00.01 postgres: 13/main: background writer
440 postgres 20 0 4407108 5728 3832 S 0.0 0.0 0:00.00 postgres: 13/main: walwriter
441 postgres 20 0 4407656 7916 5724 S 0.0 0.0 0:00.00 postgres: 13/main: autovacuum launcher
442 postgres 20 0 68552 5604 3572 S 0.0 0.0 0:00.00 postgres: 13/main: stats collector
444 postgres 20 0 4407536 6592 4500 S 0.0 0.0 0:00.00 postgres: 13/main: logical replication launcher
1508 postgres 20 0 4408488 24204 21100 S 0.0 0.1 0:00.01 postgres: 13/main: zodbuser zodb ::1(43964) idle
1509 postgres 20 0 4408488 24108 21000 S 0.0 0.1 0:00.01 postgres: 13/main: zodbuser zodb ::1(43966) idle
1514 postgres 20 0 4408488 24044 20936 S 0.0 0.1 0:00.01 postgres: 13/main: zodbuser zodb ::1(43972) idle
While clearing, the SELECT goes up to about 26% of MEM before crashing:
top - 18:25:25 up 20 min, 2 users, load average: 2.11, 1.05, 1.00
Tasks: 106 total, 2 running, 104 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.3 us, 0.0 sy, 0.0 ni, 95.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16384.0 total, 1258.1 free, 10851.5 used, 4274.5 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 5532.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2926 postgres 20 0 4469936 4.2g 4.1g R 84.4 26.0 0:40.06 postgres: 13/main: zodbuser zodb ::1(46470) SELECT
That's after the failed zodbconvert:
top - 18:19:42 up 14 min, 2 users, load average: 0.52, 1.24, 1.09
Tasks: 96 total, 1 running, 95 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.0 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16384.0 total, 1374.1 free, 10735.7 used, 4274.2 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 5648.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
380 postgres 20 0 4407108 133056 131168 S 0.0 0.8 0:00.17 /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/13+
438 postgres 20 0 4407208 4.1g 4.1g S 0.0 25.4 0:46.93 postgres: 13/main: checkpointer
439 postgres 20 0 4407108 295968 294040 S 0.0 1.8 0:00.69 postgres: 13/main: background writer
440 postgres 20 0 4407108 22360 20448 S 0.0 0.1 0:09.81 postgres: 13/main: walwriter
441 postgres 20 0 4407656 7916 5724 S 0.0 0.0 0:00.01 postgres: 13/main: autovacuum launcher
442 postgres 20 0 68552 5604 3572 S 0.0 0.0 0:00.14 postgres: 13/main: stats collector
444 postgres 20 0 4407536 6592 4500 S 0.0 0.0 0:00.00 postgres: 13/main: logical replication launcher
1508 postgres 20 0 4408488 24204 21100 S 0.0 0.1 0:00.01 postgres: 13/main: zodbuser zodb ::1(43964) idle
1509 postgres 20 0 4409260 31480 28132 S 0.0 0.2 0:00.07 postgres: 13/main: zodbuser zodb ::1(43966) idle
1514 postgres 20 0 4409260 34848 31500 S 0.0 0.2 0:00.08 postgres: 13/main: zodbuser zodb ::1(43972) idle
1724 postgres 20 0 4409224 78736 74972 S 0.0 0.5 0:01.05 postgres: 13/main: zodbuser zodb ::1(44132) idle
1725 postgres 20 0 4409096 30988 27704 S 0.0 0.2 0:00.02 postgres: 13/main: zodbuser zodb ::1(44134) idle
1726 postgres 20 0 4409096 31020 27736 S 0.0 0.2 0:00.04 postgres: 13/main: zodbuser zodb ::1(44136) idle
1727 postgres 20 0 4408708 27844 24612 S 0.0 0.2 0:00.01 postgres: 13/main: zodbuser zodb ::1(44138) idle
1728 postgres 20 0 4409260 31496 28212 S 0.0 0.2 0:00.04 postgres: 13/main: zodbuser zodb ::1(44140) idle
1729 postgres 20 0 4409232 80280 76404 S 0.0 0.5 0:01.35 postgres: 13/main: zodbuser zodb ::1(44142) idle
I'll recreate an empty db for now and try again, but does this give any clue as to what other settings could be optimised?
Thanks, that's very helpful.
What's going on here is that we're clearing away the old blobs (one ZODB Blob object becomes one RDBMS BLOB). PostgreSQL has a unique way of storing blobs, where they are essentially separate from the actual database: there's not a row in a table you can access to read or delete a blob. Instead, it gives you an integer you use to refer to a chunk of data, and you use functions like lo_open(int), lo_write(int) and lo_read(int) to deal with the data. You have to store that integer to be able to refer to the data; if you lose the integer, you basically "leak" the data. When you're done with the data, you must call lo_unlink(int) to actually remove the data.
(By "separate from the actual database", I mean that the data is hidden away in internal PostgreSQL tables that aren't normally accessed directly.)
In normal circumstances, calling lo_unlink is done automatically with a trigger: when the ZODB blob is deleted, the row holding the PostgreSQL integer is deleted, the trigger calls lo_unlink, and all is right with the world.
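The mechanism looks roughly like this (a sketch only; the function and trigger names are made up, and the assumption that blob_chunk's OID column is called chunk may not match RelStorage's actual schema):

-- Cleanup function: unlink the large object referenced by the deleted row
CREATE FUNCTION unlink_blob() RETURNS trigger AS $$
BEGIN
  PERFORM lo_unlink(OLD.chunk);  -- OLD.chunk assumed to hold the OID
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

-- Fire it for every row deleted from blob_chunk
CREATE TRIGGER blob_chunk_delete
  BEFORE DELETE ON blob_chunk
  FOR EACH ROW EXECUTE FUNCTION unlink_blob();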
But "zapping" the database (what clear does) doesn't delete row-by-row, because that's horribly slow. Instead, it uses TRUNCATE
to very quickly clear the tables. TRUNCATE
doesn't run triggers, though, so we have to manually call lo_unlink
for each integer first.
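So the zap has to do something like this by hand first (a sketch of the shape of the operation, not RelStorage's literal statements; again assuming the OID column is chunk):

-- Unlink every large object still referenced, then clear the table.
-- Each lo_unlink consumes a slot in the shared lock table, which is
-- why max_locks_per_transaction matters here.
SELECT lo_unlink(chunk) FROM blob_chunk;
TRUNCATE TABLE blob_chunk;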
Today I learned that lo_unlink doesn't use normal row-level locking like you might expect. You guessed it: instead of row locks, it uses advisory locks. So for zapping a database with many existing blobs, max_locks_per_transaction is relevant, and for databases with lots of blobs (with your settings, more than 512 * 100 = 51,200 of them) it may not even be possible without increasing it.
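If you want to watch the lock consumption as it happens, pg_locks (a standard system view) can be polled from a second session; a sketch:

-- Run repeatedly while the zap is in progress; the counts climbing
-- toward max_locks_per_transaction * max_connections predicts the failure.
SELECT locktype, count(*)
FROM pg_locks
GROUP BY locktype
ORDER BY count(*) DESC;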
I'm looking into alternative ways to handle this. In the meantime, you have a couple of options:
1. Increase max_locks_per_transaction further. Get the number of rows in blob_chunk (SELECT COUNT(*) FROM blob_chunk), divide that by max_connections, and you get the value you need for max_locks_per_transaction (add some more to be safe); see the query sketch after the session below.
2. TRUNCATE TABLE blob_chunk and then run the vacuumlo command that comes with PostgreSQL to clean up the blobs:
$ psql -d relstoragetest_hf -h <thehost> -p 54322 -U relstoragetest -c 'TRUNCATE TABLE blob_chunk'
TRUNCATE TABLE
$ vacuumlo -n relstoragetest_hf -h <thehost> -p 54322 -U relstoragetest
Connected to database "relstoragetest_hf"
Test run: no large objects will be removed!
Checking chunk in public.blob_chunk
Would remove 340500 large objects from database "relstoragetest_hf".
68239/68242,100.00%, 11.43 MB/s 38.02 TX/s, 20510.18 MB 29.9 minutes
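For option 1, the arithmetic can be folded into a single query (a sketch; the 10% safety margin is my own choice):

-- Suggested max_locks_per_transaction = blob count / max_connections,
-- padded by 10% to be safe.
SELECT ceil( (SELECT count(*) FROM blob_chunk)::numeric
             / current_setting('max_connections')::int
             * 1.1 ) AS suggested_max_locks_per_transaction;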
A generous max_locks_per_transaction = 3000 did the trick :-)
I "got" a managed server to install a Plone 5.2 site with Relstorage. System is a Debian Buster and Postgres 13.2
The Data.fs is 2.7G, plus 15G of blob files.
Yesterday I started a zodbconvert with this config from the buildout directory.
After about an hour of running, only a handful of objects had been converted, so I stopped the process.
Today I tried to convert again, but now I'm running into
psycopg2.errors.OutOfMemory: out of shared memory
with the hint: You might need to increase max_locks_per_transaction.
Since the Debian/Postgres settings seem rather conservative by default, what would be a recommended configuration for a DB like that?
Side note: I packed the Data.fs to zero days and ran zodbupdate for Python 3 before importing.
The full traceback with --debug is shown above.