The problem came from the interrupted zodbconvert. After recreating the db, it worked. It was also quite a bit faster with max_locks_per_transaction = 512.
Any configuration recommendations for Postgres with RelStorage?
There is no released version of RelStorage that uses advisory locks, and that's the only thing that max_locks_per_transaction controls, so I can't see why it would need to be increased. The number of available advisory locks is calculated as max_locks_per_transaction * max_connections, which by default is 64 * 100 = 6,400.
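To sanity-check that on a live server, here is a quick psql sketch (current_setting() and SHOW are standard PostgreSQL; the comments show the defaults quoted above):

-- Show the two settings that bound the advisory-lock capacity
SHOW max_locks_per_transaction;  -- 64 by default
SHOW max_connections;            -- 100 by default
-- Total capacity = max_locks_per_transaction * max_connections
SELECT current_setting('max_locks_per_transaction')::int
     * current_setting('max_connections')::int AS advisory_lock_capacity;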
An upcoming version of RelStorage might use advisory locks, in which case some large sites might want to increase those values. I cover that and other things in updated configuration notes.
The only time I recall encountering 'out of shared memory' errors was because of an operating-system configuration issue. For example, when using a Docker image, you typically need something like --shm-size=256MB.
Thanks for the explanation :-) So the type of locking is rather important, and the upcoming improvements should give a speedup with zodbconvert?
In my experience, zodbconvert is very fast. We recently converted large production databases from MySQL to PostgreSQL using zodbconvert and it actually went much faster than expected. And that was on a largely untuned target server. If that's not what you're experiencing, I don't know what to tell you, except to look at the general deployment conditions of your server and do the general RDBMS troubleshooting/tuning steps.
The server admin made some optimisations to the Postgres configuration:
max_locks_per_transaction = 512
max_connections = 100
work_mem = 8MB
shared_buffers = 4GB
wal_level = replica
max_wal_senders = 0
wal_compression = on
max_wal_size = 2GB
synchronous_commit = off
And I ran zodbconvert again:
$ ./bin/zodbconvert --clear --debug restore-to-relstorage.cfg
2021-06-10 17:43:31,439 [relstorage._util] DEBUG Using value 0.31 from environ 'RS_PERF_LOG_TRACE_MIN'=0.31 (default=0.31)
2021-06-10 17:43:31,439 [relstorage._util] DEBUG Using value 1.24 from environ 'RS_PERF_LOG_DEBUG_MIN'=1.24 (default=1.24)
2021-06-10 17:43:31,439 [relstorage._util] DEBUG Using value 3.03 from environ 'RS_PERF_LOG_INFO_MIN'=3.03 (default=3.03)
2021-06-10 17:43:31,440 [relstorage._util] DEBUG Using value 9.24 from environ 'RS_PERF_LOG_WARNING_MIN'=9.24 (default=9.24)
2021-06-10 17:43:31,440 [relstorage._util] DEBUG Using value 20.1 from environ 'RS_PERF_LOG_ERROR_MIN'=20.1 (default=20.1)
2021-06-10 17:43:31,440 [relstorage._util] DEBUG Using value WARN from environ 'RS_PERF_LOG_DETAILS_LEVEL'='WARN' (default='WARN')
2021-06-10 17:43:31,440 [relstorage._util] DEBUG Using value on from environ 'RS_PERF_LOG_ENABLE'='on' (default='on')
2021-06-10 17:43:31,575 [relstorage._util] DEBUG Using value None from environ 'RS_CACHE_POLL_TIMEOUT'=None (default=None)
2021-06-10 17:43:31,577 [relstorage._util] DEBUG Using value None from environ 'RS_MAX_POOLED_STORE_CONNECTIONS'=None (default=None)
2021-06-10 17:43:31,604 [ZODB.BaseStorage] DEBUG create storage ./vartransfer/var/filestorage/Data.fs
2021-06-10 17:43:31,619 [ZODB.blob] DEBUG (6075) Blob directory `./vartransfer/var/blobstorage` has layout marker set. Selected `bushy` layout.
2021-06-10 17:43:31,637 [relstorage._util] DEBUG Using value 4096 from environ 'RS_PACK_CURSOR_ARRAYSIZE'=4096 (default=4096)
2021-06-10 17:43:31,638 [relstorage._util] DEBUG Using value 4096 from environ 'RS_PACK_STORE_BATCH_SIZE'=4096 (default=4096)
2021-06-10 17:43:31,638 [relstorage._util] DEBUG Using value 120 from environ 'RS_PACK_COMMIT_FREQUENCY'=120 (default=120)
2021-06-10 17:43:31,638 [relstorage._util] DEBUG Using value 1024 from environ 'RS_PACK_DOWNLOAD_BATCH_SIZE'=1024 (default=1024)
2021-06-10 17:43:31,638 [relstorage._util] DEBUG Using value 1000 from environ 'RS_PACK_HP_DELETE_BATCH_SIZE'=1000 (default=1000)
2021-06-10 17:43:31,666 [relstorage.adapters.drivers] DEBUG Attempting to load driver named 'gevent psycopg2' from 'psycopg2' failed; if no driver was specified, or the driver was set to 'auto', there may be more drivers to attempt.
Traceback (most recent call last):
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/drivers.py", line 147, in __init__
    self.driver_module = mod = self.get_driver_module()
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/postgresql/drivers/psycopg2.py", line 271, in get_driver_module
    __import__('gevent')
ModuleNotFoundError: No module named 'gevent'
2021-06-10 17:43:31,702 [relstorage.adapters.drivers] DEBUG Using driver <relstorage.adapters.postgresql.drivers.psycopg2.Psycopg2Driver object at 0x7f63448e9fd0> for requested name 'auto'
2021-06-10 17:43:31,702 [relstorage.adapters.schema] DEBUG Reading stored procedures from /home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/postgresql/procs and /home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/postgresql/procs/hp
2021-06-10 17:43:31,785 [relstorage.adapters.postgresql.drivers] DEBUG Message from the RDBMS: NOTICE: relation "zoid_seq" already exists, skipping
2021-06-10 17:43:31,801 [zodbconvert] INFO Storages opened successfully.
2021-06-10 17:43:31,801 [zodbconvert] INFO Clearing old data...
2021-06-10 17:43:31,810 [relstorage.adapters.schema] DEBUG Before zapping existing tables ({'pg_largeobject_metadata', 'pg_aggregate', 'pg_constraint', 'pg_cast', 'pg_attribute', 'sql_features', 'pg_auth_members', 'pg_db_role_setting', 'pg_extension', 'pack_state_tid', 'pg_amproc', 'pg_inherits', 'pg_shdescription', 'pack_object', 'pg_seclabel', 'pg_type', 'pg_attrdef', 'pg_partitioned_table', 'pg_statistic_ext_data', 'pg_subscription_rel', 'pg_foreign_data_wrapper', 'pg_namespace', 'pg_ts_template', 'sql_implementation_info', 'pg_conversion', 'pg_ts_config_map', 'blob_chunk', 'pg_default_acl', 'pg_statistic', 'pg_policy', 'pg_depend', 'pg_ts_parser', 'pg_sequence', 'pg_publication_rel', 'pg_statistic_ext', 'pg_largeobject', 'pg_foreign_server', 'pg_opfamily', 'pg_ts_config', 'pg_subscription', 'pg_am', 'transaction', 'pg_user_mapping', 'pg_proc', 'sql_sizing', 'pg_shseclabel', 'pg_operator', 'pg_publication', 'pg_description', 'pg_collation', 'sql_parts', 'pg_ts_dict', 'pg_class', 'object_refs_added', 'pg_authid', 'current_object', 'pg_amop', 'pg_shdepend', 'pg_range', 'pg_trigger', 'pg_language', 'pg_opclass', 'pg_tablespace', 'pg_event_trigger', 'pg_init_privs', 'commit_row_lock', 'pg_index', 'pg_database', 'pack_state', 'pg_replication_origin', 'pg_transform', 'pg_foreign_table', 'pg_rewrite', 'object_ref', 'pg_enum', 'object_state'}) with <cursor object at 0x7f6344977d60; closed: 0>; slow: False
Traceback (most recent call last):
  File "./bin/zodbconvert", line 336, in <module>
    sys.exit(relstorage.zodbconvert.main())
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/zodbconvert.py", line 161, in main
    destination.zap_all()
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/storage/__init__.py", line 350, in zap_all
    self._adapter.schema.zap_all(**kwargs)
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/schema.py", line 857, in zap_all
    self.connmanager.open_and_call(zap_all)
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/connmanager.py", line 292, in open_and_call
    res = callback(conn, cursor)
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/schema.py", line 832, in zap_all
    self._before_zap_all_tables(cursor, existent, slow)
  File "/home/elk/zhref/eggs/RelStorage-3.4.1-py3.8-linux-x86_64.egg/relstorage/adapters/postgresql/schema.py", line 360, in _before_zap_all_tables
    cursor.execute("""
psycopg2.errors.OutOfMemory: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
It seems (at least to me) that the problem is with --clear on a db with existing data, rather than with the actual Postgres config?
The machine after a restart and a bit of clicking around on the Plone site:
Tasks: 90 total, 5 running, 85 sleeping, 0 stopped, 0 zombie
%Cpu(s): 16.4 us, 0.0 sy, 0.0 ni, 83.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16384.0 total, 5694.7 free, 10505.8 used, 183.5 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 5878.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
380 postgres 20 0 4407108 133056 131168 S 0.0 0.8 0:00.13 /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/13+
438 postgres 20 0 4407108 5800 3908 S 0.0 0.0 0:00.00 postgres: 13/main: checkpointer
439 postgres 20 0 4407108 14388 12500 S 0.0 0.1 0:00.01 postgres: 13/main: background writer
440 postgres 20 0 4407108 5728 3832 S 0.0 0.0 0:00.00 postgres: 13/main: walwriter
441 postgres 20 0 4407656 7916 5724 S 0.0 0.0 0:00.00 postgres: 13/main: autovacuum launcher
442 postgres 20 0 68552 5604 3572 S 0.0 0.0 0:00.00 postgres: 13/main: stats collector
444 postgres 20 0 4407536 6592 4500 S 0.0 0.0 0:00.00 postgres: 13/main: logical replication launcher
1508 postgres 20 0 4408488 24204 21100 S 0.0 0.1 0:00.01 postgres: 13/main: zodbuser zodb ::1(43964) idle
1509 postgres 20 0 4408488 24108 21000 S 0.0 0.1 0:00.01 postgres: 13/main: zodbuser zodb ::1(43966) idle
1514 postgres 20 0 4408488 24044 20936 S 0.0 0.1 0:00.01 postgres: 13/main: zodbuser zodb ::1(43972) idle
While clearing, the SELECT goes up to about 26% of MEM before crashing:
top - 18:25:25 up 20 min, 2 users, load average: 2.11, 1.05, 1.00
Tasks: 106 total, 2 running, 104 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.3 us, 0.0 sy, 0.0 ni, 95.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16384.0 total, 1258.1 free, 10851.5 used, 4274.5 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 5532.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2926 postgres 20 0 4469936 4.2g 4.1g R 84.4 26.0 0:40.06 postgres: 13/main: zodbuser zodb ::1(46470) SELECT
That's after the failed zodbconvert:
top - 18:19:42 up 14 min, 2 users, load average: 0.52, 1.24, 1.09
Tasks: 96 total, 1 running, 95 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.0 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16384.0 total, 1374.1 free, 10735.7 used, 4274.2 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 5648.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
380 postgres 20 0 4407108 133056 131168 S 0.0 0.8 0:00.17 /usr/lib/postgresql/13/bin/postgres -D /var/lib/postgresql/13+
438 postgres 20 0 4407208 4.1g 4.1g S 0.0 25.4 0:46.93 postgres: 13/main: checkpointer
439 postgres 20 0 4407108 295968 294040 S 0.0 1.8 0:00.69 postgres: 13/main: background writer
440 postgres 20 0 4407108 22360 20448 S 0.0 0.1 0:09.81 postgres: 13/main: walwriter
441 postgres 20 0 4407656 7916 5724 S 0.0 0.0 0:00.01 postgres: 13/main: autovacuum launcher
442 postgres 20 0 68552 5604 3572 S 0.0 0.0 0:00.14 postgres: 13/main: stats collector
444 postgres 20 0 4407536 6592 4500 S 0.0 0.0 0:00.00 postgres: 13/main: logical replication launcher
1508 postgres 20 0 4408488 24204 21100 S 0.0 0.1 0:00.01 postgres: 13/main: zodbuser zodb ::1(43964) idle
1509 postgres 20 0 4409260 31480 28132 S 0.0 0.2 0:00.07 postgres: 13/main: zodbuser zodb ::1(43966) idle
1514 postgres 20 0 4409260 34848 31500 S 0.0 0.2 0:00.08 postgres: 13/main: zodbuser zodb ::1(43972) idle
1724 postgres 20 0 4409224 78736 74972 S 0.0 0.5 0:01.05 postgres: 13/main: zodbuser zodb ::1(44132) idle
1725 postgres 20 0 4409096 30988 27704 S 0.0 0.2 0:00.02 postgres: 13/main: zodbuser zodb ::1(44134) idle
1726 postgres 20 0 4409096 31020 27736 S 0.0 0.2 0:00.04 postgres: 13/main: zodbuser zodb ::1(44136) idle
1727 postgres 20 0 4408708 27844 24612 S 0.0 0.2 0:00.01 postgres: 13/main: zodbuser zodb ::1(44138) idle
1728 postgres 20 0 4409260 31496 28212 S 0.0 0.2 0:00.04 postgres: 13/main: zodbuser zodb ::1(44140) idle
1729 postgres 20 0 4409232 80280 76404 S 0.0 0.5 0:01.35 postgres: 13/main: zodbuser zodb ::1(44142) idle
I'll recreate an empty db for now and try again, but does this give any clue as to what other settings could be optimised?
Thanks, that's very helpful.
What's going on here is that we're clearing away the old blobs (one ZODB Blob object becomes one RDBMS BLOB). PostgreSQL has a unique way of storing blobs, where they are essentially separate from the actual database: there's not a row in a table you can access to read or delete a blob. Instead, it gives you an integer you use to refer to a chunk of data, and you use functions like lo_open(int), lo_write(int) and lo_read(int) to deal with the data. You have to store that integer to be able to refer to the data; if you lose the integer, you basically "leak" the data. When you're done with the data, you must call lo_unlink(int) to actually remove the data.
(By "separate from the actual database", I mean that the data is hidden away in internal PostgreSQL tables that aren't normally accessed directly.)
In normal circumstances, calling lo_unlink is done automatically with a trigger: when the ZODB blob is deleted, the row holding the PostgreSQL integer is deleted, the trigger calls lo_unlink, and all is right with the world.
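The mechanism looks roughly like this (a sketch only; the function and trigger names are made up, and the assumption that blob_chunk's OID column is called chunk may not match RelStorage's actual schema):

-- Cleanup function: unlink the large object referenced by the deleted row
CREATE FUNCTION unlink_blob() RETURNS trigger AS $$
BEGIN
  PERFORM lo_unlink(OLD.chunk);  -- OLD.chunk assumed to hold the OID
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

-- Fire it for every row deleted from blob_chunk
CREATE TRIGGER blob_chunk_delete
  BEFORE DELETE ON blob_chunk
  FOR EACH ROW EXECUTE FUNCTION unlink_blob();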
But "zapping" the database (what clear does) doesn't delete row-by-row, because that's horribly slow. Instead, it uses TRUNCATE
to very quickly clear the tables. TRUNCATE
doesn't run triggers, though, so we have to manually call lo_unlink
for each integer first.
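So the zap has to do something like this by hand first (a sketch of the shape of the operation, not RelStorage's literal statements; again assuming the OID column is chunk):

-- Unlink every large object still referenced, then clear the table.
-- Each lo_unlink consumes a slot in the shared lock table, which is
-- why max_locks_per_transaction matters here.
SELECT lo_unlink(chunk) FROM blob_chunk;
TRUNCATE TABLE blob_chunk;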
Today I learned that lo_unlink doesn't use normal row-level locking like you might expect. You guessed it: instead of row locks, it uses advisory locks. So for zapping a database with many existing blobs, max_locks_per_transaction is relevant, and for databases with lots of blobs (with your settings, more than 512 * 100 = 51,200 of them) it may not even be possible without increasing it.
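If you want to watch the lock consumption as it happens, pg_locks (a standard system view) can be polled from a second session; a sketch:

-- Run repeatedly while the zap is in progress; the counts climbing
-- toward max_locks_per_transaction * max_connections predicts the failure.
SELECT locktype, count(*)
FROM pg_locks
GROUP BY locktype
ORDER BY count(*) DESC;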
I'm looking into alternative ways to handle this. In the meantime, you have a couple of options:
1. Increase max_locks_per_transaction further. Get the number of rows in blob_chunk (SELECT COUNT(*) FROM blob_chunk), divide that by max_connections, and you get the value you need for max_locks_per_transaction (add some more to be safe); see the query sketch after the session below.
2. TRUNCATE TABLE blob_chunk and then run the vacuumlo command that comes with PostgreSQL to clean up the blobs:
$ psql -d relstoragetest_hf -h <thehost> -p 54322 -U relstoragetest -c 'TRUNCATE TABLE blob_chunk'
TRUNCATE TABLE
$ vacuumlo -n relstoragetest_hf -h <thehost> -p 54322 -U relstoragetest
Connected to database "relstoragetest_hf"
Test run: no large objects will be removed!
Checking chunk in public.blob_chunk
Would remove 340500 large objects from database "relstoragetest_hf".
68239/68242,100.00%, 11.43 MB/s 38.02 TX/s, 20510.18 MB 29.9 minutes
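For option 1, the arithmetic can be folded into a single query (a sketch; the 10% safety margin is my own choice):

-- Suggested max_locks_per_transaction = blob count / max_connections,
-- padded by 10% to be safe.
SELECT ceil( (SELECT count(*) FROM blob_chunk)::numeric
             / current_setting('max_connections')::int
             * 1.1 ) AS suggested_max_locks_per_transaction;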
A generous max_locks_per_transaction = 3000 did the trick :-)
I "got" a managed server to install a Plone 5.2 site with Relstorage. System is a Debian Buster and Postgres 13.2
The Data.fs is 2.7G, plus 15G of blob files.
Yesterday I started a zodbconvert with this config from the buildout directory.
After about an hour of running, only a handful of objects had been converted, so I stopped the process.
Today I tried to convert again, but now I'm running into
psycopg2.errors.OutOfMemory: out of shared memory
with the hint: You might need to increase max_locks_per_transaction.
Since the Debian/Postgres settings seem rather conservative by default, what would be a recommended configuration for a DB like that?
Side note: I packed the Data.fs to zero days and ran zodbupdate for Python 3 before importing.
The full traceback with --debug is shown above.