Open ololobus opened 3 months ago
Sounds like we want a default of 32MB?
@arssher last planning we discussed adding the size-based equivalent for neon.logical_replication_max_snap_files
because the cumulative size is what we also care for AUX files. I wonder, was there any specific reasoning for adding a files-count-based limit?
If not, I think we could have both, like 10k count limit and 10 MB or so for size (check the right numbers with storage)
cc @tristan957
I wonder, was there any specific reasoning for adding a files-count-based limit?
When we were adding this, the count was the main problem / bottleneck due to O(n^2) operations in the pageserver. Since this is apparently solved maybe size limit also reasonable. But it is probably not very different because snap files contain list of running xacts, which would usually be in range 0-100, maybe 1000. Hm, though not sure if subxacts are separately tracked there, probably they are.
Size of LR snapshot files doesn't;t depend on number of subxids (at least as far as I understand). looking at the code it should depend on mummer of committed transactions:
/* copy committed xacts */
if (builder->committed.xcnt > 0)
{
sz = sizeof(TransactionId) * builder->committed.xcnt;
memcpy(ondisk_c, builder->committed.xip, sz);
COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
ondisk_c += sz;
}
... but looks like I do not completely understand which committed transactions are taken in account. I tried to get large snapshot files with the following test:
import time
from fixtures.neon_fixtures import (
NeonEnv,
logical_replication_sync,
)
from fixtures.log_helper import log
def test_snap_files(neon_simple_env: NeonEnv, vanilla_pg):
env = neon_simple_env
tenant_id = env.initial_tenant
timeline_id = env.initial_timeline
endpoint = env.endpoints.create_start("main", config_lines=["log_statement=all"])
con1 = endpoint.connect()
cur1 = con1.cursor()
con2 = endpoint.connect()
cur2 = con2.cursor()
cur1.execute("create table t(pk serial primary key, payload integer)")
cur1.execute("create publication pub1 for table t")
# now start subscriber
vanilla_pg.start()
vanilla_pg.safe_psql("create table t(pk integer primary key, payload integer)")
connstr = endpoint.connstr().replace("'", "''")
log.info(f"ep connstr is {endpoint.connstr()}, subscriber connstr {vanilla_pg.connstr()}")
vanilla_pg.safe_psql(f"create subscription sub1 connection '{connstr}' publication pub1")
# Wait logical replication channel to be established
logical_replication_sync(vanilla_pg, endpoint)
# insert some data
cur1.execute("begin")
cur1.execute("insert into t (payload) values (0)")
for _ in range(10000):
cur2.execute("insert into t (payload) values (0)")
log.info("Waiting 1 minute")
time.sleep(60)
cur1.execute("commit")
logical_replication_sync(vanilla_pg, endpoint)
assert vanilla_pg.safe_psql_scalar("SELECT count(*) FROM t") == 10001
but still size of snapshot file is 144 bytes. May be somebody has idea how to get larger snapshot files?
In any case it seems to be good idea to calculate total size of snapshot files rather than just their number.
Also there was some concerns that larger transaction can cause generation of large snapshot files. From the code it is clean that it is not true. But in any case I tried LR with larger transactions. Two observation:
streaming=parallel
option. There is now discussion in hackers whether parallel option should be switched on by default:
https://www.postgresql.org/message-id/flat/TYAPR01MB5692F83F3C7A383FD13A209CF57D2%40TYAPR01MB5692.jpnprd01.prod.outlook.com#c3c2c14d14[…]71f730f9b20 regarding the size limit, pageserver currently issues a single vectored read for aux files when generating the basebackup, and we will print a warning if the size is >= 128KB. While a majority of tenants fall below this limit, pageserver can also handle aux file basebackups ~10MB.
pageserver_aux_file_estimated_size
, so it's enough for now)Currently, when we have >1000 snap files, we delete the slot. We can probably remove (bump) logic for replication slot removal https://github.com/neondatabase/neon/blob/42229aacf60831443d9ec5e2342db34a143f9f1d/pgxn/neon/neon.c#L70
John: That'd be great for all AUX files to occupy 32 MB (???) Important comment from Heikki on how to generate big aux files: https://github.com/neondatabase/neon/issues/6626#issuecomment-2065475447