Closed problame closed 1 month ago
I don't think it's a bug per se, but I'll take a look.
One more occurrence with a different oversize:
2024-09-09T10:38:00.505829Z WARN initial_size_calculation{tenant_id=$TENANT_ID shard_id=0000 timeline_id=$TIMELINE_ID}:logical_size_calculation_task: Oversized vectored read (165376 > 131072) for keys 000000067F00004002000000000000000001@32/84139CE0
We print this when trying to read a key whose value is larger than 128KiB (131072 bytes). I'm not sure why those values are that big, but we can bump this soft limit up to, say, 256KiB if we are okay with that. Alternatively, we can downgrade the log line.
000000067F00004002000000000000000001
$ cargo run -qp pagectl key 000000067F00004002000000000000000001
parsed from hex: 000000067F00004002000000000000000001:
Key { field1: 0, field2: 1663, field3: 16386, field4: 0, field5: 0, field6: 1 }
rel_block: false
rel_vm_block: false
rel_fsm_block: false
slru_block: false
inherited: true
rel_size: false
slru_segment_size: false
recognized kind: Some(RelDir(1663/16386))
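For readers who want to decode such keys by hand, here is a minimal sketch of how the 36 hex digits map onto the six fields printed above. It assumes a big-endian 1+4+4+4+1+4 byte layout, which is consistent with the output shown but is not taken from the pageserver source; the `Key` struct and `parse_key` function are hypothetical stand-ins, not the real `pagectl` code.

```rust
/// Hypothetical mirror of the six fields that `pagectl key` prints.
#[derive(Debug, PartialEq, Eq)]
struct Key {
    field1: u8,
    field2: u32,
    field3: u32,
    field4: u32,
    field5: u8,
    field6: u32,
}

/// Parse a 36-hex-digit key string.
/// Assumption: 18 bytes laid out as 1+4+4+4+1+4, multi-byte fields big-endian.
fn parse_key(hex: &str) -> Option<Key> {
    // Reject anything that is not exactly 36 ASCII hex digits.
    if hex.len() != 36 || !hex.bytes().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // Decode each pair of hex digits into one byte.
    let mut b = [0u8; 18];
    for (i, byte) in b.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    // Read a big-endian u32 starting at byte offset `o`.
    let be32 = |o: usize| u32::from_be_bytes([b[o], b[o + 1], b[o + 2], b[o + 3]]);
    Some(Key {
        field1: b[0],
        field2: be32(1),
        field3: be32(5),
        field4: be32(9),
        field5: b[13],
        field6: be32(14),
    })
}
```

Under this assumed layout, `0x067F` is 1663 and `0x4002` is 16386, which matches the `RelDir(1663/16386)` reported by the tool.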
So, maybe there are a lot of relations, or one large relation?
Only eu-central-1 is warning about this consistently.
There are three problematic tenant/timelines:
000000067F00006000000000000000000001 is a reldir and 01000000000000000100000006000000001F is an SLRU block. Pageserver doesn't have control over either of these.
Action items:
From chat with Heikki: it's okay for SLRU deltas to be slightly above 128KiB due to overhead in the storage format. Let's bump the threshold to 130KiB and allowlist the unbounded key types (reldir & dbdir).
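A rough sketch of what that action item could look like, assuming the warning is gated by a single predicate. `KeyKind`, `OVERSIZED_READ_WARN_THRESHOLD`, `is_unbounded`, and `should_warn_oversized` are hypothetical names for illustration, not the actual pageserver identifiers.

```rust
/// Hypothetical key kinds; only the distinction drawn in this thread is modeled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeyKind {
    RelDir,
    DbDir,
    SlruBlock,
    Other,
}

/// Proposed soft limit: 130KiB = 133120 bytes (previously 128KiB = 131072).
const OVERSIZED_READ_WARN_THRESHOLD: usize = 130 * 1024;

/// Key kinds whose value size is not bounded, so a size warning is just noise.
fn is_unbounded(kind: KeyKind) -> bool {
    matches!(kind, KeyKind::RelDir | KeyKind::DbDir)
}

/// Decide whether an "Oversized vectored read" warning should be logged.
fn should_warn_oversized(kind: KeyKind, read_size: usize) -> bool {
    !is_unbounded(kind) && read_size > OVERSIZED_READ_WARN_THRESHOLD
}
```

With this predicate, the 165376-byte reldir read from the log above would no longer warn, and an SLRU read a few hundred bytes over 128KiB would stay below the new threshold.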
Sample