erikgrinaker opened 3 weeks ago
I've run some initial tests using:

```
pg_restore -d tpch --clean --if-exists --no-owner --verbose --table lineitem --section pre-data --section data tpch.pg_dump
```

This imports an 11 GB `lineitem` table, single-threaded. With fsync and S3 disabled, the Neon stack is marginally faster than vanilla Postgres using similar configurations:

That's kind of suspect in and of itself, but given that the disk can handle 5.5 GB/s and Safekeeper+Pageserver can handle 400 MB/s, Postgres appears to be the bottleneck here -- so we need to up the concurrency.
`pg_restore` is basically just `COPY`. We can probably write a synthetic benchmark with similar behavior by generating input data for `COPY FROM STDIN`, combined with `FREEZE` and disabling constraints. This gives us better control over the ingestion data and concurrency.
I'm going to put the `pg_restore` investigation on hold for now, as I'm able to ingest data much faster using a plain `INSERT ... generate_series()`. See #9789.
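A rough sketch of that kind of `generate_series()` ingestion, with an illustrative table, payload width, and row count rather than the exact statement from #9789:

```sh
# Sketch only: bulk-ingest synthetic rows with a single INSERT ... SELECT over generate_series().
psql tpch -c "CREATE TABLE IF NOT EXISTS ingest_test (id bigint, payload text);"
psql tpch -c "INSERT INTO ingest_test SELECT i, repeat('x', 64) FROM generate_series(1, 100000000) AS i;"
```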
We should add an end-to-end benchmark that measures `pg_restore` performance on a non-trivial dataset (e.g. 10 GB with 64 B rows). We should determine the bottlenecks and attempt to improve throughput. We should also try concurrent restores (e.g. 4 or 8 tables).
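Two possible ways to drive the concurrent variant, as a sketch: it assumes the dump is a custom- or directory-format archive (so `--jobs` applies), and the table names are illustrative.

```sh
# Option 1: let pg_restore parallelize across tables itself (custom/directory-format archives only).
pg_restore -d tpch --clean --if-exists --no-owner --jobs=8 --section pre-data --section data tpch.pg_dump

# Option 2: run one single-table data restore per table in the background.
for t in lineitem orders partsupp customer; do
  pg_restore -d tpch --no-owner --table="$t" --section data tpch.pg_dump &
done
wait
```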
There is an existing `pg_restore` benchmark as a GitHub action in `_benchmarking_preparation.yml`; perhaps we can use that as a starting point.