neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0
15.26k stars 445 forks source link

Investigate`pg_restore` performance #9623

Open erikgrinaker opened 3 weeks ago

erikgrinaker commented 3 weeks ago

We should add an end-to-end benchmark that measures pg_restore performance of a non-trivial dataset (e.g. 10 GB with 64 B rows). We should determine the bottlenecks and attempt to improve throughput.

We should also try concurrent restores (e.g. 4 or 8 tables).

There is an existing pg_restore benchmark as a GitHub action in _benchmarking_preparation.yml, perhaps we can use that as a starting point.

erikgrinaker commented 6 days ago

I've run some initial tests using:

This imports an 11 GB lineitem table, single-threaded. With fsync and S3 disabled, the Neon stack is marginally faster than vanilla Postgres using similar configurations:

That's kind of suspect in and of itself, but given that the disk can handle 5.5 TB/s and Safekeeper+Pageserver can handle 400 MB/s, Postgres appears to be the bottleneck here -- so we need to up the concurrency.

pg_restore is basically just COPY. We can probably write a synthetic benchmark with similar behavior by generating input data for COPY FROM STDIN, combined with FREEZE and disabling constraints. This gives us better control over the ingestion data and concurrency.

erikgrinaker commented 6 days ago

I'm going to put pg_restore investigation on hold for now, as I'm able to ingest data much faster using a plain INSERT ... generate_series(). See #9789.