sosedoff / pgweb

Cross-platform client for PostgreSQL databases
https://sosedoff.github.io/pgweb
MIT License
8.53k stars 724 forks source link

Support for partitioned tables #683

Open jimis opened 1 year ago

jimis commented 1 year ago

I have a huge table of 10G rows. It is split into 1000 partitions - almost half of those are empty to be ready for future data influx. Problems:

A few ideas to handle this:

Previously, I had the table unpartitioned and in general I was very happy with pgweb's performance and responsiveness with the billions-rows table (except some VIEWs hanging forever). After I partitioned the table, the overall performance of the database went up, so I expect pgweb to handle it even better, when support for partitions is implemented. Thanks!

sosedoff commented 1 year ago

What kind of partitioning strategy are you using in this case?

jimis commented 1 year ago

It's the new style "declarative" partitioning (EDIT: described here). More info:

# \d test_runs_raw
                                 Partitioned table "public.test_runs_raw"
      Column       |            Type             | Collation | Nullable |             Default              
-------------------+-----------------------------+-----------+----------+----------------------------------
 run_n             | bigint                      |           | not null | generated by default as identity
 test_executable_n | integer                     |           | not null | 
 test_function_n   | integer                     |           | not null | 
 test_datatag_n    | integer                     |           |          | 
 workitem_n        | integer                     |           | not null | 
 started_on        | timestamp without time zone |           |          | 
 duration_ms       | integer                     |           |          | 
 test_resulttype_n | smallint                    |           | not null | 
Partition key: RANGE (workitem_n)
Indexes:
    "test_runs_raw_partitioned_pkey" PRIMARY KEY, btree (workitem_n, run_n), tablespace "archival_tablespace_1"
Foreign-key constraints:
[...]
Number of partitions: 1000 (Use \d+ to list them.)
Tablespace: "tablespace1"
sosedoff commented 1 year ago

I dont have much experience with partitioning in Postgres, do you think you can submit a PR with a test setup? Something that could emulate your use case and save me time on R&D. And just FYI, this is the file pgweb uses to lookup tables/functions/etc https://github.com/sosedoff/pgweb/blob/master/pkg/statements/sql/objects.sql - so maybe we could support partitioning with minimal changes. LMK

jimis commented 1 year ago

By "test setup", would a script that prepares a partitioned table, be enough? Maybe that also populates it? Where should it be posted?

At the present time it's difficult to steal time for looking into the internals of postgres and pgweb, so a proper pull request changing the logic in objects.sql might take time on my part.

sosedoff commented 1 year ago

Yes, by test setup i mean a PR with instructions to create tables/partitions + bonus if they have data. You can reference a gist if you want as well, the medium does not matter much. Im pretty short on time these days too, so your best bet is to provide as much setup instructions as you can.

jimis commented 1 year ago

Create the virtual (parent) table:

CREATE TABLE test_runs (
    testrun_id    bigint  NOT NULL  GENERATED BY DEFAULT AS IDENTITY,
    testname_id   int     NOT NULL,
    exitcode      int,
    duration      int,
    PRIMARY KEY (testname_id, testrun_id)
) PARTITION BY RANGE (testname_id);

Create 1000 partitions to hold values of testname_id up to 20K:

CREATE EXTENSION plpython3u;

DO $$
start = 0
fin   = 20000
step  = 20
for n in range(start, fin, step):
    nmax = n + 20
    stmt = f'''
        CREATE TABLE test_runs__PART_max{nmax}
            PARTITION OF test_runs
            FOR VALUES FROM ({n}) TO ({nmax})
    '''
    plpy.info(stmt)
    plpy.execute(stmt)
$$ LANGUAGE plpython3u;

To populate, we insert 1M random values that will be automatically distributed into the partitions:

INSERT INTO  test_runs (testname_id, exitcode, duration)
    SELECT s % 20000, round(random()), round(random()*30)
        FROM generate_series(0,999999) AS s;
ashkulz commented 5 months ago

@sosedoff would you be willing to accept PRs which adds this support?

sosedoff commented 5 months ago

I would not be super opposed to it if the feature is done in a generic way, without affecting the majority of the users of this tool that rely on simplicity and portability (myself included, i really dont want to turn this into pgadmin clone). Another important aspect is if you're working on the feature for your own consumption and understand the use case well, since i mentioned that i don't deal with partitioning often and can't justify the need in the first place.

ashkulz commented 5 months ago

@sosedoff I'll keep this in mind. Right now, there are two different problems:

I think the first would be a very simple PR, second is more subjective and I'll make sure to post the approach here before starting development -- but it'll require me to understand the structure of the code a bit, so may take a few weeks.

sosedoff commented 5 months ago

Regarding point 1 -- fair enough, i think its reasonable to fix the UI so at least folks can view the tables. Not the best UX, but its better than non-functioning one.

Dont have much to say on point 2 -- feel free to experiment and have fun!