pgpartman / pg_partman

Partition management extension for PostgreSQL

Migration Scenario Question #330

Closed dtseiler closed 3 years ago

dtseiler commented 4 years ago

Spinning this off from Issue #325.

The scenario: I have a table that we want to partition. The table has over 1.4 BILLION-with-a-B rows. Queries typically only go back 6 months, but for business reasons we have to keep all of it (going back almost 10 years) in the DB (we're working that out in a separate effort).

My original thought process was that we need to partition it but keep the old data available right away, so I looked at either trigger/inheritance-based partitioning or native partitioning with the old table swapped in as the default partition. Based on your mention of potential write-performance impact, we're shying away from the trigger method. Native partitioning with a default is a problem if we have to migrate data in partition-interval batches, since our interval would be 1 month and each batch would be huge.

I broke out of that mold: if I just rename the old table, create a new partitioned table, and then migrate the data into it, that might work for us. It lets me use custom (e.g. 1-day) batch intervals to move the data. The app teams say they need the last 6 months of data ASAP, so I'd call partition_data_proc() with p_source_table := 'renamed_orig_table_name', p_interval := '1 day', p_batch := 180 (for 180 days, roughly 6 months), and p_order := 'DESC' so it starts with the most recent data.
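
For concreteness, the rename-and-rebuild step might look roughly like this (just a sketch; public.foo / foo_old are the same hypothetical names used in the example call later in this thread, and the column list is abbreviated):

    ALTER TABLE public.foo RENAME TO foo_old;

    -- New partitioned parent with the same column definitions as foo_old
    CREATE TABLE public.foo (
        user_id varchar(36) NOT NULL,
        date    timestamptz NOT NULL
        -- ... remaining columns identical to foo_old ...
    ) PARTITION BY RANGE (date);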

Does this make sense, or am I misunderstanding something? I'm going to set up a test case later today but wanted to see if I'm on the right track.

dtseiler commented 4 years ago

Follow-up question: Does it pay to pre-create the partitions I know I'll need (e.g., set p_start_partition back to May 2011)? Or should I just let partition_data_proc() handle that? Does pre-creating them save any time/hassle for partition_data_proc()?
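
For reference, pre-creating monthly partitions back to May 2011 would presumably be a create_parent() call along these lines (just a sketch, assuming pg_partman 4.x, native partitioning, and a control column named date):

    SELECT partman.create_parent(
        p_parent_table    := 'public.foo',
        p_control         := 'date',
        p_type            := 'native',
        p_interval        := 'monthly',
        p_premake         := 4,
        p_start_partition := '2011-05-01'
    );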

keithf4 commented 4 years ago

What is the partition interval you're going for? If you're going with native partitioning, remember you can't have the batch size be any smaller than the partition size, so you'd have to partition by day. Going back that many years means a lot of child tables. If you're on PG 12 or 13 I think you'll be fine; before that, I saw some performance issues once the child count got up to around several thousand, I think (don't quote me on that).

The main reason for the pre-creation is to avoid the contention of trying to create the child tables on demand as they're needed by the incoming data. Whether it's of any further benefit will depend on how much you're actually trying to use the data while it's being moved around. Some heavier locks are taken when the child tables are created, so if they're all made beforehand, you avoid those locks while partitioning the data.

You say you need to keep the data available in the database, but do you actually need it available IN the actual partition table at all times? You could certainly keep older "archived" data in a separate table and just have your application be aware that if a certain column's date is older than, say, 6 months, it should also check the archive table (data may be in transit in batches, so it might have to check both locations if it's really close to the cutoff time).

So you could quickly get the most recent 6 months partitioned out daily into the currently "active" table, then over a longer period of time move the old data to the "archive" table. That archive table could be partitioned by month or even year.
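
In SQL terms that split might look something like this sketch (hypothetical names; it assumes a second partitioned parent public.foo_archive has already been created with PARTITION BY RANGE (date)):

    -- Yearly partitions for the archive parent
    SELECT partman.create_parent(
        p_parent_table := 'public.foo_archive',
        p_control      := 'date',
        p_type         := 'native',
        p_interval     := 'yearly'
    );

    -- Drain old rows out of the renamed original table in small batches,
    -- on whatever schedule suits
    CALL partman.partition_data_proc(
        p_parent_table := 'public.foo_archive',
        p_source_table := 'public.foo_old',
        p_interval     := '1 month',
        p_batch        := 12
    );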

Just some ideas.

dtseiler commented 4 years ago

This is on PG 12.4.

My understanding was that I could have a custom batch size if I was processing data from another table (not from the default partition). My tests seem to confirm that as well. I have a table partitioned monthly and I'm calling partition_data_proc() like this:

CALL partman.partition_data_proc(
    p_parent_table := 'public.foo',
    p_source_table := 'public.foo_old',
    p_interval := '1 day',
    p_batch := 180,
    p_order := 'DESC'
);

I agree that daily partitions going that far back would not be great.

So I ran that call as I just posted and it took almost 5 hours in my quiet prod clone. The partitioned table has 3 non-unique indexes on it. I wonder if it's worth testing this import with no indexes on the partitions and then adding them afterward.

The question then is how best to add indexes to existing partitions. Do I have to write a script to do them all individually, or is there something that can be done from the parent table or with a pg_partman function?

keithf4 commented 4 years ago

Ah yeah, you can use a custom batch size if the data is coming from a table that is not the default partition.

If it's a non-unique index, adding it to the parent should add it to all the children automatically; I'm not sure whether that can be done concurrently or not. If it's unique and does not contain the partition key, that's something you'll have to do manually. I tried adapting the old reapply_indexes.py script to work with native index inheritance, but it was just too messy trying to also account for what native does for you automatically.
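
If the children do end up needing to be built individually, one low-tech option (not a pg_partman feature, just a sketch with a hypothetical column list) is to generate the statements from pg_inherits and run them with psql's \gexec:

    SELECT format(
        'CREATE INDEX CONCURRENTLY ON %s (device_id, date, user_id);',
        inhrelid::regclass
    )
    FROM pg_inherits
    WHERE inhparent = 'public.foo'::regclass
    \gexec

Each generated statement runs on its own, which matters here since CREATE INDEX CONCURRENTLY can't run inside a transaction block.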

dtseiler commented 4 years ago

Alright I'll play around with it and see. This table is only using non-unique indexes so no worries there.

If there is a concurrency problem, could I add indexes to the partition tables manually? Would adding the index to the parent then try to add new indexes on top of those, or recognize that they already exist?

All partitions prior to the current month are always read-only (it's all insert-only data).

keithf4 commented 4 years ago

I believe adding an index to the parent automatically tries to add it to all children. If it already exists on the children, I'm not sure what it does.

dtseiler commented 4 years ago

Alright I'll play around with this and report back.

dtseiler commented 4 years ago

The docs do say this:

Concurrent builds for indexes on partitioned tables are currently not supported. However, you may concurrently build the index on each partition individually and then finally create the partitioned index non-concurrently in order to reduce the time where writes to the partitioned table will be locked out. In this case, building the partitioned index is a metadata only operation.

I wonder if you have to use a specific naming convention for it to recognize this. I'll test later.
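
Based on that passage, the pattern would presumably be something like this (a sketch with hypothetical names; whether the index names have to match is what I want to test):

    -- Build the index concurrently on each existing partition first
    CREATE INDEX CONCURRENTLY ON public.foo_p2020_07 (device_id);
    CREATE INDEX CONCURRENTLY ON public.foo_p2020_08 (device_id);
    -- ... one per partition ...

    -- Then create it on the partitioned parent; with a matching index already
    -- on every partition this should be the metadata-only step the docs describe
    CREATE INDEX ON public.foo (device_id);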

dtseiler commented 4 years ago

I noticed that the indexes created on partitions don't necessarily follow the name given on the parent; they're named after the columns involved instead. Is that something pg_partman is doing, or is that Postgres?

e.g., I created this index on my parent partitioned table:

    "user_aggregations_foo_idx" btree (device_id, date, user_id)

but on the partitions, it doesn't use the foo naming at all:

    "user_aggregations_p2020_07_device_id_date_user_id_idx" btree (device_id, date, user_id)

(quick update): Looks like the naming truncates after a certain point, seems to be 32 characters:

    "user_aggregations_p2020_09_primary_international_id_date_us_idx" btree (primary_international_id, date, user_id)

dtseiler commented 4 years ago

Tests so far seem good. If I pre-create the indexes on partitions, it doesn't seem to care what the name is when I create the index on the parent table. It looks like it recognizes that an equivalent index is already there and doesn't need to do anything. I'll try to write this up later with some example code.

keithf4 commented 4 years ago

I doubt there's any naming convention involved in seeing whether an index already exists. You can look up in the catalogs which columns an index is on, so that's probably all it's doing.

PostgreSQL has a 64-byte (not character) limit on object names (effectively 63 usable bytes) and will truncate anything past that. pg_partman does manage that for the child table names to ensure the suffix is always the last thing, but it doesn't care about index names.

With native partitioning, pg_partman does nothing at all with indexes unless you're using the template system.

dtseiler commented 3 years ago

So does pg_partman basically take the name, truncate it to 60 characters, and then append _idx? Just curious, though it doesn't seem like I have to use that same column-name-based naming convention. Running a couple more tests this morning.

dtseiler commented 3 years ago

Looks like you're right.

  1. Create the partitioned table and run create_parent() to pre-create partitions. No indexes yet.
  2. After the data migration, create indexes concurrently on the individual partitions with the foo name, e.g. user_aggregations_p2019_11_foo_idx.
  3. After that is done, create the index on the parent table with the bar name. No duplicate effort is made.

test=# \d user_aggregations
                     Partitioned table "public.user_aggregations"
          Column          |           Type           | Collation | Nullable | Default
--------------------------+--------------------------+-----------+----------+---------
 user_id                  | character varying(36)    |           | not null |
 primary_international_id | character varying(36)    |           | not null |
 device_id                | character varying(36)    |           | not null |
 date                     | timestamp with time zone |           | not null |
Partition key: RANGE (date)
Indexes:
    "user_aggregations_bar_idx" btree (primary_international_id, date, user_id)
    "user_aggregations_date_idx" btree (date DESC)
Number of partitions: 34 (Use \d+ to list them.)

test=# \d user_aggregations_p2019_11
                      Table "public.user_aggregations_p2019_11"
          Column          |           Type           | Collation | Nullable | Default
--------------------------+--------------------------+-----------+----------+---------
 user_id                  | character varying(36)    |           | not null |
 primary_international_id | character varying(36)    |           | not null |
 device_id                | character varying(36)    |           | not null |
 date                     | timestamp with time zone |           | not null |
Partition of: user_aggregations FOR VALUES FROM ('2019-11-01 00:00:00+00') TO ('2019-12-01 00:00:00+00')
Indexes:
    "user_aggregations_p2019_11_date_idx" btree (date DESC)
    "user_aggregations_p2019_11_foo_idx" btree (primary_international_id, date, user_id)

keithf4 commented 3 years ago

I created a how-to doc for converting a normal table to a natively partitioned table:

https://github.com/pgpartman/pg_partman/blob/master/doc/pg_partman_howto_native.md