pgpartman / pg_partman

Partition management extension for PostgreSQL

Is it possible to apply Unique constraints in partitions? #284

Closed hedgss closed 5 years ago

hedgss commented 5 years ago

Hi,

I'd like to have a guarantee that each string id within one partition is unique. It does not need to be unique across all partitions.

I am using pg_partman version 4.2.2 in native mode with PostgreSQL 11. My partitioning key is a timestamp. One column is the actual id; it is a custom string.

I cannot use a primary key or a global unique constraint with native partitioning. I also don't want to add a trigger on each INSERT, since it would eventually affect performance. I know that I can create a unique constraint that includes the partition key, but that does not give me what I need.

But I'd like a guarantee that each id within one partition is unique. I can add a unique constraint to each partition manually, but is it possible to apply the unique constraint to partitions automatically using pg_partman? That would solve the main part of the issue.

P.S. I will have additional checks and monitoring to verify that each id is unique across all partitions. I am going to run the checks on a standby server. If a duplicate shows up, a person can decide how to handle it.

keithf4 commented 5 years ago

It is currently impossible to enforce a unique constraint across all child tables in a partition set unless the partition key is part of the constraint. For time-based partitioning, this is almost never the case, so right now it's generally not possible.
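To illustrate the limitation (the table and column names here are hypothetical), PostgreSQL 11 rejects a unique constraint on a natively partitioned table unless the constraint includes the partition key:

```sql
-- Rejected: the unique constraint does not include the partition
-- key (created_at), so PostgreSQL cannot enforce it globally.
CREATE TABLE public.measurements (
    id         text NOT NULL,
    created_at timestamptz NOT NULL,
    CONSTRAINT measurements_id_key UNIQUE (id)
) PARTITION BY RANGE (created_at);
-- ERROR:  insufficient columns in UNIQUE constraint definition

-- Accepted, because the partition key is included -- but this only
-- enforces uniqueness of the (id, created_at) pair, not of id alone:
CREATE TABLE public.measurements (
    id         text NOT NULL,
    created_at timestamptz NOT NULL,
    CONSTRAINT measurements_id_key UNIQUE (id, created_at)
) PARTITION BY RANGE (created_at);
```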

Since native partitioning cannot manage unique constraints unless they include the partition key, pg_partman provides a template table mechanism that lets you apply a unique constraint to all child tables. This is the best that can be done at this time.
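A sketch of the template-table approach, with hypothetical table names and assuming pg_partman 4.x installed in the `partman` schema (the actual template table name is recorded in `partman.part_config.template_table`). Indexes and constraints defined on the template table are copied to each newly created child, so the unique constraint is enforced per child partition:

```sql
CREATE TABLE public.events (
    id         text NOT NULL,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- create_parent() sets up native partitioning and a template table.
SELECT partman.create_parent(
    p_parent_table => 'public.events',
    p_control      => 'created_at',
    p_type         => 'native',
    p_interval     => 'daily'
);

-- A unique constraint on the template table is applied to every new
-- child partition: per-partition uniqueness of id, not global.
ALTER TABLE partman.template_public_events
    ADD CONSTRAINT events_id_unique UNIQUE (id);
```

Note that the constraint only takes effect on children created after it is added to the template; existing children would need it applied manually.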

pg_partman also comes with a script, check_unique_constraint.py, that checks whether a constraint is being violated across the partition set. I believe it can run against the standby as you planned. Again, it can only check after the fact; it does not prevent the constraint violation across child tables.
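Conceptually, such a post-hoc check boils down to a query like the following (the parent table name `public.events` and column `id` are hypothetical):

```sql
-- Scans every child partition via the parent and reports ids that
-- appear more than once anywhere in the partition set. Detection
-- only: this cannot prevent the duplicate from being inserted.
SELECT id, count(*) AS occurrences
FROM   public.events
GROUP  BY id
HAVING count(*) > 1;
```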

hedgss commented 5 years ago

@keithf4 You helped me again. Everything works as expected, and check_unique_constraint.py is very useful. I think I will eventually need to implement my own solution, because we will have tens of billions of records, but at the moment this is fine. Thanks a lot!

keithf4 commented 5 years ago

Glad it worked!

At some point, you may want to re-evaluate whether you actually need to keep that much data fully available in a single partition set forever. The retention system can simply uninherit tables from the partition set without dropping them, and can even move them to a different schema for organization. That lowers the burden of validating an ever-growing amount of data, while still leaving the old data readily available by querying the detached children directly. It also improves query performance on the partition set in general, since there are fewer tables for the planner to manage.
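A sketch of that retention setup (the table name, window, and archive schema are hypothetical; the `part_config` columns shown exist in pg_partman 4.x). Children older than the retention window are detached rather than dropped, and moved to an archive schema:

```sql
UPDATE partman.part_config
SET    retention            = '90 days',   -- hypothetical window
       retention_keep_table = true,        -- detach, do not drop
       retention_schema     = 'archive'    -- move detached children here
WHERE  parent_table = 'public.events';

-- Retention is applied on the next maintenance run:
SELECT partman.run_maintenance(p_parent_table => 'public.events');
```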

Or you could even just dump it out if there is no immediate need for it. The data remains available if it needs to be loaded back in, but it is stored in an archive with a much lower footprint (compressed, with no indexes). There is another Python script that can dump out all the tables in a given schema; it provides a SHA hash and can also drop each table once it has been successfully dumped.

hedgss commented 5 years ago

@keithf4 I also considered different options, but I forgot that it is possible to detach a partition and attach it somewhere else. Thank you!