plepe / pgsql-partitioning

Partitions a database table by a where clause, or an integer or geometry column
3 stars 0 forks source link

== INTRODUCTION == Using very large database tables in PostgreSQL is no big deal, but at a certain point performance on index queries starts to diminish. PostgreSQL offers a very promising feature, "[[http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html|Partitioning]]", which tries to solve this problem by splitting tables into sub-tables, so that queries only need to load one or a few of those.

There are two forms of partitioning:

In fact the Partitioning system is rather limited, as it works only for a few specific cases:

The modules in this repository solve the problem by using sub-tables without any (automatic) checks and using simple [[http://www.postgresql.org/docs/9.1/static/ddl-inherit.html|inheritance]] instead (which is the basis of partitioning). To overcome the limitiations described in the previous paragraph a stored procedure for querying is used, which selects the to-be-queried sub-tables by itself.

== EXAMPLE == A simple example: {{{ -- create a test-table for demonstration create table marker ( title text not null ); select AddGeometryColumn('marker', 'way', 900913, 'GEOMETRY', 2);

-- convert table into a partitioned table (magic happens) select partition_geometry_init_table('marker');

-- insert something into the table: insert into marker values ('My Home', ST_Point(5, 5)); -- a message "INSERT 0 0" will be written to stdout ... this is a known -- problem with partitioning. try it, you'll see, the content is there

-- Query all entries in a region select * from marker(ST_MakeEnvelope(-10, -10, 10, 10, 900913));

-- Add an additional where parameter select * from marker(ST_MakeEnvelope(-10, -10, 10, 10, 900913), $$title='My Home'$$);

-- You could also use: select * from marker where way && ST_MakeEnvelope(-10, -10, 10, 10, 900913) and title='My Home'; -- but this query will always include all sub-tables. Also you should add -- "distinct" as overlapping objects will be added to all matching -- sub-tables. The query-function will always remove duplicates. }}}

== MODULES IN THIS REPOSITORY == All modules define a function for setting up an already created table for partitioning and will create the query function.

Query function: function 'xyz(key column, where, options)'

=== Partition Geometry === This module works for geometries as defined by [[http://postgis.refractions.net/|PostGIS]]. In cases where entries overlap sub-tables, entries will be duplicated and stored in all sub-tables. Note that the query-function will only do a bounding box (&&) check (and may return entries outside the given geometry).

=== Partition Integer === A range check might have its draw backs, e.g. if sub-tables with low ids will over time get smaller (because entries are being deleted but never re-created in that range). PostgreSQL will not free up this harddisk space if you don't do a manual VACUUM FULL (which will lock the table exclusively - something you want to avoid on a live database).

This module will use a part of the integer key, e.g. the 9th to 16th bit for partitioning. You could also use the 8 (or so) least significant bits, but in many cases you would want to query a list of ascending values and you would need to query a sub-table per value - not very reasonable.

=== Partition Where === The partitioning is done by a (or several) where clauses, e.g. all motorways in sub-table A, all railways in sub-table B, all other objects in sub-table C. Entries which match several where clauses will be entered to all matching sub-tables. C will contain all entries not matching any of those where clauses.

You have to assign an id for each where clause, and need to supply this id (or list of ids) as first query function parameter.

=== Partition Geometry Advanced === This module is based on an earlier version of Partition Geometry and features automatic splitting of boundaries. Actually, the Partition Geometry module is a simplyfied version of this module.

== READ MORE ==