mreithub / pg_recall

Simple PostgreSQL extension that keeps track of changes to certain tables
Other
53 stars 1 forks source link

Think about partitioning the log table #15

Open mreithub opened 8 years ago

mreithub commented 8 years ago

For large log tables it might make sense to transparently partition them (speeds up lookups and cleanup)

The recall_enable() function could then then get an additional parameter specifying the number of partitions you want (for the given logInterval). If the partition count is set to 1 (default), partitioning won't be enabled for that log table.

The trigger function has to dynamically create new partitions when needed (and could at the same time easily drop old ones, making manual cleanup() calls practically obsolete)

Special care has to be taken to decide on good partition start times as well as a good naming scheme for each of the partitioned tables.

mreithub commented 8 years ago

This feature though is something for the future (>1.0).

Also it might be a good idea (for performance reasons) to have separate trigger functions (one with partitioning, one without - maybe it's possible to have the partitioning one use (call) the simple one to avoid duplicate code)

mreithub commented 8 years ago

I noticed that there actually are two ways to partition the log data (since we store two timestamps for each entry):

Also: PostgreSQL honours check constraints on partitions to quickly discard the ones that can't possibly contain any data matching the query. If we're partitioning by end time, queries selecting by start time (which I guess will come up more often than those for the end timestamp) might not be able to take advantage of that.

To sum up, there's no one-size-fits-all solution for partitioning our log data. When using a naive version of start-time partitioning, we might lose old but still-existing log entries (but at least the start-time never changes). With end-time partitioning, there's no way to avoid moving entries between partitions when they change.