tembo-io / pg_tier

Postgres Extension written in Rust, to enable data tiering to AWS S3
Apache License 2.0
137 stars 4 forks source link

Operator guide on tiering data, caveats, etc. #19

Open andyatkinson opened 4 months ago

andyatkinson commented 4 months ago

If the people table in the examples was connected to an app, and was receiving new row changes, does the extension fully cover transitioning it to a foreign table, including table name swap within a transaction for an online operation? Or would the intention be for that to be an offline operation?

Once the table has been converted to a foreign table, can we TRUNCATE the rows from the original table?

To improve query performance, I don't think we can add indexes to a foreign table. We could add a materialized view based on the table content. I was wondering what the space consumption was like between a foreign table with a materialized view, compared with a traditional table. In theory they'd be about the same, which would lessen the space reduction benefit of tiering the data. Curious on thoughts about that.

shhnwz commented 4 months ago

If the people table in the examples was connected to an app, and was receiving new row changes, does the extension fully cover transitioning it to a foreign table, including table name swap within a transaction for an online operation? Or would the intention be for that to be an offline operation?

Transaction is not supported on Tiered table. From the use case perspective we can think of an aged table partition where no new rows are expected, if this is the case then this extension fully covers the transitioning a regular table into a foreign table and truncation of old regular table.

Once the table has been converted to a foreign table, can we call TRUNCATE the rows from the original table?

Yes, now TRUNCATE happens automatically after data moved to S3.

To improve query performance, I don't think we can add indexes to a foreign table. We could add a materialized view based on the table content. I was wondering what the space consumption was like between a foreign table with a materialized view, compared with a traditional table. In theory they'd be about the same, which would lessen the space reduction benefit of tiering the data. Curious on thoughts about that.

Good question!! Object Store Tier always has trade off between reducing high cost per bit claimed by aged data vs query SLA. They both are inversely proportional. Lesser SLA (Less response time) will lead to High cost per bit vice-versa. Creating Materialized View can an option to improve performance of adhoc strategic queries having strictly lesser SLA, then drop the view to reclaim the space.

andyatkinson commented 4 months ago

@shhnwz Thanks for the responses. The tiering + ad hoc or custom range materialized view strategy is interesting to me. There could even be a combination of strategies with file-based/foreign table data with slower access speed, but a big range of data, then a materialized view scoped to a portion of it where faster access speed is preferred, at a trade-off as you said of higher cost per data bit, more space consumed, etc.

shhnwz commented 4 months ago

Thanks @andyatkinson I concur 👍