[Idea] Database schema selection

With the 0.6 release, we removed the "publish schema" concept in an attempt to reduce complexity. Unfortunately, a number of users relied on the publish schema for zero downtime deployments with 1) direct SQL and 2) ponder serve. There are workarounds, but they are complex and manual - definitely a regression.

We'd like to unblock these users as quickly as possible.

TL;DR

Here's the new proposal.

Rather than use a different schema for each instance, all instances share one schema and use a table name prefix to avoid collisions, e.g. public.73bd1fa_transfer_events
When a deployment becomes healthy, it creates or updates views in the same schema using the bare name {schema}.{tableName}, e.g. public.transfer_events
There is an automated mechanism for cleaning up tables created by stale instances
All instances still share the same cached RPC data, located in the ponder_sync schema

Detailed proposal

Requirements

Ponder's "get started" experience for new users should not require any database administration
Ponder should work out of the box with modern container orchestration patterns (e.g. Railway, Kubernetes)
Advanced use cases should be straightforward for experienced operators (horizontal scaling via ponder serve, query Ponder tables directly from other applications, etc)
Ponder should not clutter the target database with stale objects
Ponder should remain "stateless" such that updates to ponder.schema.ts & indexing functions should not require a database migration

Instance behavior

With those in mind, here is yet another proposal for how Ponder could work at the database layer.

Like today, the user can select a database schema in ponder.config.ts which defaults to public
When an instance starts up, it generates a random "instance ID" and creates tables in the target schema using a {schema}.{instanceId}_{tableName} naming pattern, e.g. public.73bd1fa_transfer_events
After the instance has finished indexing and becomes healthy, it creates or updates views in the same schema using the bare name {schema}.{tableName}, e.g. public.transfer_events

The implementation is more complex - there lots of edge cases to consider around locking, table name collisions, caching, and the reorg log tables (though we've solved most of this already).

Pros

You can simply hardcode a database schema in ponder.config.ts and redeploy freely - no need to manually select a new schema for every instance
Consumers (via direct SQL, ponder serve, other applications using e.g. Drizzle) can query the bare table names and get the latest data without worrying about which specific instance they are querying
All data for a specific Ponder project is is one schema, which reduces database clutter

Cons

We're using views as a "table alias" which is not what they are designed for. Ideally, the live/latest data would be a table. There's still a bit of magic here that I don't like.

Alternatives

Rename tables instead of using views

With this design, when an instance goes live it would rename its tables from the instance-specific version to the bare version. During a redeployment, the new live instance would need to forcefully rename the current live tables back to their instance-specific version, then rename its own tables.

The key benefit of this approach is that the latest data is an actual table, not a view. However, the implementation would be much more complex, bug-prone in crash scenarios, and I think more difficult to reason about.

ponder-sh / ponder