kiwicopple / serverless-postgres

308 stars 7 forks source link

Initial notes and problems to solve #2

Open ErikPetersenDev opened 2 months ago

ErikPetersenDev commented 2 months ago

I've spent a few hours on this and took notes on some things I think we'll want to work through. We can split these out into individual Issues if needed but wanted to get the notes documented in one place.

Issues/To Do

1. Data loss on Fly machine restart

When deployed on Fly, data is lost when the machine is stopped (including auto-stop) and later restarted. This should be a relatively easy fix using Fly Volumes. The root filesystem in a Fly machine is lost on restart, so we have to define a persistent Volume in our fly.toml. The Fly Postgres repo has an example of this (see 'mounts' section): https://github.com/fly-apps/postgres-flex/blob/master/fly.toml (maybe Dockerfile changes too). Without a Volume, every restart would likely require an undesirable full reinitialization with the S3 bucket (discussed in the "load data..." item below - requires running the OrioleDB python script).

2. Document recommended way to connect externally

Fly has a guide for external connections to PG. I have not been able to get a direct DB connection (i.e., via connection string) from my local machine yet. Using fly proxy 9432:5432 -a fly-app-name and then connecting to localhost:9432 works fine, but we'll likely want to figure out and document the right way to do this. Note that their documentation relies on handlers = ["pg_tls"] which relates to a separate thing documented below (SSL). I've seen other recommendations (e.g., handlers = []) for insecure connections but wasn't able to get it working in the short time I looked into it.

3. Auto-stop/start

Auto-stop/start is enabled and correctly stops the Machine after a few minutes. When connected from my local environment (via fly proxy as discussed above), trying to reopen the connection or stop/restart the fly proxy connection from the terminal fails and doesn't initiate a auto-restart. I have not looked at this in detail, but it seems that Fly Proxy is involved in auto-restart and I'm not sure if the way I'm currently running my connection through the local fly proxy impacts auto-restart. Docs for auto-start/stop and Fly Proxy. One thought is that since my local fly proxy connection is treated as an internal connection it's not going through the correct "door" (Fly Proxy) needed to trigger a new connection/restart. Therefore, auto-restart may be related to the item above (getting the external connection working/documented).

4. SSL

We'll eventually need an SSL connection if we're planning to expose the DB port directly. Fly Postgres appears to have this implemented. This may not be critical for initial non-prod testing though.

5. Load data into new instance

OrioleDB has a Python utility that appears to be required if you're initializing a new PG instance from an existing S3 bucket. I believe this will be required if we delete our Fly app and want to spin up a new one connected to the existing bucket. This also might be needed later for read-replicas, although I haven't looked into > 1 instance yet.

Docs: https://www.orioledb.com/docs/usage/decoupled-storage#s3-loader-utility File: https://github.com/orioledb/orioledb/blob/main/orioledb_s3_loader.py

6. Be aware of limitations of OrioleDB; consider copying/quoting them into docs here

OrioleDB currently has some limitations such as "one S3 bucket per PG instance" that we'll want to keep in mind once we get beyond single instance testing. It may be worth pulling a reference to those into the docs here to ensure they are top of mind. These should be things that are solved for eventually by the OrioleDB team ("All of the limitations above are temporary and will be removed in further releases."). Discussed at the bottom of this section.

Note: Use Fly Postgres as a resource

Several of the things documented above are general "Postges on Fly.io" problems, not directly related to this serverless-postgres project. It would be worth taking some time to investigate the implementation of Fly Postgres (repo here: https://github.com/fly-apps/postgres-flex, docs here: https://fly.io/docs/postgres/). For example, the data persistence and SSL "issues" documented above appear to be solved in that repo already. We may be able to use some of that work (updated with the orioledb images, etc), to get the core of this working faster. However, note that their setup is geared towards multiple machines always running in production, which is a different target than the serverless goal here so we'll have to be mindful about what we bring over (although things like multiple instances per region may be desirable even for serverless!).

kiwicopple commented 2 months ago

this is one of the more impressive Issues I've seen on GitHub @ErikPetersenDev - it's exactly the roadmap that I would create myself

You've nailed every step (some of which are solved by @pragmaticanon now it seems)

We can split these out into individual Issues if needed

It appears that the remaining issues are 2 -> 6: is that correct? if so, I can split it up (or you're welcome to split this up yourself). It will help with any detailed discussions

Also, feel free to reach out to me if you're interested in helping with these and will consider sponsorship (I'm on twitter or LinkedIn)

kiwicopple commented 2 months ago

Load data into new instance

I need to read up a bit more, but i think that this is solved natively by Fly.io now, using Tigris-backed volumes:

https://community.fly.io/t/tigris-backed-volumes/20792

I'll ask the tigris team for more details


Edit: Apparently this isn't how it works

It just means that you can use the same bucket for multiple volumes, but they all store data in different paths this tool is about extending a single volume, not sharing data Something like that would be https://www.tigrisdata.com/blog/fly-tigris-juicefs

ErikPetersenDev commented 1 month ago

I've opened a few new issues to break this one up:

ErikPetersenDev commented 1 month ago

Also, the note at the end of this issue about looking at postgres-flex for inspiration is now covered by: https://github.com/kiwicopple/serverless-postgres/issues/6