superfly / litefs

FUSE-based file system for replicating SQLite databases across a cluster of machines
Apache License 2.0

Cannot find .primary file #139

Closed. kentcdodds closed this issue 1 year ago

kentcdodds commented 1 year ago

Here's my litefs config: https://github.com/kentcdodds/kentcdodds.com/blob/dev/other/litefs.yml

So based on what I've read and the example, I would expect to find the .primary file at /litefs/data/.primary. I only have two regions and I've SSH-ed into both of them and cannot find that file. I also checked at /data/.primary and it wasn't there either.

What have I got wrong?

benbjohnson commented 1 year ago

@kentcdodds From the issue you reported in #138, it looks like the node is in a retry loop. The .primary file only exists while the replica is connected to the primary, so if the node keeps disconnecting and retrying, the file won't be available.

Under normal operation this shouldn't happen. I'm on the fence about whether the node should continue showing the last connected primary even if it isn't actively connected.

At the very least, I need to update the docs to clarify this.

benbjohnson commented 1 year ago

I added documentation for the special files (e.g. .primary and -pos files): https://fly.io/docs/litefs/files/

kentcdodds commented 1 year ago

Great! Thanks!

Question: could you add a section at the end of the primary file docs that explains what to do when your app can't find a .primary file? What should my application code do in that case? Fail to start up to prevent the deploy from happening?

kentcdodds commented 1 year ago

Also, for the -pos files, does my application code need to do anything with that or is that just implementation information that you're documenting there?

benbjohnson commented 1 year ago

> Question: could you add a section at the end of the primary file docs that explains what to do when your app can't find a .primary file? What should my application code do in that case? Fail to start up to prevent the deploy from happening?

Yes, I'll add documentation for that. Good idea. The tl;dr is that your app should forward write requests to the host specified in the .primary file if it exists, and it should attempt to write locally if the file doesn't exist. If the local database is on the current primary, the write will succeed. If the node is a replica that just can't find the primary, the write will fail with a SQLITE_READONLY error.

There is an open issue where the SQLITE_READONLY error is only returned when using the rollback journal. I'm still trying to figure out how to get LiteFS to return that specific error in WAL mode.
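
As a rough illustration of the flow described above, here is a minimal Python sketch. It is not taken from the LiteFS docs: the function names (`read_primary`, `is_readonly_error`, `write_or_forward`), the `mount_dir` and `db_name` parameters, and the returned tuples are all hypothetical, and stripping whitespace from the .primary contents is only a precaution. How the request actually gets forwarded to the primary is left to the application.

```python
import sqlite3
from pathlib import Path
from typing import Optional, Tuple

SQLITE_READONLY = 8  # primary result code defined by SQLite


def read_primary(mount_dir: str) -> Optional[str]:
    """Return the host named in .primary, or None if the file is absent."""
    try:
        # Assumption: the file holds just the hostname, possibly with a newline.
        return (Path(mount_dir) / ".primary").read_text().strip() or None
    except FileNotFoundError:
        return None


def is_readonly_error(exc: sqlite3.Error) -> bool:
    """Best-effort check for SQLITE_READONLY across Python versions."""
    code = getattr(exc, "sqlite_errorcode", None)  # attribute exists on 3.11+
    if code is not None:
        return (code & 0xFF) == SQLITE_READONLY  # mask off extended code bits
    return "readonly database" in str(exc)  # fallback for older Pythons


def write_or_forward(mount_dir: str, db_name: str, sql: str,
                     params: tuple = ()) -> Tuple[str, Optional[str]]:
    """Decide where a write goes; returns a (status, primary_host) pair."""
    primary = read_primary(mount_dir)
    if primary is not None:
        # A .primary file exists, so this node is a connected replica.
        return ("forward", primary)

    # No .primary file: attempt the write locally.
    conn = sqlite3.connect(str(Path(mount_dir) / db_name))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return ("written", None)
    except sqlite3.OperationalError as exc:
        if is_readonly_error(exc):
            # Replica that could not find the primary.
            return ("readonly", None)
        raise
    finally:
        conn.close()
```

A caller would forward the request on `"forward"`, carry on after `"written"`, and surface an error (for example an HTTP 503) on `"readonly"`. Given the WAL caveat above, the `"readonly"` branch may not trigger reliably in WAL mode.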

> Also, for the -pos files, does my application code need to do anything with that or is that just implementation information that you're documenting there?

The TXID in the position file is useful if you write to the primary and then need to wait for a replica to catch up before a client can read from it. For example, if your primary's TXID is 100 after a write then you can have a client on a replica poll the position file to wait until it reaches (or surpasses) a TXID of 100.
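
The polling approach could look something like the sketch below. It assumes the position file holds a hex-encoded TXID followed by a slash and a checksum (for example `0000000000000064/abcdef0123456789`); check the files documentation linked earlier for the exact format. The names `read_txid` and `wait_for_txid`, the timeout, and the polling interval are all hypothetical.

```python
import time
from pathlib import Path


def read_txid(pos_path: str) -> int:
    """Parse the TXID from a position file.

    Assumed format: "<hex TXID>/<hex checksum>".
    """
    text = Path(pos_path).read_text().strip()
    return int(text.split("/", 1)[0], 16)


def wait_for_txid(pos_path: str, target: int,
                  timeout: float = 5.0, interval: float = 0.01) -> bool:
    """Poll until the replica reaches or surpasses `target`.

    Returns True once caught up, or False if `timeout` seconds elapse first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_txid(pos_path) >= target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In this sketch the primary would hand its post-write TXID back to the client (in a cookie or response header, say), and the replica would call `wait_for_txid` with that value before serving the read.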

benbjohnson commented 1 year ago

I updated the docs for the pos & primary files with this pull request. Should be live on docs any minute. https://github.com/superfly/docs/pull/411

kentcdodds commented 1 year ago

Great! And what should we do if we encounter a SQLITE_READONLY error? I don't expect the app can do much, but if that happens it's really unexpected, and I don't know how I would troubleshoot the issue.