Closed · kentcdodds closed this issue 1 year ago
@kentcdodds From the issue you have in #138 it looks like the node is in a retry loop. The `.primary` file only exists while the replica is connected to the primary, so if it's disconnecting and retrying then it won't be available.
Under normal operation this shouldn't happen. I'm on the fence about whether the node should continue showing the last connected primary even if it isn't actively connected.
At the very least, I need to update the docs to clarify this.
I added documentation for the special files (e.g. the `.primary` and `-pos` files): https://fly.io/docs/litefs/files/
Great! Thanks!
Question: could you add a section at the end of the primary file docs that explains what to do when your app can't find a `.primary` file? What should my application code do in that case? Fail to start up to prevent the deploy from happening?
Also, for the `-pos` files, does my application code need to do anything with that, or is that just implementation information that you're documenting there?
> Question: could you add a section at the end of the primary file docs that explains what to do when your app can't find a `.primary` file? What should my application code do in that case? Fail to start up to prevent the deploy from happening?
Yes, I'll add documentation for that. Good idea. The tl;dr is that your app should forward to the host specified in the `.primary` file if it exists, and it should attempt to write locally if it doesn't exist. If the local database is on the current primary then the write will succeed. If it just can't find the primary then your app will return a `SQLITE_READONLY` error.
There is an open issue where the `SQLITE_READONLY` error only works with the rollback journal. I'm still trying to figure out how to get it to return that specific error in WAL mode.
> Also, for the `-pos` files, does my application code need to do anything with that, or is that just implementation information that you're documenting there?
The TXID in the position file is useful if you write to the primary and then need to wait for a replica to catch up before a client can read from it. For example, if your primary's TXID is `100` after a write, you can have a client on a replica poll the position file and wait until it reaches (or surpasses) a TXID of `100`.
I updated the docs for the pos & primary files with this pull request; it should be live on the docs site any minute: https://github.com/superfly/docs/pull/411
Great! And what should people do if they encounter a `SQLITE_READONLY` error? I don't expect the app can do much, but if that happens it's really unexpected and I don't know what I would do to troubleshoot the issue.
Here's my litefs config: https://github.com/kentcdodds/kentcdodds.com/blob/dev/other/litefs.yml
So based on what I've read and the example, I would expect to find the `.primary` file at `/litefs/data/.primary`. I only have two regions, and I've SSH-ed into both of them and cannot find that file. I also checked at `/data/.primary` and it wasn't there either. What have I got wrong?