Closed ayuhito closed 1 year ago
I reset everything by running `fly scale count 0` and then tried spinning up two new machines.
This time we successfully launched one machine. However, while launching the second machine, things started breaking:
For reference, when you see a log line such as `Fetched metadata for inter`, assume the app is attempting to write to the DB. Another thing to note: we're spinning this up without using volumes, so these databases are initialised empty.
I also attempted redeploying with `fly scale count 0` and starting again. This time I made sure no writes occurred until both VMs spun up, and I'm still seeing the same `fuse` errors. Scaling down to 1 machine seems to have no issues.
Edit: Eventually I scaled up again and now the build is stable. But there are definitely some rough edges to consider.
> Thank you for the awesome new release! I moved over to the new version and really wanted to try the new exec feature for database migrations. Unfortunately, I seem to have run on an infinite loop of the following...
>
> `cannot find primary & ineligible to become primary, retrying: no primary`
Thanks for the feedback! Since your `lease.candidate` is set to `${FLY_REGION == PRIMARY_REGION}`, this will occur if one of your nodes is not in the `PRIMARY_REGION`. It looks like the first set of logs were running in `sjc` but the second set was able to get a primary in `lhr`.
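To make the candidate rule concrete, here is a minimal sketch of the comparison that expression describes. LiteFS evaluates the expression itself from its config; the function name `isLeaseCandidate` is hypothetical, and treating `lhr` as the primary region in the usage below is an assumption based on the logs mentioned above.

```typescript
// Illustrative model only: LiteFS evaluates `${FLY_REGION == PRIMARY_REGION}`
// internally. `isLeaseCandidate` is a hypothetical name, not a LiteFS API.
function isLeaseCandidate(env: Record<string, string | undefined>): boolean {
  const region = env["FLY_REGION"];
  const primary = env["PRIMARY_REGION"];
  // A node may only try to become primary when it runs in the primary region.
  return region !== undefined && region === primary;
}

// Assumed values: a node in sjc while the primary region is lhr.
const sjcNode = isLeaseCandidate({ FLY_REGION: "sjc", PRIMARY_REGION: "lhr" });
const lhrNode = isLeaseCandidate({ FLY_REGION: "lhr", PRIMARY_REGION: "lhr" });
console.log(sjcNode, lhrNode);
```

Under these assumptions, a lone `sjc` node is never a candidate, so it keeps retrying with "no primary" until a node in the primary region comes up.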
> Another thing to note, we're spinning this up without using volumes, so these databases are initialised empty.
I definitely don't recommend this approach. If all your nodes go down at the same time then you'll lose your data since it's all on ephemeral disks. We have plans for making ephemeral disks work in the future but it's not advisable right now.
> `fuse: create(): cannot create journal: read only replica`
If you're seeing these errors and you're using the LiteFS built-in proxy then you're probably issuing write transactions on `GET` requests. The proxy expects `GET` requests to be read-only and it'll proxy non-`GET` requests to the current primary.
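The routing rule described here can be sketched as a small decision function. This is a simplified model of the behaviour described in this thread, not the proxy's actual implementation; `routeRequest` and the `Target` type are hypothetical names.

```typescript
// Simplified model of the proxy rule described above. Hypothetical names;
// the real LiteFS proxy does more than this.
type Target = "local" | "primary";

function routeRequest(method: string, nodeIsPrimary: boolean): Target {
  // On the primary everything can be served locally, including writes.
  if (nodeIsPrimary) return "local";
  // On a replica, GET is assumed read-only and served from the local copy;
  // any other method is forwarded to the current primary.
  return method.toUpperCase() === "GET" ? "local" : "primary";
}

console.log(routeRequest("GET", false));  // handled on the replica
console.log(routeRequest("POST", false)); // forwarded to the primary
```

A `GET` handler that writes therefore runs against the read-only replica copy whenever the request lands on a replica, which is when the `read only replica` error appears. The same handler succeeds when the request happens to land on the primary.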
> Since your `lease.candidate` is set to `${FLY_REGION == PRIMARY_REGION}`, this will occur if one of your nodes is not in the `PRIMARY_REGION`. It looks like the first set of logs were running in `sjc` but the second set was able to get a primary in `lhr`.
Oh, this seems to be new for Apps V2! This makes sense to me now, I was wondering where the primary region env variable was coming from.
> I definitely don't recommend this approach. If all your nodes go down at the same time then you'll lose your data since it's all on ephemeral disks. We have plans for making ephemeral disks work in the future but it's not advisable right now.
I'm using the DB as a cache so persistence isn't something I needed (especially for development). Will use volumes once I deploy for production!
> If you're seeing these errors and you're using the LiteFS built-in proxy then you're probably issuing write transactions on `GET` requests. The proxy expects `GET` requests to be read-only and it'll proxy non-`GET` requests to the current primary.
I see! That makes a lot of sense. I'll have to rearchitect the code to handle that. Prior to upgrading to 0.4 and Fly Apps V2 I was only using one machine so I suppose that's how I missed this until now. Thanks a lot for your time!
> If you're seeing these errors and you're using the LiteFS built-in proxy then you're probably issuing write transactions on `GET` requests. The proxy expects `GET` requests to be read-only and it'll proxy non-`GET` requests to the current primary.
@benbjohnson I have a remix app that seems to be experiencing intermittent failures ("fuse: create(): cannot create journal: read only replica") and I'm thinking it might be due to this. I'm assuming it works sometimes because I'm already on the primary.
In the particular case I'm dealing with, a user has clicked on an email link inviting them to use our app. I normally would avoid doing modifications in a GET request, but in this case it seemed like the most obvious approach. Any suggestions on how to refactor this? I was thinking of maybe initiating a fetch post inside the page's loader.
UPDATE: In case it helps anyone else, I have succeeded in getting this working by doing a fetch with `method: 'POST'` inside the loader to the same URL, passing on the same cookies from my loader's request. I then implemented an action inside that route that does the modifying calls to litefs.
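A rough sketch of that workaround, for anyone wanting a starting point. It is not the poster's actual code: `buildForwardedPost` and `PostInit` are hypothetical names, and the loader/action outline in the comments assumes a Remix-style route and a global `fetch`.

```typescript
// Hypothetical helper: names are illustrative, not part of Remix or LiteFS.
interface PostInit {
  method: string;
  headers: Record<string, string>;
}

// Build the arguments for a POST back to the same URL, carrying over the
// caller's cookies so the action sees the same session.
function buildForwardedPost(url: string, cookie: string | null): [string, PostInit] {
  const headers: Record<string, string> = {};
  if (cookie !== null) {
    headers["Cookie"] = cookie;
  }
  return [url, { method: "POST", headers }];
}

// Outline of how a Remix-style route might use it (assumed shape):
//
//   export async function loader({ request }) {
//     const [url, init] = buildForwardedPost(
//       request.url,
//       request.headers.get("Cookie"),
//     );
//     await fetch(url, init); // non-GET, so the proxy can send it to the primary
//     // ...return the page data as usual
//   }
//
//   export async function action({ request }) {
//     // perform the database writes here
//   }
```

The point of the split is that the write happens inside a non-`GET` request, which the proxy forwards to the primary, while the loader itself stays read-only.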
Thank you for the awesome new release! I moved over to the new version and really wanted to try the new `exec` feature for database migrations. Unfortunately, I seem to have run into an infinite loop of the following:

Logs
```
2023-04-26T09:20:11.758 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:12.912 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:14.066 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:15.220 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:16.425 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:17.579 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:18.761 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:19.922 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:21.071 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:22.315 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:23.465 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:24.775 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
2023-04-26T09:20:26.048 app[a9311843] sjc [info] BF2ABFEB3AB37220: cannot find primary & ineligible to become primary, retrying: no primary
```

LiteFS configuration for reference in case I missed something there.
This initially occurred with `fly scale count 1`. I tried `fly scale count 2` too, but since no primary seems to be detected, both fail. Maybe something in the handover failed there?