Closed harishv7 closed 5 months ago
Hmm, what are your thoughts on just always force pushing since EFS is our source of truth? I think we just need to be very deliberate about ensuring that sites are repaired when users are migrated over to email login (which is already covered in our runbook!) - that way we won't run the risk of accidentally losing user commits which were previously made directly to GitHub.
@alexanderleegs what does "ensuring that sites are repaired when users are migrated over to email login" mean?
@harishv7 I think the form solution should go out with this though; making the lock manual and deviating from the original flow feels incident-prone. Could you add a note in the deployment section that this ticket should be completed together with this PR?
Currently, cloning the site is done automatically when agencies are migrated from Netlify -> Amplify, but we don't always do the GitHub login -> email login step at the same time! This means user content can be lost in the following sequence: we migrate to Amplify > user edits the site further > we migrate the email login but forget to run the site repair form > user makes an edit on the new login, which causes a force push. Just something to be aware of; it's already in our runbook to run the site repair form after doing an email login migration.
With https://github.com/isomerpages/isomercms-backend/pull/1327, we can now merge this PR in as discussed during incident meetings. Going forward, any divergence between EFS and GitHub will automatically be resolved by taking EFS as the source of truth and performing a force push if the normal git push (without force) fails.
Problem
Several incidents of divergence between EFS and GitHub occurred recently. In practice, we trust EFS as our source of truth. Hence, it makes sense for us to automatically recover from divergences by doing a force push to GitHub.
There is still a case where Ops might edit on GitHub while site users are editing, in which case the users' pushes can override Ops' changes. A solution for this is to enable Ops to lock the repo -> perform the edit -> perform GGS repair using the form + unlock the affected repo automatically. For now, engineers can manually add/remove the lock on DDB.
Closes ISOM-947
Solution
On failure to git push, we retry twice. On the second retry, we use the git push --force option to forcefully push EFS commits to GitHub.
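The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `PushFn`, `MAX_RETRIES` and `pushWithForceFallback` are hypothetical names, and the actual push is passed in by the caller so the sketch does not assume any particular git library.

```typescript
// Hypothetical callback: performs one push attempt and rejects on failure.
// `force` indicates whether the attempt should use `git push --force`.
type PushFn = (force: boolean) => Promise<void>;

// Assumption: "retry twice" means two retries after the initial attempt.
const MAX_RETRIES = 2;

async function pushWithForceFallback(
  push: PushFn
): Promise<{ attempts: number; forced: boolean }> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    // Only the final retry forces, so EFS overwrites GitHub as a last resort.
    const force = attempt === MAX_RETRIES;
    try {
      await push(force);
      return { attempts: attempt + 1, forced: force };
    } catch (err) {
      lastError = err; // remember the failure and move on to the next attempt
    }
  }
  // Even the forced push failed; surface the last error to the caller.
  throw lastError;
}
```

Keeping the force flag for the last attempt only means a transient failure (for example a network error) can still succeed on a normal retry before anything on GitHub is overwritten.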
Breaking Changes
Screenshots of before and after
To simulate this on staging I created a divergence by editing the repo on GitHub.
Further editing on CMS caused no errors and the divergence auto-recovered by taking EFS's state as the source of truth.
The commit on GH was overridden by the CMS changes.
Tests
Check git status and the logs to see whether the first 2 attempts fail due to the divergence.