wayfair-incubator / telefonistka

Safe and Controlled GitOps Promotion Across Environments/Failure-Domains
https://hub.docker.com/r/odedbenozer/telefonistka
MIT License
63 stars 7 forks source link

Race condition when adding a new component and deleting a stage #105

Closed Ben10k closed 6 months ago

Ben10k commented 1 year ago

Description

We had 2 PRs open at the same time:

It happened to be that PR2 was merged before the PR1.

After the merge of PR2, telefonistka kicked in and created a new promotion PR (PR3) of the new component to all stages, including the the one that is being removed in PR2. This is expected an correct behaviour, as at that point in time, the main branch still had the old stage present.

Then, PR2 was merged, which removed the stage from the main branch.

But the promotional PR3 still contained promotion to the stage that no longer exists on the main branch.

Expected Behaviour

After PR1 was merged, I expected that telefonistka would either:

Actual Behaviour

Nothing happened after PR1 was merged.

Workaround

While debugging and trying to figure out what was actually happening, we have tried redelivering the webhooks from Github to telefonistka.

  1. We tried to redeliver webhook for PR1 pull_request_closed. Nothing happened
  2. We tried to redeliver webhook for PR2 pull_request_closed. Then telefonistka raised a new PR4 to deploy the new component from PR2 to the current existing stages in telefonistka.yaml on main branch.

Then we were able to close the PR3 and merge the PR4

Affected Version

v0.1.3

Steps to Reproduce

  1. Set up a repository which has telefonistka enabled.
  2. Have empty files:
    • _WORKSPACE/component1.yaml
    • region1/component1.yaml
    • region2/component1.yaml
  3. Have telefonistka.yaml with this content:
    configuration
    promotionPaths:
    - sourcePath: "_WORKSPACE/"
    promotionPrs:
    - targetPaths:
    - "region1/"
    - "region2/"
  4. Create a PR (PR1) and do NOT merge it
    • delete file region2/component1.yaml
    • remove line - "region2/" from telefonistka.yaml
  5. Create a PR (PR2) and do NOT merge it
    • add a new empty file _WORKSPACE/component2.yaml
  6. Merge the PR2
  7. Verify that telefonistka created a PR3 to promote _WORKSPACE/component2.yaml to both region1/component2.yaml and region2/component2.yaml
  8. Merge the PR1
  9. Verify that the PR3 is still promoting _WORKSPACE/component2.yaml to the region2/component2.yaml while region2 is no longer present on the main branch.

Checklist

Oded-B commented 1 year ago

Hi @Ben10k, I've somehow missed this issue. I'll take a look in the following few days.

Oded-B commented 1 year ago

Because Telefonistka is stateless I would have to compare the current config with a pervious version to find the removed path. Then go over all currently open PRs and search for files that match the remove path, seems possible.

This is a big change (relative to size of the project) so the plan is to start with just commenting warnings(no deletion/mutation). I'll Warn in the triggering PR (PR1) when its opened/synced and the affected PRs(PR3) when PR1 is merged.

Ben10k commented 1 year ago

Thank you! Having a warning pop up as a comment on PR3 once PR1 is merged would be a nice addition. Then we would know that something is not right and then we can change the promotional PR before approving/merging it.

github-actions[bot] commented 11 months ago

Automatically marking issue as stale due to lack of activity

github-actions[bot] commented 11 months ago

Automatically closing this issue as stale

github-actions[bot] commented 9 months ago

Automatically marking issue as stale due to lack of activity

github-actions[bot] commented 7 months ago

Automatically marking issue as stale due to lack of activity

github-actions[bot] commented 6 months ago

Automatically closing this issue as stale