sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com
Other
10.1k stars 1.28k forks source link

Updating a Campaign with same diff/base-branch for already-merged changeset fails #8846

Closed mrnugget closed 4 years ago

mrnugget commented 4 years ago

See this comment: https://github.com/sourcegraph/customer/issues/13#issuecomment-595331410

Creating commit from patch for repo "github.com/sourcegraph/go-diff": "gitserver: applying patch: exit status 1" (command: "git apply --cached -p0 --unidiff-zero", output: "error: patch failed: go.mod:1\nerror: go.mod: patch does not apply\n")

This is because we try to create a commit, push it and open a PR again, even though the changeset for the repository has already been merged.

What we need to do: when updating a campaign, we do nothing for a repository if an existing changeset with the same diff has already been closed or merged.


Tasks

See this comment below for the reasoning behind these tasks.


The question is: what if the changeset has already been closed/merged but the diff is different?

See this comment below for an updated answer to this question. ~For now, I think we should simply ignore that. Why? Because in order to make this work properly, we'd need to check whether the diff or the base branch are different (the same diff could maybe be still applied if the base branch changed) and in that case, we'd need to open another pull request and that would kinda conflict with our underlying assumption that we only have one changeset per repository per campaign. If we do think that many changesets per repository per campaign is valuable, we need to tackle it separately. (cc @christinaforney @sqs)~

mrnugget commented 4 years ago

In the Campaigns synced yesterday we (@eseliger @christinaforney and me) decided on the following:

  1. If the base-branch & diff of the new patch are the same as the base-branch & diff of a closed or merged changeset for the given repository: ignore.
  2. If the base-branch & diff of the new patch are NOT the same as the base-branch & diff of a closed or merged changeset for the given repository: we create a new changeset.

In this ticket and this iteration I only want to tackle (1).

But (2), which breaks our 1-repo-1-campaign-job-1-changeset-job-1-external-changeset invariant (as mentioned above), is highly important for our future usecase of maintaining invariants.

Example: we create a campaign that finds and removes bad code from repositories A, B, C.

  1. The changeset for repository B gets merged into master.
  2. On master someone reintroduces the bad code
  3. When we re-run (probably automatically in the future) the src actions exec command we want to produce another patch (which might be exactly the same on a character-by-character basis) for repository B that removes the bad code again
  4. When we then update the campaign, we want to create another pull request for the same repository in the same campaign.

There are some tricky things involved in this (the old (merged) changeset is associated through changset-job->campaign-job with the old campaign plan, if we update the campaign to use a new campaign plan, how can we keep the old changeset in the campaign?), but I don't think they're insurmountable. We probably have to loosen our associations a bit and introduce a has-many association where we now have a has-one.

mrnugget commented 4 years ago

Also note (cc @eseliger) that there's a frontend component involved, since currently the "Preview of changes" says the changeset will be updated and is a draft:

Screen Shot 2020-03-10 at 15 42 26
mrnugget commented 4 years ago

Okay, here's what happens.

Before the update, the initial creation of the campaign:

  1. Campaign plan is created with src. Includes patch for repo A, with "baseRevision": "refs/heads/master".
  2. When creating campaign plan, Sourcegraph resolves "refs/heads/master" to revision F00B4R.
  3. Campaign is created.
  4. Changeset is created based on repo A at revision F00B4R.
  5. Changeset is merged. refs/heads/master now points to F00B4R+1.

Here's where the update goes wrong:

  1. Another campaign plan is created with src. But src doesn't bust its cache: because it uses "baseRevision": "refs/heads/master" as cache key.
  2. When creating campaign plan, Sourcegraph resolves "refs/heads/master" to revision F00B4R+1. This is where things go wrong: the newly created CampaignJob has a different Rev, even though the new patch is based on the same rev as the old one.
  3. Campaign is updated
  4. We go through every changeset and determine whether it needs updating. Since (1) the rev is different and (2) we don't skip already-merged/already-closed changesets we decide to update the changeset.
  5. Gitserver tries to make a commit out of the patch, at revision F00B4R+1! That fails with:
creating commit from patch for repo "github.com/sourcegraph/automation-testing": "gitserver: applying patch: exit status 1" (command: "git apply --cached -p0 --unidiff-zero", output: "error: patch failed: file1.txt:1\nerror: file1.txt: patch does not apply\nerror: patch failed: file2.txt:1\nerror: file2.txt: patch does not apply\nerror: patch failed: file3.txt:1\nerror: file3.txt: patch does not apply\n")

There are two problems:

  1. We already knew of this one: we don't ignore already-merged/already-closed changesets when deciding whether to update them or not.
  2. But this is new: we don't use the actual base revision (with a v! e.g.: adsfasd1341r5) when checking the src CLI cache and when creating campaign plans. We use base revspec (e.g. refs/heads/master) which is a ref (note the f!) as baseRevision when creating a campaign plan from patches.

Problem 1 is relatively easy to fix.

Problem 2 I'm not yet sure about, but I think we should extend this part in src CLI to include both: Rev (F00B4R) and BaseRef (refs/heads/master). That will also fix the missing cache bust when running the same action on the updated repo. But that also means we need to extend our GraphQL API and add a required baseRef field to CampaignPlanPatch here.

It is a breaking change in the API, but it shouldn't be too bad:

  1. Older src versions will not send in baseRef to Sourcegraph <3.14, but Sourcegraph <3.14 doesn't care
  2. Sourcegraph >3.14 will require a newer src version that sends in baseRef
  3. Sourcegraph >3.14 will rqeuire the baseRef field on CampaignPlanPlatch
  4. Sourcegraph >3.14 will persist the baseRev and baseRef correctly: their state at the time of patch creation and not their state at time of campaign plan creation
mrnugget commented 4 years ago

Regarding the frontend changes: merged/closed changesets that will be skipped are shown under "To be updated"

Screen Shot 2020-03-11 at 13 34 22

I assume the "Draft" badge is due to #8938 and technical they are "updated" (because they're not deleted or created), but I think a fourth category "Unmodified" makes more sense. cc @eseliger @christinaforney

eseliger commented 4 years ago

Yep that sounds good, I don't think we have a clear indicator for that in the API yet though, do we? 🤔

mrnugget commented 4 years ago

I extracted the UI work into this ticket: https://github.com/sourcegraph/sourcegraph/issues/8941 and added the ticket to this ticket's task list.

mrnugget commented 4 years ago

Closing this because the backend portion is done and UI work has been extracted into #8941 for which there is already https://github.com/sourcegraph/sourcegraph/pull/8952 open and approved.