sourcegraph / sourcegraph-public-snapshot


Automation Backend: Tech Debt, Developer UX and Ideas for Architecture & Design #6572

Closed mrnugget closed 4 years ago

mrnugget commented 4 years ago

This is a braindump of all the things we ran into while working on #6085. The items on this list range from "nice to have" to "we should do it" to "we need to do this, sooner rather than later".

a8n.Store

Allow fetching all rows with Limit: -1 instead of Limit: $number-we-think-should-be-high-enough

Right now we sometimes fetch all rows of a given table by specifying something like Limit: 10000 in the hopes of never having to fetch more than that. See the search results here: https://k8s.sgdev.org/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+Limit%5C:%5Cs%2B%5Cd%7B3%2C%7D%2C+file:enterprise/internal/a8n/&patternType=regexp

What we already do for ChangesetEvents is allow specifying Limit: -1:

https://github.com/sourcegraph/sourcegraph/blob/8248533108ae3f47bd5fcb451b3912a4f0152e0a/enterprise/internal/a8n/store.go#L609-L618
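A minimal sketch of what this could look like for the other list methods. ListOpts, defaultLimit and limitClause are hypothetical names used for illustration, not the actual a8n.Store code, and the default of 50 is an assumption:

```go
package main

import "fmt"

// ListOpts is a hypothetical options struct, not the actual a8n.Store type.
type ListOpts struct {
	Limit int
}

// defaultLimit is an assumed fallback used when no limit is specified.
const defaultLimit = 50

// limitClause returns the SQL LIMIT clause for the given options.
// A negative Limit means "fetch all rows" and yields no LIMIT clause,
// a zero Limit falls back to defaultLimit.
func limitClause(opts ListOpts) string {
	switch {
	case opts.Limit < 0:
		return ""
	case opts.Limit == 0:
		return fmt.Sprintf("LIMIT %d", defaultLimit)
	default:
		return fmt.Sprintf("LIMIT %d", opts.Limit)
	}
}

func main() {
	for _, l := range []int{-1, 0, 100} {
		fmt.Printf("Limit %d -> %q\n", l, limitClause(ListOpts{Limit: l}))
	}
}
```

With something like this, callers that currently pass Limit: 10000 could pass Limit: -1 and stop guessing an upper bound.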

Architecture and Design of a8n code

Use a persistent queue

Extracted into https://github.com/sourcegraph/sourcegraph/issues/6723

Execute ChangesetJobs in parallel

Extracted into https://github.com/sourcegraph/sourcegraph/issues/6722

Error responses in gitserver when applying diff fails

Extracted into separate issue: https://github.com/sourcegraph/sourcegraph/issues/6717

Changesets are never cleaned up

Right now a user can create multiple changesets with createChangesets and never attach them to a campaign, and they'll persist.

The same happens when a user deletes a campaign: the changesets will stay around and will be synced.

We can probably find some heuristic for when it's safe to clean up a changeset, e.g. when a changeset is older than 5 hours and hasn't been attached to a campaign, we delete it.

Enterprise and OSS

Executing CampaignType.searchQuery

Right now we use our own wrapper around searchResolver called graphqlbackend.RepoSearch. Would it be better to use zoekt.Searcher.List in the a8n.Runner instead of graphqlbackend.RepoSearch? See the code here. @keegancsmith suggested this in https://github.com/sourcegraph/sourcegraph/pull/6309#issuecomment-548691096 (see also https://github.com/sourcegraph/sourcegraph/issues/6627 for more context on this).

We probably want to go through the user-facing search code path for matchTemplate (in the case of a comby campaign) and use structural search there (once it's ready). But do we also need to do the same for searchScope?

Naming of and in repos package

GraphQL layer and a8n

Inconsistencies in type definitions

Developer UX when dealing with external services, repos and talking to code hosts

Fix multi-file diffs without extended header in go-diff

go-diff has a bug where it doesn't correctly parse multi-file diffs that have no extended headers between the individual file diffs.

See this piece of code

tsenart commented 4 years ago

UpdateCampaignJob needs nulltimeColumn for StartedAt and FinishedAt

Does this not break things as it currently is?

mrnugget commented 4 years ago

I just checked. That line is kinda invalid and I'll remove it from the ticket. We already do the correct thing when creating the CampaignJob:

https://github.com/sourcegraph/sourcegraph/blob/7e090aba03dbef54857d017a4c468e6ae840dd21/enterprise/pkg/a8n/store.go#L1541-L1543

And that works as intended: FinishedAt and StartedAt are set to NULL when the CampaignJob is first created. That allows us to query progress while they run.

We don't strictly need this in UpdateCampaignJob.

Edit: Updated this comment to distinguish between CreateCampaignJob and UpdateCampaignJob

mrnugget commented 4 years ago

Closing this, because the majority of the things in this ticket are now either fixed or outdated.