Create Easier Process for Stopping Repo Cloning and Cleaning Up Inappropriately Cloned Repos

anorrish commented 1 year ago

Feature request description

The current process for stopping repo cloning is intrusive to the cluster and non-intuitive to admins. It would be great to have the ability to 'restart' cloning directly from the Admin window.

Is your feature request related to a problem? If so, please describe.

If the initial repo configuration isn't correct, larger customers may need to wait a significant amount of time for a large number of inappropriate repos to be cloned into their environment. There isn't a clean way to stop the cloning process, update the config, remove inappropriately cloned repos, and restart the cloning. This can cause headaches and delays during customer deployment, particularly in situations with high repo volumes.

The current process for accomplishing this 'restart' is to update the repo config, manually restart the repo-updater pod, manually restart the gitserver pod, and manually purge the disk of accidentally cloned repos. Much of this feels quite clunky for a situation that customers seem to run into regularly. On top of that, there may not always be a straightforward way of adding the accidentally cloned repos to the 'exclude' list to remove them from the instance.

A simpler way to pause and/or restart the cloning process with updated config could provide customers with a safety net when configuring their repos and save time during deployment.

Here's a link to a customer example: https://sourcegraph.slack.com/archives/C03SDQ7BYRM/p1666799984776319

Describe alternatives you've considered.

The process listed above (updating the repo config, manually restarting the repo-updater pod, manually restarting the gitserver pod, and manually purging the disk of accidentally cloned repos) is the only existing process I know of.

/cc @jplahn @ryphil

mrnugget commented 1 year ago

Thoughts from discussion with @eseliger about this: maaaaaaaaaaaaaybe the best solution to this is to make cloning a dbworker?

sashaostrikov commented 1 year ago

Having a dbworker to clone repos will let us:

Cancel the ongoing repo clone
Have a history of repo clones
Repeat any clone of the repo based on the history above.

I agree that dbworker is the number 1 pick.
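
To make those three properties concrete, here is a minimal sketch of a job-record queue. It is an in-memory stand-in, not the actual dbworker API: `CloneQueue`, `CloneJob`, the state names, and the example repo names are all hypothetical, and a real version would keep the records in a database table.

```python
import enum
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


class State(enum.Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    CANCELED = "canceled"


@dataclass
class CloneJob:
    id: int
    repo: str
    state: State = State.QUEUED


class CloneQueue:
    """Hypothetical in-memory stand-in for a database-backed clone job table."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._jobs: Dict[int, CloneJob] = {}  # insertion order == enqueue order

    def enqueue(self, repo: str) -> CloneJob:
        job = CloneJob(id=next(self._ids), repo=repo)
        self._jobs[job.id] = job
        return job

    def dequeue(self) -> Optional[CloneJob]:
        # Hand out the oldest job that is still queued.
        for job in self._jobs.values():
            if job.state is State.QUEUED:
                job.state = State.PROCESSING
                return job
        return None

    def complete(self, job_id: int) -> None:
        job = self._jobs[job_id]
        if job.state is State.PROCESSING:
            job.state = State.COMPLETED

    def cancel(self, job_id: int) -> bool:
        # Only unfinished jobs can be canceled. A real worker would also have
        # to notice the state change and abort the running clone.
        job = self._jobs[job_id]
        if job.state in (State.QUEUED, State.PROCESSING):
            job.state = State.CANCELED
            return True
        return False

    def history(self, repo: str) -> List[CloneJob]:
        # Records are never deleted, so the table doubles as the clone history.
        return [job for job in self._jobs.values() if job.repo == repo]

    def retry(self, job_id: int) -> CloneJob:
        # Repeat an earlier clone by enqueueing a fresh record for the same repo.
        return self.enqueue(self._jobs[job_id].repo)
```

With this shape, an admin action like "stop cloning the wrong repos" becomes `cancel` on the matching records, and "restart cloning" becomes `retry` (or a fresh `enqueue`) after the config is fixed.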

mrnugget commented 1 year ago

Yeah, I think there's some nuance here though, because there are two big options, as I see it:

Replace the complete scheduler in repo-updater with a queue/dbworker. That's a ton of effort if we want to do it in one shot.
Start by "just" replacing the clone-request that's sent to gitserver with a dbworker. The scheduler would create a record, gitserver would dequeue it. That would make things cancelable. But I think we need a spike to see how much this changes/destroys. I can imagine, for example, that it'll be tricky to replicate the "sharding" here, because right now sharding is done on a per-request basis: repo-updater calculates the shard, sends the request to that shard. With a dbworker, when do we calculate the shard? When we enqueue? But then what if it's a long time in the queue and outdated when it's dequeued? Should we calculate when we dequeue and re-enqueue if it was dequeued by the wrong shard? Or both?
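
To illustrate the "or both" variant of the sharding question, here is a rough sketch: the shard is calculated at enqueue time, recalculated at dequeue time, and the record is put back on the queue if it belongs to a different shard. Everything here is an assumption for illustration: `shard_for` (hash of the repo name modulo the shard count), `ShardedCloneQueue`, and the `gitserver-N` names are hypothetical and not how repo-updater or gitserver actually address shards.

```python
import hashlib
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


def shard_for(repo: str, shards: List[str]) -> str:
    """Hypothetical shard function: stable hash of the repo name modulo shard count."""
    digest = hashlib.sha256(repo.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


@dataclass
class CloneRecord:
    repo: str
    shard: str  # calculated at enqueue time; may be stale by the time of dequeue


class ShardedCloneQueue:
    def __init__(self, shards: List[str]) -> None:
        self.shards = list(shards)
        self._queue: Deque[CloneRecord] = deque()

    def enqueue(self, repo: str) -> CloneRecord:
        record = CloneRecord(repo, shard_for(repo, self.shards))
        self._queue.append(record)
        return record

    def dequeue(self, worker_shard: str) -> Optional[CloneRecord]:
        # Inspect each pending record at most once per call.
        for _ in range(len(self._queue)):
            record = self._queue.popleft()
            # Recalculate against the current shard list, since the value
            # stored at enqueue time may be outdated.
            record.shard = shard_for(record.repo, self.shards)
            if record.shard == worker_shard:
                return record
            # Dequeued by the wrong shard: re-enqueue for its real owner.
            self._queue.append(record)
        return None
```

The sketch also shows the cost of this option: every shard has to look at (and requeue) records it does not own, which is one of the things a spike would need to measure.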
