sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Sourcegraph.com does not clean up deleted/renamed repositories #7843

Closed sqs closed 3 years ago

sqs commented 4 years ago

If a repository that ever existed on Sourcegraph.com is deleted on GitHub.com, it does not appear to be cleaned up on Sourcegraph.com. This causes searches that match it to show errors in both the UI (see screenshot) and the API:

[screenshot]

https://sourcegraph.com/search?q=repo:sourcegraph/sourcegraph+f:package&patternType=regexp

In this case, the not-found repositories include https://github.com/sourcegraph/browser-extensions, https://github.com/sourcegraph/sourcegraph-php, etc.

This is confusing and makes our app look broken to users trying it out.

tsenart commented 4 years ago

Thanks for filing. This is a known issue, although I wasn't familiar with the specific symptom you bring up, and I agree that it's confusing users.

@keegancsmith: What do you think about this?

keegancsmith commented 4 years ago

> This is confusing and makes our app look broken to users trying it out.

Agreed. I'd like to solve it, but nothing comes to mind that is simple and scales. I'd like to avoid overengineering. Any ideas?

FYI, the duplicate issue is https://github.com/sourcegraph/sourcegraph/issues/5890, but this one is more generic, so I'd be tempted to close the other one as a duplicate of this one.

tsenart commented 4 years ago

> Agreed. I'd like to solve it, but nothing comes to mind that is simple and scales. I'd like to avoid overengineering. Any ideas?

Just thinking out loud here. What if we actively deleted all repos that haven't been visited in the last 30 days?
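To make the 30-day idea concrete, here is a minimal sketch. It assumes each repo record carries a hypothetical `last_visited` timestamp (the thread doesn't say how, or whether, Sourcegraph.com tracks visits), and it only selects candidates for deletion rather than touching any real database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record: the real repos table schema is not described in this thread.
@dataclass
class Repo:
    name: str
    last_visited: datetime


def repos_to_prune(repos, now, max_age=timedelta(days=30)):
    """Return names of repos whose last visit is strictly older than now - max_age."""
    cutoff = now - max_age
    return [r.name for r in repos if r.last_visited < cutoff]


repos = [
    Repo("github.com/sourcegraph/sourcegraph", datetime(2020, 2, 28)),
    Repo("github.com/sourcegraph/browser-extensions", datetime(2019, 12, 1)),
]
print(repos_to_prune(repos, now=datetime(2020, 3, 1)))
# ['github.com/sourcegraph/browser-extensions']
```

Note that a purely time-based policy like this also removes repos that still exist on GitHub but simply weren't visited recently.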

sqs commented 4 years ago

What if the repos were deleted not automatically in the background, but rather when Sourcegraph noticed they had been deleted from GitHub? The fact that they are shown in the UI as deleted clearly means Sourcegraph knows they were deleted from GitHub.

keegancsmith commented 4 years ago

@sqs I like your suggestion. If I'm not mistaken, when we try to search them, a remote git operation fails, which tells us they are gone.

So we need a clean way to get this information into repo-updater so it can update the repos table.

Proposal: each gitserver maintains a circular buffer of repos whose remote operations failed, along with when each failure happened. repo-updater polls it (as it does in other cases) and uses this information to trigger syncsubsets.
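A minimal sketch of this proposal follows. The class and function names are hypothetical, not actual gitserver or repo-updater APIs; a bounded `collections.deque` stands in for the circular buffer, and "trigger syncsubsets" is represented only by returning the list of repos that would be handed to a subset sync.

```python
from collections import deque
from datetime import datetime


class FailedRemoteOps:
    """Hypothetical gitserver-side ring buffer of (repo, failure time) pairs."""

    def __init__(self, capacity=1000):
        # A deque with maxlen silently drops the oldest entry once full,
        # which gives circular-buffer behavior with bounded memory.
        self._buf = deque(maxlen=capacity)

    def record(self, repo, when):
        self._buf.append((repo, when))

    def since(self, t):
        """Return failures recorded strictly after t, oldest first."""
        return [(repo, when) for repo, when in self._buf if when > t]


def poll_and_collect(gitservers, last_poll):
    """Hypothetical repo-updater step: gather distinct repos to re-sync."""
    to_sync = set()
    for gs in gitservers:
        for repo, _ in gs.since(last_poll):
            to_sync.add(repo)
    return sorted(to_sync)


gs = FailedRemoteOps(capacity=2)
gs.record("github.com/sourcegraph/sourcegraph-php", datetime(2020, 1, 1))
gs.record("github.com/sourcegraph/browser-extensions", datetime(2020, 1, 2))
gs.record("github.com/sourcegraph/browser-extensions", datetime(2020, 1, 3))  # evicts the oldest entry
print(poll_and_collect([gs], last_poll=datetime(2019, 12, 31)))
# ['github.com/sourcegraph/browser-extensions']
```

If repo-updater polls too slowly, some failures may be evicted before it sees them; under this design that seems tolerable, because the next failed remote operation on the same repo would record it again.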

Any other simple ideas? I don't really want to special-case this in search, but it might make sense for Sourcegraph.com so that we don't show the "missing repos".

tsenart commented 4 years ago

@rvantonder: Could we get your take on this? Is there a simple way for us to solve this issue without spending a lot of time on it? It might be OK to special-case sourcegraph.com and filter out the errors specific to repos that are gone.
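The special-casing idea could look roughly like the sketch below. It assumes search errors are available as structured alerts with a hypothetical `kind` field such as `"repo_not_found"`; the thread doesn't describe how Sourcegraph actually represents these errors, so all names here are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SearchAlert:
    repo: str
    kind: str  # hypothetical values, e.g. "repo_not_found" or "timeout"


def visible_alerts(alerts, is_sourcegraph_dot_com):
    """Hide not-found-repo alerts on Sourcegraph.com only; keep all alerts elsewhere."""
    if not is_sourcegraph_dot_com:
        return list(alerts)
    return [a for a in alerts if a.kind != "repo_not_found"]


alerts = [
    SearchAlert("github.com/sourcegraph/sourcegraph-php", "repo_not_found"),
    SearchAlert("github.com/sourcegraph/sourcegraph", "timeout"),
]
print([a.repo for a in visible_alerts(alerts, is_sourcegraph_dot_com=True)])
# ['github.com/sourcegraph/sourcegraph']
```

This only hides the symptom in results; the stale rows would still exist in the repos table.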

unknwon commented 3 years ago

@ryanslade IIRC you worked on handling renamed repos; does that also cover deleted repos on GitHub.com?

ryanslade commented 3 years ago

The fix I submitted recently was for browsing directly to renamed repos, not for searching them: https://github.com/sourcegraph/sourcegraph/pull/18145

However, I just tried to reproduce the original error and it doesn't appear to be happening anymore.

unknwon commented 3 years ago

Thanks @ryanslade! I'm gonna close this until we see it again.