sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

Inactive project in Gitlab returns a 403 when attempting to clone the repo #40912

Open indradhanush opened 2 years ago

indradhanush commented 2 years ago

An inactive project in Gitlab will return a 403. We should attempt to not sync the repo in the future once we've seen this error.

remote: You are not allowed to download code from this project.
fatal: unable to access 'https://gitlab.foo.com/bar/baz': The requested URL returned error: 403

Steps to reproduce

  1. Set up a Gitlab code host connection
  2. Add a project and a repo and make it inactive
  3. Let the Sourcegraph sync run and observe the error
  4. Optionally, try to clone it directly in gitserver to see the same error

Expected

Mark the repo as ignored or similar with the reasoning and do not sync it.

This does leave the edge case when the repo is activated again - but maybe we can handle this by syncing it, detecting the error, and, instead of marking it as errored, adding a new status to the clone_status column of the gitserver_repos table to indicate the 403 error. This should ensure that the repo is synced if it is ever activated again, while also preventing the repo from being shown as errored.

Originally observed in https://github.com/sourcegraph/customer/issues/1250.

jasonhawkharris commented 2 years ago

@indradhanush I got a chance to take a deeper look this morning. AFAICT GitLab does not label repos as inactive. Their docs only give instructions for determining inactivity based on what their APIs provide. For example, the projects API has a last_edited node... etc.

It does appear that at one point they allowed admins to set inactive_project_deletion, but according to their docs, this capability has been removed. Also, check the version history of the flag here.

I'm still curious about the error you're getting here. Could this possibly have been a result of GitLab auto deleting inactive repos when that was still a capability they offered?

jasonhawkharris commented 2 years ago

Also, the PR they issued to remove the flag was only two weeks ago. This may no longer be an issue.

indradhanush commented 2 years ago

@jasonhawkharris Thank you for looking into this. I was successfully able to reproduce this by turning off the toggle highlighted in the screenshot below under the Settings page of the project. For reference, this is the project: https://gitlab.sgdev.org/indradhanush/foobar

image

The PR you linked looks like it is related to another feature flag. I don't think project deactivation will be going away anytime soon though.

How I was thinking we might solve this would be to check if the error message matches the pattern like the one we get now while cloning the repo:

remote: You are not allowed to download code from this project.
fatal: unable to access '<NAME>': The requested URL returned error: 403

And then marking it as "ignored" or something similar, as suggested in the issue description.
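
A rough sketch of what that pattern check could look like in Go. The names forbiddenClonePattern and isForbiddenCloneError are hypothetical and not existing Sourcegraph code, and the regex is only an assumption based on the two lines of output quoted above:

```go
package main

import "regexp"

// forbiddenClonePattern matches the stderr output quoted above: the
// "not allowed to download code" line followed somewhere later by the 403.
// Hypothetical: both the name and the exact regex are assumptions made for
// illustration. (?s) lets "." span the newline between the two lines.
var forbiddenClonePattern = regexp.MustCompile(
	`(?s)remote: You are not allowed to download code from this project\.` +
		`.*The requested URL returned error: 403`,
)

// isForbiddenCloneError reports whether git's stderr looks like the 403
// returned for a project whose repository access has been turned off.
func isForbiddenCloneError(stderr string) bool {
	return forbiddenClonePattern.MatchString(stderr)
}
```

The caller would pass in the captured stderr of the failed clone or fetch and, on a match, record the "ignored"-style status instead of a generic error. Matching on message text is brittle if GitLab ever changes the wording, so this is a starting point rather than a robust detector.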

Sidenote: It does look like there isn't a top level active / inactive boolean in the json response from the projects API which forces us to rely on the error message's pattern.

I investigated a little more, and it also appears that the following API call returns a 403 if the project is disabled:

curl --header 'Authorization: Bearer <redacted>' https://gitlab.sgdev.org/api/v4/projects/<project-id>/repository/tree

and returns a json response if it is enabled. But this wouldn't be a good approach to detect the project's active / inactive status as it would require at least 1 extra API call before we can make a decision.
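
For completeness, a sketch of that extra API call in Go, mirroring the curl command above. probeRepository and repositoryAccessible are hypothetical names, the integer project ID is an assumption, and treating every 403 as "disabled" is based only on the behaviour observed here (a token without access could presumably produce a 403 as well):

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// repositoryAccessible maps the status code of the repository/tree call to a
// decision. Hypothetical helper: 200 means the repository is readable, 403 is
// treated as "disabled" per the observation above, anything else is an error.
func repositoryAccessible(statusCode int) (bool, error) {
	switch statusCode {
	case http.StatusOK:
		return true, nil
	case http.StatusForbidden:
		return false, nil
	default:
		return false, fmt.Errorf("unexpected status %d from repository/tree", statusCode)
	}
}

// probeRepository issues the same request as the curl command above.
// baseURL is the GitLab instance URL (for example https://gitlab.sgdev.org).
func probeRepository(ctx context.Context, client *http.Client, baseURL, token string, projectID int) (bool, error) {
	url := fmt.Sprintf("%s/api/v4/projects/%d/repository/tree", baseURL, projectID)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	return repositoryAccessible(resp.StatusCode)
}
```

This makes the cost explicit: one additional request per project before any clone decision, which is the reason the error-message pattern looks like the cheaper option.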

jasonhawkharris commented 1 year ago

Removing the good first issue tag from this issue. After a call with @mrnugget, it's clear that this won't be trivial. Longer explanation to come.