Closed asdine closed 3 years ago
Heads up @tsenart - the "team/cloud" label was applied to this issue.
Make Gitserver use the new type/package instead of calling the other services to get tokens, and remove/deprecate unecessary endpoints on repo-updater and frontend if any.
Can you elaborate on this, it'd be useful to list all places that need updating.
Make Gitserver use the new type/package instead of calling the other services to get tokens
If we're creating a separate custom git credential helper, do we need to modify git-server to read tokens from Postgres?
Write a type/package that provides an unified way of getting a token/clone url for a given repo, and make all services that need to access the token use that type.
What are all the places that'd need changing?
Can you elaborate on this, it'd be useful to list all places that need updating.
What are all the places that'd need changing?
I'll update the issue with more details on that aspect
If we're creating a separate custom git credential helper, do we need to modify git-server to read tokens from Postgres?
The custom git credential helper is a binary/script executed by git. I don't know how far we can go with this program but we were thinking that the easiest would be making it simply call a gitserver endpoint and let gitserver fetch the token for it.
cc @sourcegraph/cloud @keegancsmith
In general, SGTM. Hope we can also get +1 from @keegancsmith, but it's not a blocker.
In general, SGTM. Hope we can also get +1 from @keegancsmith, but it's not a blocker.
Will review tomorrow morning.
LGTM! Great stuff.
I'm writing out the current state to ensure my understanding is current and to align. We do git fetch/clone in two main ways:
sourcegraph.example.com/$repo
sourcegraph-frontend
tells repo-updater
to update $repo
repo-updater
's queue tells gitserver
to update/clone $repo
with $remote-url
.gitserver
fetches/clones $repo
with $remote-url
. Sets git config remote.origin.url
as $remote-url
.$repo@$commit
.sourcegraph-frontend
sends request to gitserver with the option to ensure $repo@$commit
exists.gitserver
doesn't have $commit
. It does a git fetch using the config remote.origin.url
.There is also some extra logic around sourcegraph-frontend
hydrating in $remote-url
with a retry, but I'm not sure if this logic even works to be honest. I'm very excited about removing that, and instead having gitserver able to hydrate it.
The proposal is removing any mention of $remote-url
above. Instead, gitserver
will be asking repo-updater
for $remote-url
. This sounds good to me. I have a slight concern that gitserver will now have a dependency on repo-updater
which is a SPoF. But that can be solved in practice, and any downtime in repo-updater
will be temporary and just result in a delay in updates via EnsureRevision
.
- Write a type/package that provides an unified way of getting a token/clone url for a given repo, and make all services that need to access the token use that type.
The proposal doesn't make it clear where the clone_url will be stored and who can read it. I am assuming it will be repo-updater which implements the "get me a token", and it will fetch it from the external services table?
Or will this type/package directly speak to the DB?
This type would choose a random external service associated with the repo and read the token from that external service config. (2d)
A random token will make it hard to debug when things go wrong. However, it does give us "eventual" updates assuming a single token works. I'm unsure about how to do this better, since its better than never updating a repository.
- Deprecate/Drop the external_service_repos.clone_url column (1d)
If the order of the list is the order you intend to do this work, I would do this last. Otherwise the order looks great.
- Make Gitserver use the new type/package instead of calling the other services to get tokens, and remove/deprecate unecessary endpoints on repo-updater and frontend if any. (2d)
Gitserver doesn't call any other services. It gets told the clone URL. See my description above. So this bit of work may slip and take the most time of all the points.
- (Spike) Write a custom git credential helper to let git ask for a token whenever it's needed. This helper would call a gitserver endpoint to get a token on demand. This way, no token would be stored on disk. (1d)
This sounds like a great! Some random thoughts, answering them may take a day, but once answered another day to implement :)
remote.origin.url
. Your credential helper then receives that.@keegancsmith:
Thanks for a very thoughtful response!
Proposal
The proposal is removing any mention of
$remote-url
above. Instead,gitserver
will be askingrepo-updater
for$remote-url
. This sounds good to me. I have a slight concern that gitserver will now have a dependency onrepo-updater
which is a SPoF. But that can be solved in practice, and any downtime inrepo-updater
will be temporary and just result in a delay in updates viaEnsureRevision
.
To be precise, we will remove the user credential (i.e. access token) from the $remote-url
, but the network location of the repository is still living in each individual .git/config
.
Comments
- Write a type/package that provides an unified way of getting a token/clone url for a given repo, and make all services that need to access the token use that type.
The proposal doesn't make it clear where the clone_url will be stored and who can read it. I am assuming it will be repo-updater which implements the "get me a token", and it will fetch it from the external services table?
Or will this type/package directly speak to the DB?
"this type/package" directly speak to the credential storage (based on config, could be DB, Vault, GSM, etc.). On top of that, we serve endpoints (in frontend, or standalone service) to whoever (i.e. gitserver) needs a token for a particular repository on-demand.
This type would choose a random external service associated with the repo and read the token from that external service config. (2d)
A random token will make it hard to debug when things go wrong. However, it does give us "eventual" updates assuming a single token works. I'm unsure about how to do this better, since its better than never updating a repository.
- Deprecate/Drop the external_service_repos.clone_url column (1d)
If the order of the list is the order you intend to do this work, I would do this last. Otherwise the order looks great.
@asdine: I feel I might be missing something here, I thought it is enough that we only drop the credentials from the external_service_repos.clone_url
not the entire column? Otherwise seems unintuitive how to quickly identify what's the clone URL of a given repo? (It's just my concern, will have to verify)
- (Spike) Write a custom git credential helper to let git ask for a token whenever it's needed. This helper would call a gitserver endpoint to get a token on demand. This way, no token would be stored on disk. (1d)
This sounds like a great! Some random thoughts, answering them may take a day, but once answered another day to implement :)
- What URL will you store? IE you still need to set
remote.origin.url
. Your credential helper then receives that.
Yes we still need to set remote.origin.url
, mentioned above.
- How will you tell git to use the credential helper? envvar, repo config, global config?
We can achieve that via the global .gitconfig
:
git config --global credential.helper <path to our credential helper>
- How does it interact with SSH?
From what I read, git shells out to ssh
so git credential helper won't be part of it, we need a complementary solution to support that (e.g. maintains a .ssh/config
file per gitserver), that is out of scope unfortunately.
- We have admins sometimes setting the git config / mounting secrets / etc. Does this interfere with that? If so, how do we still allow that use case?
Git credential helper is involved only when the git CLI couldn't determine what's the credential (in an interactive shell, you get prompted to enter username and password). So if admins configured credentials in other ways that is readable by git CLI, all of them will continue to work. Think of Git credential helper as the catch-all.
- What will the credential helper be? IE will you ship a new binary in all our gitserver containers? Or maybe you can write a shell script to a temporary path and execute that?
A shell script is enough to get started but @asdine and I worried the maintainability and the expertise we have in bash 😅 So most likely will be a Go binary shipped in all our gitserver containers.
The reason I mentioned a shell script is it's just a bit of glue so you avoid needing to ship an extra binary. IE your shell script can simply speak to a local unix or tcp socket that speaks to gitserver. So won't be more than a few lines and potentially static.
@tsenart and I are hacking on removing some tech debt around CachedGitRepo
, GitRepo
and CommandRetryer
. We have come up with a plan, which essentially is implementing the "get remote URL" part of this issue. We may have actually solved all of this. So if someone plans to tackle this (if we haven't closed it yet), reach out to us.
@keegancsmith sounds good! I'm not planning on working on this issue this cycle
For Gitserver to be able to clone and fetch repos it needs to get tokens from repo-udpater and frontend. The way it is currently implemented poses several problems:
external_service_repos
table.Proposal
external_service_repos.clone_url
column (1d)If this proposal is accepted, the bullet points above would be broken down into multiple issues