microsoft / azure-pipelines-agent

Azure Pipelines Agent 🚀
MIT License

Best way to authenticate against a git repository in a build process #1601

Closed matthid closed 6 years ago

matthid commented 6 years ago

Description

One of our users reported https://github.com/fsprojects/Paket/issues/3228

The basic problem is how to authenticate against a TFS repository within a build process.

Can we just use System.AccessToken? Is there documentation around that? Is there a credential manager we can use or implement ourselves (like there is for NuGet), or any other way to hook into the process?

TingluoHuang commented 6 years ago

@matthid

  1. System.AccessToken is generated per VSTS build/release job. Its lifetime is equal to the job's; a new token is generated for the next job.
  2. Using System.AccessToken, your requests to VSTS will be authenticated as either the "Project Collection Build Service identity" or the "Project Build Service Identity" of your VSTS account, based on your build/release definition setting.
  3. By default, that service account should have read permission to all repositories under your VSTS account, but customers can choose to deny access to certain repositories from the Web UI. https://docs.microsoft.com/en-us/vsts/pipelines/scripts/git-commands?view=vsts
  4. System.AccessToken is available to tasks by default, but using it in an ad-hoc script (PowerShell, command line) requires an additional setting. https://docs.microsoft.com/en-us/vsts/pipelines/scripts/powershell?view=vsts#oauth

I don't think I have a clear answer for you, but I'll try my best to give you more context.

Personally, I would suggest that customers always provide credentials explicitly instead of using a credential manager in a CI system.

matthid commented 6 years ago

Personally I think the ideal solution would be if we could just "prepare" the environment in a way such that we can call git <...> and it "just works". This way third party tools will work. Is there some way to do this?

Do any of the above options exist today?

I don't think letting users create PAT-Tokens and manually setting up things is very nice.

TingluoHuang commented 6 years ago

System.AccessToken is for this, but as I said, customers need to opt in to allow ad-hoc scripts to access System.AccessToken from environment variables.

I think here is what we can do to make it a better experience for customers:

  1. Check whether the TF_Build environment variable exists. TF_Build is set if the process is running as a child process of the VSTS agent.
  2. After checking TF_Build, check whether System_AccessToken exists. If it doesn't, you can print a message like "Your downstream operations may fail if they need credentials for VSTS" and provide a link to the doc about the opt-in experience. https://docs.microsoft.com/en-us/vsts/pipelines/scripts/powershell?view=vsts#oauth
  3. Run git with the token passed as an extra header: git -c http.extraheader="AUTHORIZATION: bearer <System_AccessToken>" clone https://github.com/microsoft/vsts-agent
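
The three steps above could be sketched in a shell script roughly as follows. This is only an illustration under stated assumptions: it assumes the agent exposes the variables to the script as the upper-cased environment variables TF_BUILD and SYSTEM_ACCESSTOKEN, and the function names check_pipeline_token and clone_with_token are hypothetical helpers, not part of the agent.

```shell
# Sketch only. Assumes the variables appear in the environment as
# TF_BUILD and SYSTEM_ACCESSTOKEN (upper-cased forms of the names above).

# Hypothetical helper: returns 0 when it is fine to proceed, 1 when the
# process runs under the agent but the token was not made available.
check_pipeline_token() {
  # Step 1: not running as a child of the agent, so there is nothing to check.
  if [ -z "${TF_BUILD:-}" ]; then
    return 0
  fi
  # Step 2: running under the agent, but the opt-in setting is missing.
  if [ -z "${SYSTEM_ACCESSTOKEN:-}" ]; then
    echo "warning: System.AccessToken is not available; operations that need credentials may fail." >&2
    echo "See https://docs.microsoft.com/en-us/vsts/pipelines/scripts/powershell?view=vsts#oauth" >&2
    return 1
  fi
  return 0
}

# Hypothetical helper for step 3: the header is passed to this single git
# invocation only and is not written to any config file.
clone_with_token() {
  git -c http.extraheader="AUTHORIZATION: bearer ${SYSTEM_ACCESSTOKEN}" clone "$1"
}
```

A caller might run `check_pipeline_token && clone_with_token <repository URL>`, falling back to its normal behavior outside the agent.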

matthid commented 6 years ago

but as i said customer need to opt-in to allow ad-hoc script have access to System.AccessToken from environment variables.

But that is only true for scripts, not for vsts-tasks, correct? We actually have a vsts-task where we want to set things up in such a way that it "just works" as well.

TingluoHuang commented 6 years ago

Then System.AccessToken is always there for a vsts-task, so you should be able to achieve your goal of making it "just work".

matthid commented 6 years ago

But can I make git use it when third-party code sits between the task and the git call?

  1. Vsts Task (with access to Access Token) -> Starts paket.exe -> Starts git.exe

How can git access the repository and use the token in that scenario?

To elaborate: With NuGet we can set up a "dummy" credential provider and then:

  1. Vsts Task sets up the credential manager (basically puts a custom *CredentialProvider.exe in some path)
  2. Vsts Task (with access to Access Token) -> Starts thirdparty.exe -> nuget.exe -> Looks for credential managers and uses our installed one from step 1 -> Starts OurCredentialProvider.exe and uses token
  3. Vsts Task removes the credential manager

TingluoHuang commented 6 years ago

@matthid you can put the credential into the user-level git config:

git config --unset-all http.<your VSTS account>.extraheader
git config http.<your VSTS account>.extraheader "AUTHORIZATION: bearer <System_AccessToken>"
# run git operations
git config --unset-all http.<your VSTS account>.extraheader

matthid commented 6 years ago

Thanks. We will take a look at that, but I guess it should do.

danielmhair commented 3 years ago

@TingluoHuang Thank you for posting your last solution! I can verify that doing this fixes my issues with git log:

git config --unset-all http.<your VSTS account>.extraheader
git config http.<your VSTS account>.extraheader "AUTHORIZATION: bearer <System_AccessToken>"
# git log command here
git config --unset-all http.<your VSTS account>.extraheader

steven-hyland commented 2 years ago

I came across this issue when I was searching for a solution to my problem, so I'm going to leave my solution here in case it helps someone else someday.

The above solution from TingluoHuang did not work for me (UPDATE: it does work if the --global option is included. Read on and my other comments below for more context as to why). After adding that config setting, I continued to get errors like the below. I also tried several variations with different http.xyz and credential.xyz config settings and nothing worked. This could be because we are not using SSH keys and are just relying on the built-in Git for Windows credential manager, but I don't know for sure.

│ fatal: could not read Username for 'https://dev.azure.com': terminal
│ prompts disabled

In my case, I am running a Terraform script in my pipeline that references modules from other Git repositories (all repos are in ADO). So it's similar to matthid's case where there is an intermediary program invoking Git. Using one of the modules looks like this:

module "linux_web_app" {
  source              = "git::https://dev.azure.com/<omitted>/<omitted>/_git/<repo_name>?ref=v0.1"
  resource_group_name = var.resource_group_name
  service_name        = var.service_name
}

We have a few of these, and some of these also reference others in a nested way. On a developer's machine, this works well because the credential manager supplies the right creds to every connection. But in pipelines it didn't work because all our ADO projects/repositories are set to private. One option would be to pass in an ADO PAT, but I thought that should be unnecessary since the pipeline's access token is already available. At first, my solution was to overwrite the URL to include the token like so:

TOKEN="git::https://dev.azure.com"
REPLACEMENT="git::https://$(System.AccessToken)@dev.azure.com"
sed -i "s|$TOKEN|$REPLACEMENT|g" main.tf

This works fine but is a little clunky. It also doesn't work in the nested case, because when dependent modules are downloaded they don't have the token in the URLs, so then you have to go back and overwrite all instances in the module cache dir again, which is even more clunky.

In the end I found the insteadOf option in Git's config which works very well:

git config --global url.https://$(System.AccessToken)@dev.azure.com.insteadOf "https://dev.azure.com"
<rest of the script goes here>
git config --global --unset url.https://$(System.AccessToken)@dev.azure.com.insteadOf

This works very well in the pipeline and also preserves the behavior of just letting the credential manager do its thing when running on a developer's machine. Note that this requires the proper resources: and uses: blocks elsewhere in the pipeline.yml.
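
As a rough sketch of the same idea, the set/run/unset sequence could be wrapped so that the unset still happens when the script in the middle fails. This assumes the token is exposed to the script as the SYSTEM_ACCESSTOKEN environment variable rather than through the $(System.AccessToken) macro; run_with_url_rewrite is a hypothetical helper name.

```shell
# Sketch only. Assumes SYSTEM_ACCESSTOKEN is available in the environment.
# Hypothetical helper: adds the insteadOf rewrite, runs the given command,
# then removes the rewrite and returns the command's exit status.
run_with_url_rewrite() {
  key="url.https://${SYSTEM_ACCESSTOKEN}@dev.azure.com.insteadOf"
  git config --global "$key" "https://dev.azure.com" || return 1
  status=0
  # Capture the status instead of letting a failure skip the cleanup.
  "$@" || status=$?
  git config --global --unset "$key"
  return "$status"
}
```

For example, `run_with_url_rewrite terraform init` would run the init step with the rewrite in place. The rewrite still lives in the user-level config while the command runs, and a hard crash of the machine would still skip the cleanup.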

GaTechThomas commented 2 years ago

Be careful setting global git configs on agent machines, since global can cause side effects in other processes. Specifically, it injects the token of the current job at a level accessible by other processes (current and later), and that token is short-lived. This means that it's both a security issue and a breaking change for the other processes the moment the token is invalidated. Additionally, on rare occasions something breaks during a pipeline run that causes the token cleanup step to not be called (we MUST assume that any call can be the last call that occurs - think BSOD), which leaves git broken on the machine until hard cleanup is done (and it is a beast to diagnose, since it only fails on subsequent pipeline runs).

We had the same terraform git reference issue, and the solution that worked for us with no side effects was to:

  1. Perform the usual checkout of the terraform repo
  2. Find all git references across all terraform files in the local copy of the repo and replace the org name in the URL with the System.AccessToken.
  3. Perform terraform init and the usual subsequent commands

This works every time, is scoped to the current pipeline run, and does not have cross-process security implications.
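
Step 2 could be sketched roughly as follows. This is an illustration under assumptions, not the exact script we run: it assumes module sources use the git::https://dev.azure.com/... form shown earlier in the thread and places the token in the URL's user-info position; inject_token_into_tf is a hypothetical helper name.

```shell
# Sketch only. Rewrites every *.tf file under a directory so that
# "git::https://dev.azure.com" becomes "git::https://<token>@dev.azure.com".
# Assumes the token contains no "|", "&" or "\" characters, which would
# otherwise need escaping in the sed expression.
inject_token_into_tf() {
  dir=$1
  token=$2
  find "$dir" -type f -name '*.tf' | while IFS= read -r file; do
    # Write to a temporary file and move it into place, because in-place
    # editing flags differ between sed implementations.
    sed "s|git::https://dev.azure.com|git::https://${token}@dev.azure.com|g" "$file" > "$file.tmp" &&
      mv "$file.tmp" "$file"
  done
}
```

In a pipeline script this might be called as `inject_token_into_tf . "$SYSTEM_ACCESSTOKEN"` after checkout and before terraform init. The change only touches the local working copy for the current run.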

One caveat: If the terraform has nested git references (i.e., one terraform repo references another repo that also has repo references) then it becomes much more difficult to deal with. However, the recommendation from hashicorp/terraform is that nested references should not be a practice - such practice is an indicator of other structural problems that should be resolved first.

steven-hyland commented 2 years ago

Thanks for the comment, but there are a few other things that affect what I think you're saying. Sorry if these weren't clear:

We have the nested git references problem that you describe. We were initially doing your same steps to resolve, but the problem was that terraform init would break and fail because it downloads the other modules during that step. So the process was to run terraform init and fix up the URLs in a loop until it didn't break anymore. This is dumb, so we are very incentivized to find a cleaner approach, as I'm sure you can understand.

The current method does effectively the same thing - replacing the URL to include the token - just at the git config level instead of in each individual file. It just makes sense to me that if the git credential manager on my dev machine can handle this scenario fine, there's got to be a similarly simple way to make it work in CI.

The recommendation from HashiCorp is noted and thanks for mentioning it. I would be interested to know the specifics of the "structural problems" they mention if you have a link.

steven-hyland commented 2 years ago

Well, either way, your comment got me thinking again, and after some experimentation I just discovered that the previous solution posted with the extraHeader options does seem to work if I include the --global option. Without that it just continues prompting for a username/password, which it can't do because this is running in CI.

I guess that method and the one I posted are essentially equivalent, but after thinking about it more I do take your point about breaking other processes. Maybe someday in the future this will not be running on an ephemeral image and it will suddenly start breaking other things. Future me won't like that very much. I'll go back and include a git config --global --unset command at the end of the jobs to account for that. I'll update my first comment above too. Thanks again for your comment - always something new to learn :)

PoulNielsen commented 1 year ago

We do this in a bash task during the build (where we have 'Allow script to access oauth token' enabled on the phase run on the agent, which 'Enables scripts and other processes launched by tasks to access the OAuth Token through the System.AccessToken variable'). So on a single line we do:

GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0="http.extraHeader" GIT_CONFIG_VALUE_0="Authorization: Bearer $SYSTEM_ACCESSTOKEN" composer update

Or we could do the same with 'git' (which also works):

GIT_CONFIG_COUNT=1 GIT_CONFIG_KEY_0="http.extraHeader" GIT_CONFIG_VALUE_0="Authorization: Bearer $SYSTEM_ACCESSTOKEN" git clone XXXX

Token is not 'unfolded' or shown in any files this way. (From https://stackoverflow.com/questions/11262010/shell-variable-expansion-in-git-config)
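
If several commands need the header, the one-liner could be wrapped in a small function, sketched below. This assumes SYSTEM_ACCESSTOKEN is set as described above and a git version recent enough to read the GIT_CONFIG_COUNT family of variables (I believe they arrived around git 2.31); with_git_token is a hypothetical helper name.

```shell
# Sketch only. Runs the given command with the extra header supplied
# through environment variables, so nothing is written to a config file
# and the setting disappears when the command exits.
with_git_token() {
  GIT_CONFIG_COUNT=1 \
  GIT_CONFIG_KEY_0="http.extraHeader" \
  GIT_CONFIG_VALUE_0="Authorization: Bearer ${SYSTEM_ACCESSTOKEN}" \
  "$@"
}
```

Usage would then look like `with_git_token composer update` or `with_git_token git clone XXXX`. Because the variables are inherited by child processes, any git started by the wrapped tool sees the header as well.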