Closed · lazarillo closed this issue 1 month ago
> GCP is already integrated with some of the more common Python tools, like `pip` and `python -m build`, so that I can provide a `pip.conf` file as easy as what is shown above, without any credential details.
I don't understand how this works so that you don't have to provide credentials in the URL. I didn't find any GCP-related code in those tools. Can you point me to some resources?
Yes, here are instructions for Artifact Registry credentials.
But ironically, it's not giving the best, most secure path, and it's not the way we do it. We use workload identity providers, which can securely impersonate the service account. Here is some documentation on what to do, where, and when.
I would also imagine/propose that the way we do it is likely a common path for securely using GCP. To my knowledge, it would not work for multi-cloud, but it works within the same system. We use GitHub Actions for some of our build pipelines, and Google Cloud Build for others. In Google Cloud Build, no identity federation is needed, but it is needed for GitHub Actions. However, we never try to use the GCP Artifact Registry to install onto AWS EKS pods, for instance.
My solution (using GitHub Actions) when using `pdm` is as follows (I'm only keeping the relevant parts of the GHA workflow, and I'm removing anything private to us):
```yaml
jobs:
  build-and-test-py-pkg:
    name: build & test '${{ inputs.artifact }}' python package
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ${{ inputs.working-directory }}
    permissions:
      contents: 'read'
      id-token: 'write'
    steps:
      - uses: actions/checkout@v4
      - uses: pdm-project/setup-pdm@v4
        name: setup PDM using python version ${{ inputs.python-version }}
        with:
          python-version: ${{ inputs.python-version }}
      - id: 'auth'  # THIS IS THE INTERESTING PART FOR GETTING PROPER ACCESS!!
        name: 'Authenticate to Google Cloud'
        uses: 'google-github-actions/auth@v2'
        with:
          token_format: 'access_token'
          workload_identity_provider: ${{ secrets.workload_identity_provider }}
          service_account: ${{ secrets.service_account }}
      - name: install '${{ inputs.artifact }}' dependencies
        ### THE REPO PASSWORD IS THE SERVICE ACCOUNT KEY I NEED TO ROTATE
        ### THE pyproject.toml FILE REFERENCES IT IN THE [[tool.pdm.source]] URL
        env:
          PYTHON_REPO_PASSWORD: ${{ secrets.python_repo_password }}
        run: |
          # Compare against "true" explicitly: a bare `[[ ${{ inputs.verbose }} ]]`
          # is a non-empty-string test and would succeed even for "false".
          if [[ "${{ inputs.verbose }}" == "true" ]]
          then
            pdm install --no-default --no-self --verbose
          else
            pdm install --no-default --no-self
          fi
      - name: run tests on '${{ inputs.artifact }}' (using nox)
        if: ${{ inputs.run-tests }}
        ### THE KEY IS NEEDED HERE, TOO, BECAUSE THE noxfile RUNS pdm sync
        env:
          PYTHON_REPO_PASSWORD: ${{ secrets.python_repo_password }}
        run: |
          if [[ "${{ inputs.verbose }}" == "true" ]]
          then
            pdm run nox -f ${{ inputs.noxfile }} --error-on-missing-interpreters --non-interactive --add-timestamp --verbose
          else
            pdm run nox -f ${{ inputs.noxfile }} --error-on-missing-interpreters --non-interactive --add-timestamp
          fi
      - name: build '${{ inputs.artifact }}' package
        env:
          PYTHON_REPO_PASSWORD: ${{ secrets.python_repo_password }}
        run: |
          if [[ "${{ inputs.verbose }}" == "true" ]]
          then
            pdm build --verbose
          else
            pdm build
          fi
      - name: persist package (for later upload to artifact registry)
        # This is persisting to GH, not to Google Cloud
        uses: actions/upload-artifact@v4
        with:
          name: ${{ inputs.artifact }}
          path: ${{ inputs.working-directory }}/dist/
          if-no-files-found: ${{ inputs.action-on-upload-fail }}
```
In the case where I am using a "normal" `python -m build`, none of the credentials like those PDM requires here are needed. The "Authenticate to Google Cloud" step above is sufficient, so the credentials can be removed and the `pyproject.toml` just looks like:
```toml
[[tool.pdm.source]]
name = "my_private_repo"
url = "https://<my-location>-python.pkg.dev/<project>/<repository>/simple/"
```
I hope that helps! Let me know what I can do to help.
I was also trying to find a case where I am not using PDM but still building a project. It seems I have removed all of them. (I really like PDM.) So the best example I have that works without using the service account key is uploading a built package to the Artifact Registry. Here are the details for that:
```yaml
jobs:
  publish-py-pkg:
    runs-on: ubuntu-latest
    permissions:
      contents: 'read'
      id-token: 'write'
    steps:
      - uses: actions/checkout@v4
      - name: Set up python ${{ inputs.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ inputs.python-version }}
      - name: Install publish dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install twine
          python -m pip install keyrings.google-artifactregistry-auth
      - id: 'auth'
        name: 'Authenticate to Google Cloud'
        ### THIS IS THE STEP IMPERSONATING THE SERVICE ACCOUNT WITH THE PROPER RIGHTS!!
        uses: 'google-github-actions/auth@v2'
        with:
          token_format: 'access_token'
          workload_identity_provider: ${{ secrets.workload_identity_provider }}
          service_account: ${{ secrets.service_account }}
      - name: Download Python package artifact
        uses: actions/download-artifact@v4
        with:
          name: ${{ inputs.artifact }}
          path: ./dist
      - name: Upload to Artifact Registry
        ### THIS IS THE PART THAT NEEDS AUTHORIZATION, BUT WHICH IS NOT USING ANY SERVICE ACCOUNT KEYS
        ### THE COMMAND THAT CREATES THE `.pypirc` FILE CREATES A FILE WITHOUT ANY EMBEDDED CREDENTIALS
        ### AN EXAMPLE OF THE OUTPUT OF THE COMMAND IS BELOW
        run: |
          gcloud config set account ${{ secrets.service_account }} --verbosity=${{ inputs.verbosity }}
          gcloud artifacts print-settings python --project=${{ inputs.project }} --repository=${{ inputs.repository }} --location=${{ inputs.location }} --verbosity=${{ inputs.verbosity }} > ~/.pypirc
          if [[ "${{ inputs.verbosity }}" == "debug" ]]
          then
            python -m twine upload --verbose --repository ${{ inputs.repository }} dist/*
          else
            python -m twine upload --repository ${{ inputs.repository }} dist/*
          fi
```
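For context on why no password appears anywhere above: the `keyrings.google-artifactregistry-auth` plugin installed earlier lets `twine` (and `pip`) look up credentials through the keyring, and the plugin answers with a short-lived OAuth2 access token from the ambient Google credentials. On the wire this ends up as ordinary HTTP Basic auth with the fixed username `oauth2accesstoken`. A minimal sketch of that header construction (the token value here is a placeholder, not a real credential):

```python
import base64


def basic_auth_header(access_token: str) -> str:
    """Build the Basic auth header Artifact Registry accepts for tokens.

    The username is always the literal string "oauth2accesstoken";
    the password is a short-lived OAuth2 access token.
    """
    raw = f"oauth2accesstoken:{access_token}".encode()
    return "Basic " + base64.b64encode(raw).decode()


# Placeholder token; a real one would come from the auth step / ADC.
header = basic_auth_header("ya29.EXAMPLE-TOKEN")
```

Because the token expires quickly, nothing durable is stored anywhere, which is exactly what removes the rotation burden.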
The `gcloud artifacts print-settings python` command shown above generates a `.pypirc` similar to the following (our details redacted):
```
> gcloud artifacts print-settings python \
    --project=<our-project> \
    --repository=<our-repo> \
    --location=<our-location>

# Insert the following snippet into your .pypirc

[distutils]
index-servers =
    <our-repo>

[<our-repo>]
repository: https://<our-location>-python.pkg.dev/<our-project>/<our-repo>/

# Insert the following snippet into your pip.conf

[global]
extra-index-url = https://<our-location>-python.pkg.dev/<our-project>/<our-repo>/simple/
```
I am not sure what is needed, nor even whether this is possible within `pdm`. Maybe it can only be done from within GCP's code.
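One possible workaround I have been considering (an untested sketch, not what we run today): the `google-github-actions/auth` step already produces a short-lived access token as `steps.auth.outputs.access_token` when `token_format: 'access_token'` is set, and Artifact Registry accepts such tokens as a Basic-auth password with the fixed username `oauth2accesstoken`. The token could then replace the long-lived JSON key as the repo password:

```yaml
# Hypothetical step: reuse the federated access token instead of a static key.
# Assumes the pyproject.toml source URL uses the username "oauth2accesstoken".
- name: install dependencies with a short-lived token
  env:
    PYTHON_REPO_PASSWORD: ${{ steps.auth.outputs.access_token }}
  run: pdm install --no-default --no-self
```

Since the token expires within about an hour, nothing would need rotating, though every job would need the auth step to run first.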
It seems so; neither `twine` nor `pip` has integration code for GCP.
OK. Yes, I feared that was the case. Thank you for the response!
It is a little frustrating that Google integrates only with tools that are highly outdated and makes their own tools so non-standard.
I love PDM and I truly don't understand how anyone who works with Python packaging could stick with something like `python -m build` or any tool without an equivalent to the `pdm.lock` file. Oh well...
Is your feature/enhancement proposal related to a problem? Please describe.
The problem is that I want to provide a high level of security for our private package repository while avoiding the challenges in managing the security.
Currently, I am using PDM for my entire Python stack. This can be done with a simple `pip.conf` or `.pypirc` file, without adding any security credentials, if the tool is integrated. GCP is already integrated with some of the more common Python tools, like `pip` and `python -m build`, so that I can provide a `pip.conf` file as easy as what is shown above, without any credential details.

However, since PDM is not integrated yet, I need to (a) create a JSON key associated with my credentials and (b) add something like the following to the `pyproject.toml` if I want to be able to run `pdm build`, where `${PYTHON_REPO_PASSWORD}` is the JSON token that I have created, associated with the service account and passed as a GitHub Secret.

The extra work of creating the token and embedding it this way isn't the problem; it's the key rotation. We now have a security maintenance burden: rotating keys, ensuring that we delete the old ones and generate new ones with a decently fast frequency.
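For reference, the credentialed source block looks roughly like the following (a sketch reconstructed from the workflow comments above; `<user>` is a placeholder, since the username depends on the key format):

```toml
[[tool.pdm.source]]
name = "my_private_repo"
# The secret is interpolated into the URL, which is why the key has
# to exist at all and has to be rotated. <user> is hypothetical here;
# for a base64-encoded JSON key, GCP uses "_json_key_base64".
url = "https://<user>:${PYTHON_REPO_PASSWORD}@<my-location>-python.pkg.dev/<project>/<repository>/simple/"
```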
Describe the solution you'd like
I would like integrated credentialing, so that GCP is integrated with the `pdm` tool in a similar way to how it is integrated with `python -m build`. The `pyproject.toml` would then carry no credential details, and no keys would need to be associated with the service accounts.
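That is, presumably just the bare source block from earlier in the thread:

```toml
[[tool.pdm.source]]
name = "my_private_repo"
url = "https://<my-location>-python.pkg.dev/<project>/<repository>/simple/"
```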
Assistance
I am not sure what is needed, nor even whether this is possible within `pdm`; maybe it can only be done from within GCP's code. But if it can be done within PDM, I'm happy to help. Some guidance, or a comparison to how this was done in a similar case, would be helpful, and I am obviously motivated to assist.