pdm-project / pdm

A modern Python package and dependency manager supporting the latest PEP standards
https://pdm-project.org
MIT License

Is it feasible to add full service account integration / validation to Google Cloud Platform Artifact Registry #3134

Closed lazarillo closed 1 month ago

lazarillo commented 1 month ago

Is your feature/enhancement proposal related to a problem? Please describe.

The problem is that I want to provide a high level of security for our private package repository while avoiding the challenges in managing the security.

Currently, I am using PDM for my entire Python stack. With tools that GCP already integrates with, access to the private repository can be configured with a simple pip.conf or .pypirc file that contains no security credentials at all:

pip.conf:

[global]
extra-index-url = https://<my-location>-python.pkg.dev/<project>/<repository>/simple/

GCP is already integrated with some of the more common Python tools, like pip and python -m build, so I can provide a pip.conf file as simple as the one shown above, without any credential details.
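
For context: the piece that makes this credential-free setup possible for pip is Google's keyring backend for Artifact Registry (the keyrings.google-artifactregistry-auth package). A rough sketch of that setup, assuming gcloud or Application Default Credentials are already available in the environment, and with a placeholder package name:

# Install the keyring backend next to pip; pip can then resolve credentials
# for the *.pkg.dev index from the ambient identity instead of from the config file.
python -m pip install keyring keyrings.google-artifactregistry-auth

# With the backend installed, the credential-free index URL in pip.conf is enough.
python -m pip install <some-private-package>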

However, since PDM does not have this integration yet, I need to (a) create a JSON key for a service account and (b) add something like the following to the pyproject.toml if I want to be able to run pdm build:

[[tool.pdm.source]]
name = "my_private_repo"
url = "https://$_json_key_base64:${PYTHON_REPO_PASSWORD}@<my-location>-python.pkg.dev/<project>/<repository>/simple/"

where ${PYTHON_REPO_PASSWORD} is the base64-encoded JSON key that I created for the service account, passed in as a GitHub Secret.
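
For completeness, the manual setup behind that secret looks roughly like the following (the account and project names are placeholders, and this assumes the usual gcloud CLI and GNU base64):

# Create a long-lived JSON key for the service account; this is the artifact
# that later has to be rotated by hand.
gcloud iam service-accounts keys create key.json \
    --iam-account=<service-account>@<project>.iam.gserviceaccount.com

# Base64-encode it (no line wrapping) and store the output as the
# PYTHON_REPO_PASSWORD secret, matching the _json_key_base64 username above.
base64 -w0 key.json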

The extra work of creating the token and embedding it this way isn't the problem. It's the key rotation. We now have a security maintenance burden: rotating keys, making sure the old ones are deleted, and generating new ones on a reasonably short schedule.

Describe the solution you'd like

I would like integrated credentialing, with GCP integrated into the pdm tool in a similar way to how it is integrated with python -m build, so that the pyproject.toml would simply look like:

[[tool.pdm.source]]
name = "my_private_repo"
url = "https://<my-location>-python.pkg.dev/<project>/<repository>/simple/"

and no keys would need to be associated with the service accounts.

Assistance

I am not sure what is needed, nor even whether this is possible within pdm. Maybe it can only be done from within GCP's code.

But if it can be done within PDM, I'm happy to help. Some guidance or a comparison to how it was done in a similar case would be helpful, but I am obviously motivated to assist.

frostming commented 1 month ago

> GCP is already integrated with some of the more common Python tools, like pip and python -m build, so I can provide a pip.conf file as simple as the one shown above, without any credential details.

I don't understand how it does this so that you don't have to provide credentials in the URL. I didn't find any code related to GCP in those tools. Can you point me to some resources?

lazarillo commented 1 month ago

Yes, here are instructions for artifact registry credentials.

But ironically, those instructions don't describe the best, most secure path, and it's not the way we do it. We use workload identity providers, which can securely impersonate the service account. Here is some documentation on what to do, where, and when.

I would also imagine / propose that the way we do it is likely a common path for securely using GCP. To my knowledge, it would not work for multi-cloud, but it works within the same ecosystem. We use GitHub Actions for some of our build pipelines, and Google Cloud Build for others. In Google Cloud Build no identity federation is needed, but it is for GitHub Actions. However, we never try to use the GCP Artifact Registry to install onto AWS EKS pods, for instance.
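
For anyone unfamiliar with that setup, the one-time federation wiring looks roughly like this. Treat it as a sketch rather than a recipe: the pool/provider names, project number, and org/repo are placeholders, and the exact flags may differ slightly between gcloud versions.

# Create a workload identity pool and an OIDC provider for GitHub Actions.
gcloud iam workload-identity-pools create github-pool \
    --location=global --display-name="GitHub Actions pool"

gcloud iam workload-identity-pools providers create-oidc github-provider \
    --location=global --workload-identity-pool=github-pool \
    --issuer-uri="https://token.actions.githubusercontent.com" \
    --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
    --attribute-condition="assertion.repository_owner == '<my-org>'"

# Let workflows from one specific repo impersonate the service account.
gcloud iam service-accounts add-iam-policy-binding <service-account>@<project>.iam.gserviceaccount.com \
    --role="roles/iam.workloadIdentityUser" \
    --member="principalSet://iam.googleapis.com/projects/<project-number>/locations/global/workloadIdentityPools/github-pool/attribute.repository/<my-org>/<my-repo>"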

My solution when using pdm (via GitHub Actions) is as follows (I'm only keeping the relevant parts of the GHA workflow, and removing anything private to us):

jobs:
  build-and-test-py-pkg:
    name: build & test '${{ inputs.artifact }}' python package
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ${{ inputs.working-directory }}
    permissions:
      contents: 'read'
      id-token: 'write'
    steps:
      - uses: actions/checkout@v4
      - uses: pdm-project/setup-pdm@v4
        name: setup PDM using python version ${{ inputs.python-version }}
        with:
          python-version: ${{ inputs.python-version }}
      - id: 'auth' # THIS IS THE INTERESTING PART FOR GETTING PROPER ACCESS!!
        name: 'Authenticate to Google Cloud'
        uses: 'google-github-actions/auth@v2'
        with:
          token_format: 'access_token'
          workload_identity_provider: ${{ secrets.workload_identity_provider }}
          service_account: ${{ secrets.service_account }}
      - name: install '${{ inputs.artifact }}' dependencies
        ### THE REPO PASSWORD IS THE SERVICE ACCOUNT KEY I NEED TO ROTATE
        ### THE pyproject.toml FILE REFERENCES IT IN THE [[tool.pdm.source]] URL
        env:
          PYTHON_REPO_PASSWORD: ${{ secrets.python_repo_password }}
        run: |
          if [[ "${{ inputs.verbose }}" == "true" ]]
          then
            pdm install --no-default --no-self --verbose
          else
            pdm install --no-default --no-self
          fi
      - name: run tests on '${{ inputs.artifact }}' (using nox)
        if: ${{ inputs.run-tests }}
        ### THE KEY IS NEEDED HERE, TOO, BECAUSE THE noxfile RUNS pdm sync
        env:
          PYTHON_REPO_PASSWORD: ${{ secrets.python_repo_password }}
        run: |
          if [[ "${{ inputs.verbose }}" == "true" ]]
          then
            pdm run nox -f ${{ inputs.noxfile }} --error-on-missing-interpreters --non-interactive --add-timestamp --verbose
          else
            pdm run nox -f ${{ inputs.noxfile }} --error-on-missing-interpreters --non-interactive --add-timestamp
          fi
      - name: build '${{ inputs.artifact }}' package
        env:
          PYTHON_REPO_PASSWORD: ${{ secrets.python_repo_password }}
        run: |
          if [[ "${{ inputs.verbose }}" == "true" ]]
          then
            pdm build --verbose
          else
            pdm build
          fi
      - name: persist package (for later upload to artifact registry)
        # This is persisting to GH, not to Google Cloud
        uses: actions/upload-artifact@v4
        with:
          name: ${{ inputs.artifact }}
          path: ${{ inputs.working-directory }}/dist/
          if-no-files-found: ${{ inputs.action-on-upload-fail }}
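
One aside that I have not actually tried with pdm, so take it as a hedged sketch: the Authenticate to Google Cloud step above already exposes a short-lived token as steps.auth.outputs.access_token, and my reading of the Artifact Registry docs is that it accepts such a token over basic auth with the fixed username oauth2accesstoken. In principle that token could feed the same PYTHON_REPO_PASSWORD variable (with the username in the [[tool.pdm.source]] URL changed accordingly), which would sidestep the long-lived key. A local equivalent:

# Hypothetical: use a short-lived access token instead of a stored key.
# The source URL would then need oauth2accesstoken rather than
# _json_key_base64 as the username.
export PYTHON_REPO_PASSWORD="$(gcloud auth print-access-token)"
pdm install --no-default --no-self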

In the case where I am using a "normal" python -m build, none of the credentials that PDM suggests here are needed. The Authenticate to Google Cloud step above is sufficient, so the credentials can be removed and the pyproject.toml just looks like:

[[tool.pdm.source]]
name = "my_private_repo"
url = "https://<my-location>-python.pkg.dev/<project>/<repository>/simple/"

I hope that helps! Let me know what I can do to help.

lazarillo commented 1 month ago

I was also trying to find a case where I am not using PDM, but still building a project.

It seems I have removed all of them. (I really like PDM.)

So, the best example I have that works without using the service account key is uploading a built package to the Artifact Registry. Here are the details for that:

jobs:
  publish-py-pkg:
    runs-on: ubuntu-latest
    permissions:
      contents: 'read'
      id-token: 'write'
    steps:
      - uses: actions/checkout@v4
      - name: Set up python ${{ inputs.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ inputs.python-version }}
      - name: Install publish dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install twine
          python -m pip install keyrings.google-artifactregistry-auth
      - id: 'auth'
        name: 'Authenticate to Google Cloud'
        ### THIS IS THE STEP IMPERSONATING THE SERVICE ACCOUNT WITH THE PROPER RIGHTS!!
        uses: 'google-github-actions/auth@v2'
        with:
          token_format: 'access_token'
          workload_identity_provider: ${{ secrets.workload_identity_provider }}
          service_account: ${{ secrets.service_account }}
      - name: Download Python package artifact
        uses: actions/download-artifact@v4
        with:
          name: ${{ inputs.artifact }}
          path: ./dist
      - name: Upload to Artifact Registry
        ### THIS IS THE PART THAT NEEDS AUTHORIZATION, BUT WHICH IS NOT USING ANY SERVICE ACCOUNT KEYS
        ### THE COMMAND THAT CREATES THE `.pypirc` FILE CREATES A FILE WITHOUT ANY EMBEDDED CREDENTIALS
        ### AN EXAMPLE OF THE OUTPUT OF THE COMMAND IS BELOW
        run: |
          gcloud config set account ${{ secrets.service_account }} --verbosity=${{ inputs.verbosity }}
          gcloud artifacts print-settings python --project=${{ inputs.project }} --repository=${{ inputs.repository }} --location=${{ inputs.location }} --verbosity=${{ inputs.verbosity }} > ~/.pypirc
          if [[ "${{ inputs.verbosity }}" == "debug" ]]
          then
            python -m twine upload --verbose --repository ${{ inputs.repository }} dist/*
          else
            python -m twine upload --repository ${{ inputs.repository }} dist/*
          fi

The gcloud command that generates the .pypirc file, mentioned above, produces output similar to the following (with our details removed):

> gcloud artifacts print-settings python \
                                       --project=<our-project> \
                                       --repository=<our-repo> \
                                       --location=<our-location>

# Insert the following snippet into your .pypirc

[distutils]
index-servers =
    <our-repo>

[<our-repo>]
repository: https://<our-location>-python.pkg.dev/<our-project>/<our-repo>/

# Insert the following snippet into your pip.conf

[global]
extra-index-url = https://<our-location>-python.pkg.dev/<our-project>/<our-repo>/simple/

frostming commented 1 month ago

> I am not sure what is needed, nor even whether this is possible within pdm. Maybe it can only be done from within GCP's code.

It seems so; neither twine nor pip has integration code for GCP.

lazarillo commented 1 month ago

OK. Yes, I feared that was the case. Thank you for the response!

It is a little frustrating that Google integrates only with tools that are highly outdated and makes its own tools so non-standard.

I love PDM and I truly don't understand how anyone who ever works with Python packaging could stick with something like python -m build or any tool without the equivalent to a pdm.lock file. Oh well...