ros-industrial / industrial_ci

Easy continuous integration repository for ROS repositories
Apache License 2.0

Trouble cloning multiple private repositories from .repos file #807

Open TrevorGibson-SR opened 1 year ago

TrevorGibson-SR commented 1 year ago

Hi all,

I've been trying to get a GitHub Actions CI build to work with cloning more than one private repository, but have run into multiple issues and can't seem to resolve them.

Overview

The current Action I have set up works fine specifically when only 1 private GitHub repository is listed in our .repos file. It breaks when either a second private GitHub repo is listed (issue 1), or any private BitBucket repositories are listed (issue 2).
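
For context, a .repos file (vcstool format) with the mix of private repositories described above might look roughly like this. The entry names, owners, and branch names are placeholders, not the actual file from this project:

```yaml
# Hypothetical .repos file in vcstool format; every name and URL is a placeholder.
repositories:
  repo1:
    type: git
    url: git@github.com:<owner>/<repo1>.git       # private GitHub repo: works on its own
    version: main
  repo2:
    type: git
    url: git@github.com:<owner>/<repo2>.git       # second private GitHub repo (issue 1)
    version: main
  repo3:
    type: git
    url: git@bitbucket.org:<workspace>/<repo3>.git  # private BitBucket repo (issue 2)
    version: main
```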

Some additional information:

Here is a snippet of the .yml action showing the steps being run (placeholders in place of actual repo URLs):

reusable_industrial_ci_with_cache:
    name: ${{ inputs.ros_distro }} ${{ inputs.ros_repo }} ${{ inputs.os_code_name }}
    runs-on: ubuntu-latest
    env:
      CCACHE_DIR: ${{ github.workspace }}/${{ inputs.ccache_dir }}
      BASEDIR: ${{ github.workspace }}/${{ inputs.basedir }}
      CACHE_PREFIX: ${{ inputs.ros_distro }}-${{ inputs.os_code_name }}-${{ inputs.ros_repo }}-${{ github.job }}
    strategy:
      fail-fast: false
    steps:
      - name: Checkout ${{ inputs.ref }} when build is not scheduled
        if: ${{ github.event_name != 'schedule' }}
        uses: actions/checkout@v3
      - name: Checkout ${{ inputs.ref }} on scheduled build
        if: ${{ github.event_name == 'schedule' }}
        uses: actions/checkout@v3
        with:
          ref: ${{ inputs.ref_for_scheduled_build }}
      - name: Setup SSH keys for private repositories 1
        uses: webfactory/ssh-agent@v0.7.0
        with:
          ssh-private-key: |
            ${{ secrets.KEY1 }}
            ${{ secrets.KEY2 }}
            ${{ secrets.KEY3 }}
      - name: Prepare git and ssh config for build context
        run: |
          mkdir root-config
          cp -r ~/.gitconfig ~/.ssh root-config/
      - name: cache target_ws
        if: ${{ ! matrix.env.CCOV }}
        uses: pat-s/always-upload-cache@v2.1.5
        with:
          path: ${{ env.BASEDIR }}/target_ws
          key: target_ws-${{ env.CACHE_PREFIX }}-${{ hashFiles('**/CMakeLists.txt', '**/package.xml') }}-${{ github.run_id }}
          restore-keys: |
            target_ws-${{ env.CACHE_PREFIX }}-${{ hashFiles('**/CMakeLists.txt', '**/package.xml') }}
      - name: cache ccache
        uses: pat-s/always-upload-cache@v2.1.5
        with:
          path: ${{ env.CCACHE_DIR }}
          key: ccache-${{ env.CACHE_PREFIX }}-${{ github.sha }}-${{ github.run_id }}
          restore-keys: |
            ccache-${{ env.CACHE_PREFIX }}-${{ github.sha }}
            ccache-${{ env.CACHE_PREFIX }}
      - uses: 'ros-industrial/industrial_ci@master'
        env:
          UPSTREAM_WORKSPACE: ${{ inputs.upstream_workspace }}
          ROS_DISTRO: ${{ inputs.ros_distro }}
          ROS_REPO: ${{ inputs.ros_repo }}
          OS_CODE_NAME: ${{ inputs.os_code_name }}
          BEFORE_INSTALL_UPSTREAM_DEPENDENCIES: ${{ inputs.before_install_upstream_dependencies }}
      - name: prepare target_ws for cache
        if: ${{ always() && ! matrix.env.CCOV }}
        run: |
          du -sh ${{ env.BASEDIR }}/target_ws
          sudo find ${{ env.BASEDIR }}/target_ws -wholename '*/test_results/*' -delete
          sudo rm -rf ${{ env.BASEDIR }}/target_ws/src
          du -sh ${{ env.BASEDIR }}/target_ws

KEY1, KEY2, and KEY3 are action secrets defined in the main repository, and contain the private keys for the corresponding private repositories we want to clone.

Issue 1: Action fails when cloning a second private GitHub repository

Prior to running the industrial CI step, I have an ssh-agent step to load the SSH keys and allow for cloning private repositories. The output from this step indicates that the keys are being scanned and added correctly. Additionally, I have added the repository links for the GitHub repos as comments for the keys to use the deploy key mapping feature of the ssh-agent action (actual keys/hashes redacted for security):

Adding GitHub.com keys to /home/runner/.ssh/known_hosts
Starting ssh-agent
SSH_AUTH_SOCK=/tmp/ssh-XXXXXX
SSH_AGENT_PID=1737
Adding private key(s) to agent
Identity added: (stdin) (<repo1>)
Identity added: (stdin) (<repo2>)
Identity added: (stdin) (<repo3>)
Key(s) added:
256 SHA256:<key1 hash> <repo1> (ED25519)
256 SHA256:<key2 hash> <repo2> (ED25519)
256 SHA256:<key3 hash> <repo3> (ED25519)
Configuring deployment key(s)
Added deploy-key mapping: Use identity '/home/runner/.ssh/<key1>' for GitHub repository <repo1>
Added deploy-key mapping: Use identity '/home/runner/.ssh/<key2>' for GitHub repository <repo2>
Comment for (public) key '<key3>' does not match GitHub URL pattern. Not treating it as a GitHub deploy key.

However, during the cloning process in the industrial CI step, only the repository for the first key listed is successfully cloned. If I switch the key order in the .yml action file, the successfully cloned repository changes as well. Failure output:

setup_upstream_workspace

  $ sudo apt-get -qq install -y --no-upgrade --no-install-recommends python3-vcstool | grep -E 'Setting up' 
  Setting up python3-vcstool (0.3.0-1) ...

  $ sudo apt-get -qq install -y --no-upgrade --no-install-recommends git-core | grep -E 'Setting up' 
  Setting up libcurl3-gnutls:amd64 (7.81.0-1ubuntu1.7) ...
  Setting up liberror-perl (0.17029-1) ...
  Setting up git-man (1:2.34.1-1ubuntu1.5) ...
  Setting up git (1:2.34.1-1ubuntu1.5) ...

  $ sudo apt-get -qq install -y --no-upgrade --no-install-recommends ssh-client | grep -E 'Setting up' 
  Setting up libcbor0.8:amd64 (0.8.0-2ubuntu1) ...
  Setting up libmd0:amd64 (1.0.4-1build1) ...
  Setting up libfido2-1:amd64 (1.10.0-1) ...
  Setting up libbsd0:amd64 (0.11.5-1) ...
  Setting up libedit2:amd64 (3.1-20210910-1build1) ...
  Setting up openssh-client (1:8.9p1-3) ...
  ....E.

... <other public repositories cloned successfully here> ...

  === <path to repo2> (git) ===
  Could not determine ref type of version: ERROR: Repository not found.
  fatal: Could not read from remote repository.

  Please make sure you have the correct access rights
  and the repository exists.

Issue 2: Action hangs when cloning any number of private BitBucket repositories

Similar to above, the ssh-agent step successfully loads the SSH keys prior to running the industrial CI step. When a private BitBucket repo is listed in the .repos file, the action hangs during the colcon_setup portion and must be manually cancelled. It always hangs at the same point, prior to even attempting to clone the repos in the .repos file. Output log:

colcon_setup

  $ sudo apt-get -qq install -y --no-upgrade --no-install-recommends python3-colcon-common-extensions | grep -E 'Setting up' 
  Setting up python3-pkg-resources (59.6.0-1.2) ...
  Setting up python3-more-itertools (8.10.0-2) ...
  Setting up python3-iniconfig (1.1.1-2) ...
  Setting up python3-attr (21.2.0-1) ...
  Setting up libpsl5:amd64 (0.21.0-1.2build2) ...
  Setting up python3-py (1.10.0-1) ...
  Setting up libyaml-0-2:amd64 (0.2.2-1build2) ...
  Setting up libglib2.0-0:amd64 (2.72.4-0ubuntu1) ...
  Setting up libbrotli1:amd64 (1.0.9-2build6) ...
  Setting up libnghttp2-14:amd64 (1.43.0-1build3) ...
  Setting up python3-yaml (5.4.1-1ubuntu1) ...
  Setting up python3-distlib (0.3.4-1) ...
  Setting up python3-zipp (1.0.0-3) ...
  Setting up python3-empy (3.3.4-2) ...
  Setting up tzdata (2022g-0ubuntu0.22.04.1) ...
  Setting up python3-six (1.16.0-3ubuntu1) ...
  Setting up python3-roman (3.3-1) ...
  Setting up libuv1:amd64 (1.43.0-1) ...
  Setting up emacsen-common (3.0.4) ...
  Setting up python3-pyparsing (2.4.7-1) ...
  Setting up librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2build4) ...
  Setting up dh-elpa-helper (2.0.9ubuntu1) ...
  Setting up libdbus-1-3:amd64 (1.12.20-2ubuntu4.1) ...
  Setting up libjsoncpp25:amd64 (1.9.5-3) ...
  Setting up python3-toml (0.10.2-1) ...
  Setting up libssh-4:amd64 (0.9.6-2build1) ...
  Setting up librhash0:amd64 (1.4.2-1ubuntu1) ...
  Setting up libcurl4:amd64 (7.81.0-1ubuntu1.7) ...
  Setting up python3-dateutil (2.8.1-6) ...
  Setting up sgml-base (1.30) ...
  Setting up cmake-data (3.22.1-1ubuntu1.22.04.1) ...
  Setting up python3-argcomplete (1.8.1-1.5) ...
  Setting up python3-lib2to3 (3.10.6-1~22.04) ...
  Setting up libicu70:amd64 (70.1-2) ...
  Setting up python3-distutils (3.10.6-1~22.04) ...
  Setting up python3-dbus (1.2.18-3build1) ...
  Setting up python3-importlib-metadata (4.6.4-1) ...
  Setting up python3-setuptools (59.6.0-1.2) ...
  Setting up python3-packaging (21.3-1) ...
  Setting up python3-pluggy (0.13.0-7.1) ...
  Setting up python3-notify2 (0.3-4) ...
  Setting up xml-core (0.18+nmu1) ...
  Setting up libxml2:amd64 (2.9.13+dfsg-1ubuntu0.2) ...
  Setting up libarchive13:amd64 (3.6.0-1ubuntu1) ...
  Setting up python3-pytest (6.2.5-1ubuntu2) ...
  Setting up python3-colcon-core (0.11.0-1) ...
  Setting up python3-colcon-notification (0.2.14-1) ...
  Setting up python3-colcon-pkg-config (0.1.0-1) ...
  Setting up python3-colcon-zsh (0.4.0-1) ...
  Setting up python3-colcon-library-path (0.2.1-1) ...
  Setting up cmake (3.22.1-1ubuntu1.22.04.1) ...
  Setting up python3-colcon-metadata (0.2.5-1) ...
  Setting up python3-colcon-python-setup-py (0.2.7-2) ...
  Setting up python3-colcon-package-information (0.3.3-1) ...
  Setting up python3-colcon-output (0.2.12-2) ...
  Setting up python3-colcon-package-selection (0.2.10-2) ...
  Setting up python3-colcon-defaults (0.2.6-1) ...

Summary

Any help with either of these issues would be greatly appreciated. Ideally I would like to use the BitBucket private repo as it works better with the larger project architecture. However, cloning multiple private GitHub repositories is also a must for another project, so resolving that one alone will be helpful as well.

Thanks in advance

TrevorGibson-SR commented 1 year ago

Update for Issue 2

I was able to solve the issue of cloning a single non-GitHub private repo in the Docker container by adding that repository's server to the known_hosts file. I believe this is what was causing the build process to hang as it was waiting for user input to add the server to the list of known hosts.

This was fixed by issuing the following command immediately after the ssh-agent action in my GitHub workflow:

ssh-keyscan -H bitbucket.org >> ~/.ssh/known_hosts

After investigating the contents of the known_hosts file, it appears that github.com had already been added (either by the workflow managing agent, or by the ssh-agent action), which is why there weren't any issues cloning GitHub repositories.

This solution should work for repositories hosted on any non-GitHub servers... just replace bitbucket.org with the repo server's domain.
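
As a workflow fragment, the fix above could be placed like this. It is a sketch that reuses the ssh-agent step from the original workflow; the name of the second step is made up for illustration:

```yaml
    steps:
      - name: Setup SSH keys for private repositories
        uses: webfactory/ssh-agent@v0.7.0
        with:
          ssh-private-key: |
            ${{ secrets.KEY1 }}
            ${{ secrets.KEY2 }}
            ${{ secrets.KEY3 }}
      # Register the non-GitHub host up front so SSH never stops at an
      # interactive "add this host?" prompt later in the build.
      - name: Add bitbucket.org to known_hosts
        run: ssh-keyscan -H bitbucket.org >> ~/.ssh/known_hosts
```

The step has to run before the industrial_ci step, since that is where the clone happens.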

TrevorGibson-SR commented 1 year ago

Update for Issue 1

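The root cause is the configuration that webfactory/ssh-agent creates for its deploy key mapping feature. The custom .gitconfig settings are not copied into the Docker container automatically, so no more than one private GitHub repo can be cloned. Without those settings, SSH presumably offers the first key in the agent for every GitHub repository, which would explain why only the repository belonging to the first key listed was cloned.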
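
One way to see what the container is missing is to print the generated configuration on the runner right after the ssh-agent step. This is only a debugging sketch: the step name is made up, and it assumes the action writes its mappings to the global git config and to ~/.ssh/config:

```yaml
      # Hypothetical debugging step; place it directly after the ssh-agent step.
      - name: Show deploy-key mappings written by ssh-agent
        run: |
          # URL rewrite rules and other global git settings (no key material).
          git config --global --list || true
          # Per-repository host aliases and identity file paths, if the file exists.
          cat ~/.ssh/config || true
```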

This can be resolved by mapping both the ~/.gitconfig file and the ~/.ssh directory into the Docker container. For an immediate fix, use the DOCKER_RUN_OPTS parameter in your GitHub Actions workflow YAML file to specify the locations to map.
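
A minimal sketch of that immediate fix follows. The mount targets are assumptions: it presumes the container runs as root (home directory /root) and that the runner's home is /home/runner, as the ssh-agent log above suggests. Adjust both sides to your setup:

```yaml
      - uses: 'ros-industrial/industrial_ci@master'
        env:
          UPSTREAM_WORKSPACE: ${{ inputs.upstream_workspace }}
          ROS_DISTRO: ${{ inputs.ros_distro }}
          ROS_REPO: ${{ inputs.ros_repo }}
          # Assumed paths: host files under /home/runner, container user root.
          # The folded scalar (>-) joins the lines into one string of docker run options.
          DOCKER_RUN_OPTS: >-
            -v /home/runner/.gitconfig:/root/.gitconfig
            -v /home/runner/.ssh:/root/.ssh
```

If the generated ssh config refers to identity files by their host path (the log above shows /home/runner/.ssh/<key1>), that path may need to exist inside the container as well.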

A more permanent fix has been submitted as a PR to incorporate mapping these locations as part of the standard codebase.

AndyZe commented 11 months ago

I'm having some trouble with this as well, but I haven't caught up with @TrevorGibson-SR yet.

Basically I've followed the instructions in index.rst:

In "Add a variable" section, fill in the following text field/area.

    Key: SSH_PRIVATE_KEY

Except I have 4 private repositories, so I figured it might work to add 4 such private keys (with the corresponding public keys in the corresponding 4 private repos).

SSH_PRIVATE_KEY_1, SSH_PRIVATE_KEY_2, ..., SSH_PRIVATE_KEY_4

But that doesn't work. industrial_ci hangs forever when running on GitHub. Is there a simple solution to this?

mathias-luedtke commented 11 months ago

@AndyZe: Just add something like this to your before_script in .gitlab-ci.yml

https://github.com/ros-industrial/industrial_ci/blob/3ed9846c96ed1e0bb36193e8e250632eaac980d0/gitlab.sh#L36-L44

mathias-luedtke commented 11 months ago

Just saw that you are talking about GitHub as well. The SSH auto-setup is only implemented for GitLab. However, a similar trick should work with GitHub. As far as I can tell, that's what webfactory/ssh-agent is doing under the hood.

As @TrevorGibson-SR already pointed out: it is crucial to set up the known hosts properly.

mathias-luedtke commented 11 months ago

Multiple deploy keys should work with webfactory/ssh-agent and #844