nikosbosse / Metaculus-Twitter-bot

Create a github action to redeploy the bot after a commit to master #11

Open nikosbosse opened 2 years ago

nikosbosse commented 2 years ago

The deploy.sh script needs to be triggered and the credentials stored safely

seabbs commented 2 years ago

Could also just switch the bot to run using GitHub actions only and not use Google cloud functions.

nikosbosse commented 2 years ago

Ha very good point. It seems like gh action quotas are more generous than google cloud functions. Now I just need to learn how to create gh action scripts...

seabbs commented 2 years ago

I usually just roll my face on the keyboard until they do what I want.

nikosbosse commented 2 years ago

I usually just roll my face on the keyboard until they do what I want.

--> me, programming...

dbaynard commented 2 years ago

Hello, I’m happy to help out with a migration to github actions. I’ve moved AWS deployments (including a python lambda) to github actions before, but not GCP. Github actions can be a little flaky, so you might want to stick with GCP for the actual bot.

If you just want to handle code changes (not dependencies), then something like this should get you started. You’ll need to add the secrets GCP_workload_identity_provider and GCP_service_account (or use one of the other authentication methods). See google-github-actions/setup-gcloud, a GitHub Action for configuring the Google Cloud SDK, which includes both the gcloud and gsutil binaries.

Edit: this file would be something like .github/workflows/deploy.yml

name: Deploy GCP Function
on:
  push:
    branches:
    - master

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  # Cancel ongoing build when redeploying
  cancel-in-progress: true

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    # Enable write permission on id-token — I'd have to dig in to the
    # auth action code to work out why this is needed.
    permissions:
      contents: 'read'
      id-token: 'write'

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          persist-credentials: false

      # https://github.com/google-github-actions/auth
      - id: auth
        name: Authenticate to Google Cloud
        uses: google-github-actions/auth@d16fd896f76605863c491de993d3e5d5cf4f68f8
        with:
          workload_identity_provider: ${{ secrets.GCP_workload_identity_provider }}
          service_account: ${{ secrets.GCP_service_account }}

      # https://github.com/google-github-actions/setup-gcloud
      - name: Set up Cloud SDK
        uses: google-github-actions/setup-gcloud@1b47649e585c2bc0808838cec476c8ab6d84f963

      - name: Update google cloud function
        run: |
          bash "${GITHUB_WORKSPACE}/deploy.sh"

nikosbosse commented 2 years ago

@dbaynard that is amazing, thank you very much! And much more elegant than rolling one's face on the keyboard :) I'll test it as soon as I can. What are your experiences with gh actions and reliability (and maybe what could be done to mitigate that)?

dbaynard commented 2 years ago

You're welcome; I saw something on Twitter and subscribed to this issue so I could see how you did it, or offer help if it became a priority!

I'd say that overall, GA is a great product. I like the simplicity of running shell scripts, using caching, interacting with github itself (actions/github-script: Write workflows scripting the GitHub API in JavaScript) and (of course) the free limits. I happen to use the nix language/build system a lot, and GA plays very nicely with that.

The flakiness I’ve experienced is delays of up to 2 minutes or so in launching CI tasks; mostly they launch within about 10 seconds. I’d expect to run into this sort of flakiness a few days per year; I can't give any more precision than that… but the best mitigation is to use it only for CI/CD and not for running the actual service. That said… if you're happy with these caveats, and it isn't against the TOS, go ahead and use GA for the bot.

There are important security implications for github actions — if you don't audit the actions you use, or trust the organizations who create them, they can steal credentials — but this is true for all CI/CD. As a result, unless I trust the organization who creates the action, I'll always refer to it with a hash (as I've done, above).

More broadly, I’ve found GA to be very flexible, and have good (though not excellent) caching behaviour. Where I've implemented GA for an organization, it's been very useful to have workflows that can run in parallel.


I’m not sure how it works for GCP functions, but for AWS Lambda the dependencies need to be bundled and shipped as a zip file, which is fairly straightforward. I've used Poetry to manage Python dependencies, but that isn't necessary in an action itself.

    - name: Install dependencies to zip file
      run: |
        # ${function?} makes the shell abort with an error if $function is unset.
        python -m pip install --target ./python \
          -r "${function?}/requirements.txt"
        zip -r "${function?}_deps.zip" ./python

Incidentally I fixed a typo, above, in the auth action.

nikosbosse commented 2 years ago

@dbaynard Thank you very much, this is really cool. Unfortunately, I tried and failed miserably to get the needed secrets and set up OpenID Connect... 🙈 But I think the 2-minute delay wouldn't be an issue at all, so switching completely to gh actions might actually be a good idea in the future.

dbaynard commented 2 years ago

That makes sense, although there should be a simpler way than OIDC.
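
For reference, one simpler route (an assumption here, not something tested in this thread) is a service account key: google-github-actions/auth accepts a credentials_json input, so the auth step in the workflow above could be swapped for something like the sketch below. The secret name GCP_credentials_json is hypothetical, and @v2 is a version tag rather than a pinned commit hash, so substitute a hash you have audited if you follow the pinning advice above.

```yaml
      # Sketch only: replaces the OIDC-based auth step.
      # GCP_credentials_json is a hypothetical secret holding the full JSON
      # text of a service account key.
      - id: auth
        name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_credentials_json }}
```

With key-based auth the id-token: write permission is no longer needed. The trade-off is that a long-lived key sits in the repository secrets, whereas the OIDC route only ever issues short-lived credentials.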

One thing to note: I'd forgotten this but I've occasionally seen more than one run of an action, so you might encounter duplicate runs. I haven't run enough actions since GA added the concurrency limit option, though — that should stop it.

https://github.com/nikosbosse/Metaculus-Twitter-bot/blob/d7ccb480c439292b6518dc10330cbb6f58d50195/schedule.sh#L5

You can use cron expressions like this one under on.schedule in the GA workflow configuration (I haven't tried this).
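
A minimal sketch of what that trigger could look like. The cron expression here is a placeholder (hourly), not the one from schedule.sh, and the workflow_dispatch trigger is an optional extra that allows manual runs:

```yaml
on:
  schedule:
    # Placeholder: minute 0 of every hour, evaluated in UTC.
    # Replace with the expression from schedule.sh.
    - cron: '0 * * * *'
  # Optional: allow starting a run by hand from the Actions tab.
  workflow_dispatch:
```

Scheduled runs use the workflow file on the default branch, and GitHub may delay them when load is high, so the timing is approximate.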

Beyond that I guess you'd want to configure GA secrets rather than using a .env, and then use the actions/cache action to cache/restore the python venv (I'd also install poetry but I dislike pip so 🤷 ). You can access main.py using the GITHUB_WORKSPACE path prefix.
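
Put together, a job along these lines might work as a starting point. It is an untested sketch: the secret names, the Python version, and the presence of a requirements.txt at the repository root are all assumptions, and the action tags would ideally be replaced by audited commit hashes.

```yaml
jobs:
  run-bot:
    runs-on: ubuntu-latest
    # Avoid two overlapping runs of the bot; a queued run waits instead.
    concurrency:
      group: run-bot
      cancel-in-progress: false
    permissions:
      contents: read

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          persist-credentials: false

      - id: python
        name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'  # assumed version

      # Restore the venv if requirements.txt and the Python version are unchanged.
      - id: venv-cache
        name: Cache virtualenv
        uses: actions/cache@v4
        with:
          path: .venv
          key: venv-${{ runner.os }}-${{ steps.python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}

      - name: Install dependencies
        if: steps.venv-cache.outputs.cache-hit != 'true'
        run: |
          python -m venv .venv
          .venv/bin/python -m pip install -r requirements.txt

      - name: Run bot
        env:
          # Hypothetical secret names; these replace the values kept in .env.
          TWITTER_API_KEY: ${{ secrets.TWITTER_API_KEY }}
          TWITTER_API_SECRET: ${{ secrets.TWITTER_API_SECRET }}
        run: |
          .venv/bin/python "${GITHUB_WORKSPACE}/main.py"
```

Secrets passed through env are masked in the logs, but any step in the job that receives them can read them, which is one more reason to audit or pin the actions used.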

There are security implications to giving Twitter credentials to GA, too, though they probably won't cause you any trouble.

Good luck!