os-climate / physrisk

Physical climate risk calculation engine
Apache License 2.0

Establish pipelines & processes (Dev, QA, and Prod) #19

Closed: HeatherAck closed this issue 1 year ago

joemoorhouse commented 2 years ago

I will need a bit of guidance, but I have started adding the CI/CD pipeline. The first priority is to have alpha packages of physrisk somewhere they can be used by physrisk-api, the Flask service that @Floflow has started work on. @dbferri, I know you are interested in this topic too. We are now creating a CI package on each push. This is going to Test PyPI at the moment (which may actually be an appropriate long-term home for CI packages).

We don't intend to have any packages in PyPI until we are public. The Test PyPI project is at https://test.pypi.org/project/physrisk.

Currently the CI package version number is based on the version stored in the repo, with the GitHub run_id as a suffix:

    value=$(cat VERSION)
    echo "$value-dev${{ github.run_id }}" > VERSION

e.g. "physrisk 0.0.1.dev1695407271". There are many ways of forming a CI build version, of course, and I have no strong preference, so any steer on a standard approach would be appreciated! (@MichaelTiemannOSC, @HeatherAck) The intent is that the CI build is used in physrisk-api, with (integration) tests running against the package automatically.
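As a rough illustration of the version stamping described above, this Python sketch builds the same kind of dev version directly in PEP 440's canonical form (`X.Y.Z.devN`). The shell step writes `0.0.1-dev<run_id>`, which Python packaging tools normalize to `0.0.1.dev<run_id>`, the form seen on Test PyPI. The function names are hypothetical, not part of the physrisk repo.

```python
from pathlib import Path


def ci_dev_version(base_version: str, run_id: int) -> str:
    """Return a PEP 440 dev-release version such as '0.0.1.dev1695407271'."""
    base = base_version.strip()
    if not base:
        raise ValueError("VERSION file is empty")
    if ".dev" in base or "-dev" in base:
        # Guard against stamping the same file twice in one job.
        raise ValueError(f"{base!r} already has a dev suffix")
    return f"{base}.dev{int(run_id)}"


def stamp_version_file(path: Path, run_id: int) -> str:
    """Overwrite the VERSION file in place, as the shell step does."""
    new_version = ci_dev_version(path.read_text(), run_id)
    path.write_text(new_version + "\n")
    return new_version
```

In a workflow step this could be called as `stamp_version_file(Path("VERSION"), int(os.environ["GITHUB_RUN_ID"]))`; GitHub Actions sets `GITHUB_RUN_ID` to the same value as `github.run_id`.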

joemoorhouse commented 2 years ago

And I guess in the GitHub Actions world we might do something like this to trigger integration tests: https://github.community/t/triggering-by-other-repository/16163. Not sure if there are any better ways?
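One common way to trigger a workflow in another repository is GitHub's `repository_dispatch` event: the physrisk workflow POSTs to the physrisk-api repo's `/dispatches` endpoint, and physrisk-api runs its integration tests in response. A minimal standard-library sketch, assuming a token with access to physrisk-api (the default `GITHUB_TOKEN` is scoped to the repository running the workflow) and a hypothetical event name `physrisk-published`:

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, token: str,
                           event_type: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch POST for the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps({"event_type": event_type, "client_payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> int:
    """Send the request; GitHub answers 204 No Content on success."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

On the receiving side, the physrisk-api workflow would subscribe with `on: repository_dispatch: types: [physrisk-published]` and could read the package version from `github.event.client_payload`.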

MichaelTiemannOSC commented 2 years ago

Copying @erikerlandson, @MichaelClifford, and @durandom for guidance.

DavideFerri commented 2 years ago

Hey @joemoorhouse, not sure whether you have made any progress on this, but here are my two cents. This article contains a good example of how to build a CI/CD pipeline on GitHub Actions: https://medium.com/@michaelekpang/creating-a-ci-cd-pipeline-using-github-actions-b65bb248edfe. From what I can see, this is not very different from Azure Pipelines, which I have some experience with.

First, it is important that we are comfortable with the whole battery of unit, UAT, integration and regression tests. Which stage of that process do you think we are at?

HeatherAck commented 2 years ago

Adding @chauhankaranraj and @oindrillac for input as well. Also, there will be a session on Operate First best practices on Wed 26-Jan from 9-10AM. Ava Houck (ahouck@os-climate.org) can add you to the invite if you don't have it.

joemoorhouse commented 2 years ago

Hi @negillett, using Actions to build a Docker image and publish it seems pretty straightforward: https://redhat-cop.github.io/ci/publishing-images.html, although we should probably update to v2: https://github.com/docker/build-push-action. Having said that, I am trying this with a test Flask repo and my own Quay account and am getting an authentication issue on push! Looking into why...

negillett commented 2 years ago

@joemoorhouse, I just realized: physrisk is the library that will be used by the app, right? If so, we don't really want an image of physrisk, just need to release it on PyPI. The physrisk-api should run on OpenShift as a containerized app using the physrisk library it imports from PyPI. Or am I missing something?

joemoorhouse commented 2 years ago

@negillett, yes, that's all correct. For physrisk we just need a package in PyPI. We need images only for physrisk-api (Flask service + nginx) and physrisk-ui (React). And indeed physrisk-api imports from PyPI.

negillett commented 2 years ago

Ah, okay. I was just confused because this issue is in physrisk. Should I assume all issues here are for the entire project going forward?

MichaelTiemannOSC commented 2 years ago

Only when you identify a project-wide issue ;-)

You should solve this problem for PhysRisk and we should create a tracking issue for each of the workstreams to do what they need to do to update their templates. I have many repos to update because I've forked the template a dozen times for the dozen pipelines I've written so far. So thanks for that ;-)

joemoorhouse commented 2 years ago

Yes indeed, going forwards we will have issues per repo. Partly because the other repos are quite new!

negillett commented 2 years ago

So for physrisk builds/releases, I've seen automation wherein you'd just push a commit like "Release v1.2.0" which triggers a GitHub release to PyPI.

We'll probably need something like this once live rather than releasing on every merge as is done currently.
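A sketch of how a workflow step might recognise the "Release vX.Y.Z" commit convention described above, assuming the commit's subject line must match exactly; the regex and function name are hypothetical. In a push-triggered workflow the message is available as `github.event.head_commit.message`.

```python
import re
from typing import Optional

# Hypothetical convention: a commit whose subject line is exactly "Release vX.Y.Z".
_RELEASE_SUBJECT = re.compile(r"^Release v(\d+)\.(\d+)\.(\d+)$")


def release_version(commit_message: str) -> Optional[str]:
    """Return 'X.Y.Z' if the commit subject requests a release, else None."""
    lines = commit_message.strip().splitlines()
    if not lines:
        return None
    match = _RELEASE_SUBJECT.match(lines[0].strip())
    return ".".join(match.groups()) if match else None
```

Only the subject line is inspected, so a release commit can still carry notes in its body without affecting the check.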

I can look into and implement this if we want.

For physrisk-api, building and pushing an image to quay from a GitHub action will work and seems fine. Alternative is to hook quay into the repo and let it do the build for us upon merge.

Once an image is in Quay, we can deploy it (method undetermined?) to an OpenShift dev and/or QA environment/namespace, and some functional testing can happen. The process for that testing is another question (automated somehow, manual, etc.).

We'll need to work out a system for promoting images to the next environment/namespace too.

For another project we simply manually bump the image tag from "qa" to "stage" once functional testing is done for the new feature/fix, then "stage" to "prod" when we want to release. Moving the tag makes the new image available to the Ansible Tower deployment job (env specific) the next time it's run (also manually, usually immediately after tagging). The playbook run by the job takes care of the OpenShift deployment.
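The tag-promotion scheme above can be modelled as a small state machine. This sketch treats the registry's tags as an in-memory mapping from environment tag to image digest (an assumption for illustration; the actual tag move would be done in Quay), and refuses to promote any image other than the one currently tagged for the earlier environment. All names are hypothetical.

```python
from typing import Dict

# Environment tags in promotion order (qa -> stage -> prod), as in the scheme above.
PROMOTION_ORDER = ("qa", "stage", "prod")


def next_tag(current: str) -> str:
    """Return the tag that follows `current`; refuse to go past prod."""
    if current not in PROMOTION_ORDER:
        raise ValueError(f"unknown environment tag: {current!r}")
    position = PROMOTION_ORDER.index(current)
    if position == len(PROMOTION_ORDER) - 1:
        raise ValueError("image is already tagged for prod")
    return PROMOTION_ORDER[position + 1]


def promote(tags: Dict[str, str], digest: str, from_env: str) -> Dict[str, str]:
    """Move the next environment's tag onto the image digest currently tagged `from_env`."""
    if tags.get(from_env) != digest:
        raise ValueError(f"{digest} is not the image currently tagged {from_env!r}")
    updated = dict(tags)  # copy so the caller's mapping is left unchanged
    updated[next_tag(from_env)] = digest
    return updated
```

Checking the digest is what makes the promotion safe: the image that reaches "stage" is exactly the one that passed functional testing in "qa", even if a newer build has since been pushed.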

HeatherAck commented 2 years ago

@negillett - you're awesome, just sayin' - automation that pushes a release-related commit will definitely be needed to scale. I created a similar issue to this one for some of the other workstreams (e.g. Data Extraction/NLP) in their projects, but not necessarily tied to the appropriate repo - sorry for the confusion. I'll create a tracking issue for the workstreams to update their templates.

joemoorhouse commented 2 years ago

Seconded: many thanks @negillett!! On your specific points:

Yes, I was also thinking we would want to push a "Release v1.2.0" commit, or a tagged commit, to do a release. For all pushes/merges we package only a dev CI/CD build, for use in physrisk-api. So yes, please go ahead and implement; that would be great!
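As a hedged sketch of the split described here (releases from tagged commits, dev builds for every other push), a packaging step could route on the `GITHUB_REF` value that GitHub Actions provides, which has the form `refs/tags/<tag>` for tag pushes. The `v1.2.0` tag format and the target labels are assumptions.

```python
import re
from typing import NamedTuple, Optional

# Assumed convention: release tags look like "v1.2.0".
_RELEASE_TAG_REF = re.compile(r"^refs/tags/v(\d+\.\d+\.\d+)$")


class PublishPlan(NamedTuple):
    target: str             # hypothetical labels: "pypi" or "testpypi"
    release: Optional[str]  # "X.Y.Z" for a release, None for a dev build


def plan_publish(github_ref: str) -> PublishPlan:
    """Route release tags to PyPI and every other push to a Test PyPI dev build."""
    match = _RELEASE_TAG_REF.match(github_ref)
    if match:
        return PublishPlan("pypi", match.group(1))
    return PublishPlan("testpypi", None)
```

A tag that does not match the release pattern falls back to a dev build rather than failing, which keeps stray tags from accidentally publishing a release.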

I somewhat prefer building using Actions and pushing to Quay, as I think it is nice to have all the building in Actions. This may be a bad reason, so I'm open to counter-arguments :)

I think next priority is to get pushing to Quay for physrisk-api and physrisk-ui (then we can even do a one-off manual deploy to sandbox within OpenShift which is a nice milestone).

As you say, we want to automate deployment to OpenShift. We may want to deploy to sandbox (or a separate dev env) continuously, with a view to automating functional testing as you suggest. One-click deployment would also be great as a first step. I have not yet looked at any workflow for how to do that.

Image promotion and the scheme you suggest for that seems very sensible to me.

I think we want to get images to Quay for physrisk-api next. @Floflow is putting in a Dockerfile I think and I will add a workflow later today. We'll then need some credentials added as secrets I guess...

negillett commented 2 years ago

Thanks, @HeatherAck, @joemoorhouse. 😅

Re Quay: I think the process outlined at https://redhat-cop.github.io/ci/publishing-images.html is good. I don't see a clear advantage to either method.

Agreed that we should first get an image onto quay (via Workflow) and do an initial deployment to OCP.

I like the idea of continuous deployment as well, and "dev" seems an appropriate env for that. We'll have to figure out how to automate more testing at that point when we get there, I think.

Do we have a certain automation platform in mind to bridge Quay to OCP? Ansible seems to make the most sense given the stack and open-source orientation.

erikerlandson commented 2 years ago

Do we have a certain automation platform in mind to bridge Quay to OCP?

All OS Climate deployments are managed via Operate First (argo-cd): https://github.com/operate-first/apps

That said, it should be reasonable to create some sort of deployment config via op1st, and have that DC configured to use appropriately promoted images.

joemoorhouse commented 2 years ago

Hi @negillett and @Floflow, I added https://github.com/os-climate/physrisk-api/blob/main/.github/workflows/test-build-push.yml based on a simple Dockerfile (@Floflow, it's just a placeholder; feel free to amend!). It is set to create a new image on push, with continuous deployment in mind. That will be contingent on all tests passing, but I have not added that part yet.

We need some credentials for Quay (it is failing at that step), presumably added as GitHub secrets? @erikerlandson/@negillett, would you have these, and if so, could you please add the secrets (if that is indeed the right thing to do)?

negillett commented 2 years ago

@joemoorhouse, the "Push to registry" step is missing the repository attribute, which should be ${{ env.IMAGE_REGISTRY }}/${{ env.APP_NAME }}. I also created the physrisk-api repo on Quay to push to. The registry username should be "os-climate", I believe, but someone else will need to add the registry password to the GitHub repo's secrets.

Floflow commented 2 years ago

Hi @joemoorhouse and @negillett, as Docker images are quite new to me, I tried something but I will need some help and guidance here! I've created two Dockerfiles in the physrisk-api repo, one for the app and one for Nginx (with new Nginx configuration files). I've created a docker-compose.yml to define the services, but having read a lot about it, I'm really not sure it is the right way to do it. Also, I (successfully) received an error message after committing, as I don't push anything to Quay. Maybe a devfile.yaml is what we expect here? (As in your example, @joemoorhouse.) Any help here would be more than welcome! Thanks :)

joemoorhouse commented 2 years ago

Thanks @negillett and @Floflow.

Hi @MichaelTiemannOSC / @erikerlandson: would you be able to add the Quay registry password, please (assuming you are happy with the approach)? And is there anything you need us to do first (e.g. tagging), perhaps to control the proliferation of continuous build images?

erikerlandson commented 2 years ago

@Floflow @joemoorhouse @negillett I have created two new GitHub Actions secrets: OSC_QUAY_ROBOT_USER and OSC_QUAY_ROBOT_TOKEN

These should be usable from github actions to authenticate to quay via podman, docker, etc
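For illustration, a workflow step could log in to Quay by passing the robot token on stdin rather than on the command line. This sketch assumes the workflow maps the two secrets into environment variables of the same names (secrets are only exposed to a step if the workflow passes them in explicitly); `--username` and `--password-stdin` are accepted by both `podman login` and `docker login`. The helper names are hypothetical.

```python
import os
import subprocess
from typing import List


def quay_login_command(username: str, registry: str = "quay.io", tool: str = "podman") -> List[str]:
    """Build the login argv; the token goes on stdin, never into the argument list."""
    return [tool, "login", registry, "--username", username, "--password-stdin"]


def quay_login(registry: str = "quay.io", tool: str = "podman") -> None:
    """Log in with credentials the workflow exposes as (assumed) environment variables."""
    username = os.environ["OSC_QUAY_ROBOT_USER"]
    token = os.environ["OSC_QUAY_ROBOT_TOKEN"]
    subprocess.run(
        quay_login_command(username, registry, tool),
        input=token,
        text=True,
        check=True,
    )
```

Keeping the token out of the argument list means it does not show up in process listings or in a logged command line.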

https://docs.github.com/en/actions/security-guides/encrypted-secrets#using-encrypted-secrets-in-a-workflow

joemoorhouse commented 2 years ago

Thanks @erikerlandson,

I (and Actions in physrisk-ui) can't see those. Are those organization secrets? If so, we might have a problem :):

'Organization secrets can only be used by public repositories on your plan. If you would like to use organization secrets in a private repository, you will need to upgrade your plan.'

joemoorhouse commented 2 years ago

Sorry, I mean Actions in physrisk-api. We might have to duplicate them as repo secrets until we can make the repo public?

erikerlandson commented 2 years ago

@joemoorhouse I created some tokens on your private repo: OSC_PHYSRISK_QUAY_USER and OSC_PHYSRISK_QUAY_TOKEN. These should have push privileges to the physrisk-api image on Quay.

joemoorhouse commented 2 years ago

Thanks @erikerlandson. That all works now!

Please can you add those to physrisk-ui also? (Those are the only two, I promise.)

Next to see if we can deploy our images...

joemoorhouse commented 2 years ago

OK, so next one (sorry!) please can we have the Quay image pull secret added to OpenShift (sandbox)?

HeatherAck commented 2 years ago

I looked at the pricing - the cost is nominal, so will build it into the budget if LF doesn't support.

Heather Ackenhusen, Principal Technical Program Manager, OS-Climate (Linux Foundation Contractor)


Erik Erlandson wrote (Thursday, February 3, 2022, 5:30 AM):

I should be able to configure a Quay password into os-climate GitHub secrets. I'd love to create an actual os-climate Quay account, but Quay accounts are now all tied to Red Hat user logins. @HeatherAck, maybe it is possible to get a paid Quay account via the Linux Foundation, but I do not know.


erikerlandson commented 2 years ago

@joemoorhouse @negillett I sent you a console link to an image pull secret over email

negillett commented 2 years ago

@HeatherAck, was that for Quay?

erikerlandson commented 2 years ago

I regret bringing @HeatherAck into this thread - it was based on an incorrect understanding of what I needed for access to quay.

joemoorhouse commented 2 years ago

Thanks @erikerlandson for the image pull secret. I am getting some errors trying to deploy on sandbox, but maybe that's one for office hours tomorrow.

Please can you add OSC_PHYSRISK_QUAY_USER and OSC_PHYSRISK_QUAY_TOKEN to physrisk-ui as well? (i.e. Quay tokens are needed for both physrisk-api and physrisk-ui)

I have just added a Dockerfile to deploy the React app.

erikerlandson commented 2 years ago

@joemoorhouse I created: OSC_PHYSRISK_UI_QUAY_USER and OSC_PHYSRISK_UI_QUAY_TOKEN

I also generated a new Quay image: physrisk-ui. The GitHub action and OpenShift pull secret should work for both the physrisk-api and physrisk-ui images now.

joemoorhouse commented 2 years ago

Thanks @erikerlandson. Image building and push looks all good now.

OpenShift image deploy not working yet (ImagePullBackOff)... for tomorrow!

erikerlandson commented 2 years ago

I think I created the pull secret wrong. I re-created it; we can see if it works properly tomorrow :+1:
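One common cause of `ImagePullBackOff` is a pull secret whose contents don't match what the cluster expects. For reference, this sketch shows the structure of a `kubernetes.io/dockerconfigjson` Secret for Quay, built in Python. The secret name, namespace and robot account are hypothetical, and the exact secret created in this thread is not shown here.

```python
import base64
import json


def dockerconfigjson(registry: str, username: str, token: str) -> str:
    """Serialize registry credentials in the Docker config.json 'auths' layout."""
    auth = base64.b64encode(f"{username}:{token}".encode("utf-8")).decode("ascii")
    return json.dumps({"auths": {registry: {"username": username, "password": token, "auth": auth}}})


def pull_secret_manifest(name: str, namespace: str, registry: str, username: str, token: str) -> dict:
    """Build a kubernetes.io/dockerconfigjson Secret; values under 'data' are base64-encoded."""
    payload = dockerconfigjson(registry, username, token)
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": base64.b64encode(payload.encode("utf-8")).decode("ascii")},
    }
```

The pod (or its service account) must also reference the secret, e.g. via `imagePullSecrets` in the pod spec or `oc secrets link default <secret-name> --for=pull`.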

negillett commented 2 years ago

@joemoorhouse, I think I found an answer for you: https://stackoverflow.com/questions/54360223/openshift-nginx-permission-problem-nginx-emerg-mkdir-var-cache-nginx-cli Though now I can't find the pod with that nginx error.

joemoorhouse commented 2 years ago

Thanks @negillett! Yes, having some success replicating and fixing locally using that approach.

The @erikerlandson tip to add USER 9999:0 is a good one. https://github.com/os-climate/os_c_data_commons/blob/main/docs/building-images-for-openshift.md

Image tile download seems to have broken too, however... not sure why.

joemoorhouse commented 2 years ago

Hi @negillett and @erikerlandson, I managed to get the app running (yay!): http://physrisk-ui-latest-sandbox.apps.odh-cl1.apps.os-climate.org/

Another issue came up, "bind() to 0.0.0.0:80 failed", after which I felt we should be using an nginx image adapted for non-root containers rather than creating our own(?). Right now it is set up to use an nginx image from Bitnami, who provide such things: https://engineering.bitnami.com/articles/running-non-root-containers-on-openshift.html. There may be a Red Hat one that would be better for us, though; I guess something like registry.access.redhat.com/ubi8/nginx-120 probably has the same goal in mind? I struggled a bit to find a good example to apply to React / blog.
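The "bind() to 0.0.0.0:80 failed" error is consistent with a non-root process trying to bind a privileged port. On Linux, ports below `net.ipv4.ip_unprivileged_port_start` (traditionally 1024) need root or `CAP_NET_BIND_SERVICE`, and containers on OpenShift run as an arbitrary non-root UID by default. Non-root nginx setups therefore typically listen on an unprivileged port such as 8080. A small sketch of that decision, with hypothetical helper names:

```python
from pathlib import Path

# Linux sysctl holding the lowest port an unprivileged process may bind.
_SYSCTL = Path("/proc/sys/net/ipv4/ip_unprivileged_port_start")


def unprivileged_port_start(default: int = 1024) -> int:
    """Read the kernel threshold, falling back to the traditional 1024."""
    try:
        return int(_SYSCTL.read_text().strip())
    except (OSError, ValueError):
        return default


def needs_privilege(port: int, threshold: int) -> bool:
    """Ports below the threshold need root or CAP_NET_BIND_SERVICE."""
    return port < threshold


def listen_port(preferred: int, fallback: int = 8080, *, is_root: bool, threshold: int) -> int:
    """Keep the preferred port if we may bind it, otherwise use the unprivileged fallback.

    Treating root as able to bind any port is a simplification; what actually
    matters is whether the process holds CAP_NET_BIND_SERVICE.
    """
    if is_root or not needs_privilege(preferred, threshold):
        return preferred
    return fallback
```

`is_root` would typically come from `os.geteuid() == 0`. The OpenShift Route or Service then maps external traffic onto the container's unprivileged port.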

The tile download turned out to be a mapboxgl issue that impacts Prod; have a work-around. Nothing to do with permissions.

negillett commented 2 years ago

@joemoorhouse, can you share the location of that error? For the nginx image, whatever works should be fine but, yeah, ubi8/nginx-120 image is a good bet and is really easily accessible from OpenShift.

joemoorhouse commented 2 years ago

@negillett, you mean the 0.0.0.0:80 error? I'm afraid I did some environment clean-up, so we would have to re-introduce it. I can give you the steps if you want to look into it (or I can recreate it?), but equally I'm happy if someone else has already done the work to get the permissions right.

Basically, once the first set of permissions errors were fixed via:

    RUN chgrp -R 0 /var/cache/nginx /var/run
    RUN chmod g+rwX /var/cache/nginx /var/run

that error then came up.
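The `chgrp`/`chmod` fix above follows from how OpenShift runs containers: with an arbitrary UID that belongs to the root group (GID 0), so directories nginx writes to must be group-owned by 0 with group write permission. This is a sketch of a check for that condition, with hypothetical helper names. Note that `chmod g+rwX` without `-R` only changes the named directories, so a recursive check like this would also flag any pre-existing subdirectories left without group write.

```python
import os
import stat
from typing import Iterable, List

# OpenShift runs the container as an arbitrary UID whose group is root (GID 0),
# so writable paths need group-0 ownership plus group read/write/execute.
GROUP_RWX = stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP


def writable_by_arbitrary_uid(gid: int, mode: int) -> bool:
    """True if a directory with this owning group and mode is usable via GID 0."""
    return gid == 0 and (mode & GROUP_RWX) == GROUP_RWX


def find_unwritable_dirs(roots: Iterable[str]) -> List[str]:
    """Walk each root and list directories an arbitrary-UID process could not write to."""
    problems: List[str] = []
    for root in roots:
        if not os.path.isdir(root):
            problems.append(root)  # missing paths are reported too
            continue
        for dirpath, _dirnames, _filenames in os.walk(root):
            info = os.stat(dirpath)
            if not writable_by_arbitrary_uid(info.st_gid, info.st_mode):
                problems.append(dirpath)
    return problems
```

Running something like `find_unwritable_dirs(["/var/cache/nginx", "/var/run"])` inside a locally built image is one way to catch these permission problems before deploying to OpenShift.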

negillett commented 2 years ago

Ah, okay. So, switching to the 3rd party nginx image resolved the issue, @joemoorhouse?

joemoorhouse commented 2 years ago

@negillett Yes indeed, it seems fine for the UI now. I'll add an issue to look into migrating to ubi8/nginx-120, but I think it's low priority. I would say the next priority on pipelines and processes is to get the physrisk-api container running, i.e. the Flask-based service?

joemoorhouse commented 1 year ago

Closed as complete.