ipfs / infra

Tools and systems for the IPFS community
MIT License

CI for js-ipfs #442

Closed eefahy closed 5 years ago

eefahy commented 5 years ago

There have been separate conversations happening re: CI for js-ipfs, so here's a place to hash out what work is happening and what the plan is for the future. Looking through the Q4 OKRs for js-ipfs I see two semi-related KRs, but neither sets the stage for the work that has already happened.

KR: npm-on-ipfs is the default registry that IPFS developers and CI use to install npm dependencies

KR: Continuous deployment requirements for infra are established making CD an option for all PL JS projects. 3 IPFS/IPLD/libp2p projects are continuously deployed

Infra has an OKR of:

OB: Solidify CI as a service for teams

KR: Supported CI service for public and private repos is known

KR: Templates and documentation for common CI/CD workflows are available to teams

Can we consolidate effort here to surface concerns and where we should go from here? cc: @VictorBjelkholm @mburns @alanshaw @achingbrain @hugomrdias

alanshaw commented 5 years ago

KR: npm-on-ipfs is the default registry that IPFS developers and CI use to install npm dependencies

This is not ready just yet for use on CI. Right now this amounts to configuring npm to use a different registry URL (https://registry.js.ipfs.io instead of https://registry.npmjs.com) which would be very easy to sort after the setup has switched. @achingbrain may have some comments here...
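As a rough illustration of how small that switch is, here is a minimal sketch that rewrites the `registry=` line of an `.npmrc` file. The helper name `npmrc_with_registry` is hypothetical and not part of npm-on-ipfs; the only real pieces are the registry URL quoted above and npm's `registry` setting.

```python
# Registry URL as quoted in the comment above.
IPFS_REGISTRY = "https://registry.js.ipfs.io"


def npmrc_with_registry(existing: str, registry_url: str) -> str:
    """Return .npmrc content whose default `registry=` line is registry_url.

    Hypothetical helper for illustration only. Scoped entries such as
    `@scope:registry=...` are left untouched.
    """
    kept = [
        line
        for line in existing.splitlines()
        if not line.strip().startswith("registry=")
    ]
    kept.append(f"registry={registry_url}")
    return "\n".join(kept) + "\n"
```

Calling `npmrc_with_registry(old_content, IPFS_REGISTRY)` on a CI worker's existing `.npmrc` would leave every other setting in place and only change the default registry.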

KR: Continuous deployment requirements for infra are established making CD an option for all PL JS projects. 3 IPFS/IPLD/libp2p projects are continuously deployed

I think we need to bottom out what we require from continuous deployment. I was originally thinking something along the lines of:

Each successful build on master is published to npm as X.X.X-pre.12345, where X.X.X is the next version number, perhaps determined by conventional-recommended-bump (which picks the appropriate major/minor/patch bump based on the commit messages), and 12345 is the build number or (short) commit hash.
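To make the naming scheme concrete, here is a minimal sketch of how such a version string could be composed. The function names are hypothetical, and the bump level is assumed to be supplied by a tool such as conventional-recommended-bump rather than computed here.

```python
def bump(version: str, level: str) -> str:
    """Return the next X.X.X version for a 'major', 'minor' or 'patch' bump."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump level: {level!r}")


def prerelease_version(current: str, level: str, build_id: str) -> str:
    """Compose X.X.X-pre.<build_id> from the current release.

    Assumption: `level` comes from commit-message analysis done elsewhere,
    and `build_id` is the CI build number or a short commit hash.
    """
    return f"{bump(current, level)}-pre.{build_id}"
```

For example, `prerelease_version("0.33.1", "minor", "12345")` gives `0.34.0-pre.12345`.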

I would like to be able to "promote" a successful build to release candidate. So, rather than building and testing again, just re-publish X.X.X-pre.12345 as X.X.X-rc.0 and the same for promotion to production release.
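The promotion path described here (pre-release to release candidate to production) can be sketched as a pure mapping between version strings. This only covers the naming; re-publishing the already-built package under the new version is out of scope, and the `promote` helper is hypothetical.

```python
import re

# Matches X.X.X-pre.<id> and X.X.X-rc.<n>, the two stages named above.
_STAGED = re.compile(r"^(\d+\.\d+\.\d+)-(pre|rc)\.([0-9A-Za-z]+)$")


def promote(version: str, rc_number: int = 0) -> str:
    """Map a pre-release to a release candidate, or an rc to the final release.

    Hypothetical helper: the thread does not specify how rc numbers are
    chosen, so `rc_number` is left to the caller.
    """
    match = _STAGED.match(version)
    if match is None:
        raise ValueError(f"not a pre-release or release candidate: {version!r}")
    base, stage, _identifier = match.groups()
    if stage == "pre":
        return f"{base}-rc.{rc_number}"
    return base  # stage == "rc": promote to the production version
```

With this, `promote("1.2.3-pre.12345")` yields `1.2.3-rc.0`, and promoting that result again yields `1.2.3`.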

I also love the idea of actually just automating this all - a successful build on master is a release to npm 🚀

Right now though I'm feeling conservative and thinking that being able to manage the release candidate and production releases might be a good stepping stone towards that without having to take the plunge straight away.

It allows us to write blog posts and make a big fuss about new features in the next release and might enable developers to more easily reason about what version supports what features.

More importantly, there's also the fact that we don't have a CITGM, and we have to manually run tests for projects that use IPFS and manually run our interop tests before a release. So until we have an automated solution for both of those I'm hesitant to have an entirely automated release setup.

@ipfs/javascript-team could I get some opinions?


There's some parallel work going on here to evaluate CircleCI and Gitlab (@mburns and @hugomrdias). I am loosely aware that the infra team would prefer to use CircleCI but I'd like to know the reasons behind that?

Hugo has a preference for Gitlab and has expressed a concern that there could be queuing issues with CircleCI that wouldn't affect Gitlab.

@hugomrdias - it would be good to get the full info here...

At the last JS Core Dev meeting we discussed moving to Gitlab for repos and issues and everything so from that perspective it would be nice to have it all in the same place (centralization yay...I mean...boo! 😈). That is just an idea right now but it would be good to get the infra team's opinion on this.

eefahy commented 5 years ago

Thanks for the info on the KRs. Re: CD - once we get testing sorted out, we might be able to get the best of both worlds by designing a tagging scheme for specific automated release steps.

My two cents on CI platform: if the code base stays on GitHub then it feels more natural to have CI directly tied to that repo, imho. It reduces the complexity of notifications and lessens the number of moving parts/bits to manage. I’d like to keep it as simple to understand as possible. I’m not a huge CircleCI fangirl but it can potentially remove the need to run a costly Jenkins deploy while still meeting our requirements. Circle offers a performance plan that gives us 200 concurrent containers (so no more queueing issues) and allows us to configure instance size for Linux builds and a bigger instance for macOS builds.

If the team decides to move code management to gitlab then gitlab CI makes more sense. Are there compelling reasons to move the code base other than CI?

hugomrdias commented 5 years ago

I have some questions about circle ci

my experience setting up gitlab ci for js-ipfs felt exactly like setting up travis or any other "github ci".

We really should finish both prototypes and get feedback. I'm available to set up everything for circle ci and gitlab ci, I just need access to circle ci and machines for gitlab (empty installs with ssh access are enough, I can do the rest).

eefahy commented 5 years ago

When will they have windows support ?

The last time I asked, Circle will have Windows support early 2019 - that's as much as I know.

mac os and windows will run in dedicated machines for us?

macOS CI is still on a VM. Is running those tests on dedicated hardware a requirement?

also, from @alanshaw:

I think we need to bottom out what we require from continuous deployment.

I'm going to revise my previous statement about waiting until testing is sorted out, because we recently discovered that CircleCI may not be viable for CD. It turns out Circle doesn't have a good way to protect environment variables from being echoed into public logs, and therefore it should probably be scratched from the list if we intend to do CD to npm like @alanshaw suggests.
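To illustrate the kind of protection being asked for, here is a minimal sketch of redacting known secret values from a log line before it is stored. This is not a CircleCI feature or API; the `redact` helper and the example token are hypothetical.

```python
from typing import Iterable


def redact(line: str, secrets: Iterable[str], mask: str = "[REDACTED]") -> str:
    """Replace every occurrence of a known secret value in a log line.

    Illustrative only: a real CI service would need to apply this to all
    build output, and it cannot catch secrets that are transformed
    (for example base64-encoded) before being printed.
    """
    # Longest first, so a secret that contains another one is masked whole.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:
            line = line.replace(secret, mask)
    return line
```

For example, `redact("publishing with token abc123", ["abc123"])` returns `publishing with token [REDACTED]`.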

That seems to leave us with a couple of options: redesigning the deployment of Jenkins to make it usable or migrating CI to gitlab-ci. The infra team is taking on a redesign of the Jenkins deploy ASAP (like starting this week) but it will probably be at least a couple of weeks before we are ready to move testing there. Regarding gitlab-ci I think we can just say that it will indeed run tests on a windows machine that we provide instead of spending infra engineering time to make the case.

So ultimately I think it's the js core dev team that needs to decide whether it wants to wait for the Jenkins redeploy work to land or move testing to gitlab-ci. Both are perfectly fine solutions imho. The trade-offs are waiting for infra to solve the Jenkins problem or moving to gitlab-ci and having the dev team take on management of that.

daviddias commented 5 years ago

Seems that Travis now has Windows too https://blog.travis-ci.com/2018-10-11-windows-early-release

achingbrain commented 5 years ago

We could have successful builds of master create a deployable artefact then kick off a deployment build that pushes that artefact to npm or wherever - that way the master/PR builds would not need to contain sensitive ENV vars.

If the artefacts persist beyond the build lifetime it'd also help with @alanshaw's goal of promoting pre-releases to release candidates.
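A minimal sketch of that split, assuming two separate stages: a build stage that never sees a secret and a deploy stage that is the only one handed a token. The names `Artifact`, `build`, and `deploy`, and the checksum check, are assumptions added for illustration and are not from the thread.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Artifact:
    name: str
    payload: bytes
    sha256: str


def build(name: str, payload: bytes) -> Artifact:
    """Build stage: runs on master/PR builds and needs no sensitive env vars."""
    return Artifact(name, payload, hashlib.sha256(payload).hexdigest())


def deploy(
    artifact: Artifact,
    token: str,
    publish: Callable[[str, bytes, str], None],
) -> None:
    """Deploy stage: the only place the publish token is available.

    `publish` stands in for whatever pushes to npm; it is a placeholder.
    The checksum comparison is an assumed safeguard against the artefact
    changing between the two stages.
    """
    if hashlib.sha256(artifact.payload).hexdigest() != artifact.sha256:
        raise ValueError("artifact changed since it was built")
    publish(artifact.name, artifact.payload, token)
```

Because the artefact outlives the build, the same `Artifact` could later be passed to `deploy` again under a promoted version name without rebuilding or retesting.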

hugomrdias commented 5 years ago

@eefahy we need to have a full prototype running on gitlab for everyone in the js core team to play with and validate its a good option. Can you provide us a mac and windows machine/vm ?

eefahy commented 5 years ago

I’m sorry but it might be a little while. As part of our OKR for Q4 listed above, we’ve decided to spend our time fixing Jenkins as it solves a bigger problem. Since this gitlab work is outside of any listed OKR, I think the Jenkins work takes precedence.

However, we can tie this request to part of the Jenkins redeploy work since some of that is designing how to autoscale macOS & windows workers.

hugomrdias commented 5 years ago

It's kinda sad to hear that. Ever since we came back from Glasgow we have been talking about making the prototypes to get feedback from the team. I have the gitlab prototype fully working (linux only), with one of our OKRs finished plus another very important use case, and I have been waiting for temporary mac/windows machines for a very long time. Now, after all this time, I hear for the first time that this will not happen. This is just 2 clean vms running for a very short time period to validate a prototype.

So ultimately I think it's the js core dev team that needs to decide whether it wants to wait for the Jenkins redeploy work to land or move testing to gitlab-ci. Both are perfectly fine solutions imho. The trade-offs are waiting for infra to solve the Jenkins problem or moving to gitlab-ci and having the dev team take on management of that.

The js core dev team has been leaning towards gitlab ever since I finished the prototype and they saw it running on a fork of js-ipfs and other repos, but to make a final decision everyone wants to see all 3 OSes running.

eefahy commented 5 years ago

@hugomrdias I hear your frustration and understand that you feel blocked. Perhaps with a little bit of process here we can come to an understanding on the way forward.

Are there issues that you can refer me to that show your use case or how the team will come to make a decision on CI? I'm unclear on what criteria a prototype must satisfy and therefore am not sure why having a windows or macos machine would help move this forward.

Is the team prototyping CI services other than gitlab-ci? From @daviddias's comment it would appear that Travis could be an additional candidate to prototype.

Given the information I have available to me, I feel the best way forward at this moment (from the infra perspective) is to fix the reliability issues with Jenkins as that solves a larger, more pressing issue. However, I would love to hear thoughts and ideas from others if they disagree or have other suggestions on how best to move forward.

eefahy commented 5 years ago

The plot thickens... According to CircleCI's context docs

To use environment variables set on the Contexts page, the person running the workflow must be a member of the organization for which the context is set and the rule must allow access to all projects in the org.

So, it is possible to hide environment variables from non-contributors of a repo, however, only GH org admins can create and manage env vars in contexts and once they are made, contexts are available to all projects in the entire org... Might be good enough for some CD scenarios including publishing releases back to GH and npm

hugomrdias commented 5 years ago

@eefahy we are only prototyping gitlab nothing else. The criteria for a CI decision:

@alanshaw @achingbrain do you feel anything else is missing ?

A complete config is available here: https://github.com/hugomrdias/js-ipfs/blob/master/.gitlab-ci.yml

Circle CI PR: https://github.com/ipfs/js-ipfs/pull/1669

alanshaw commented 5 years ago

To elaborate on what @hugomrdias posted: Our two main goals are 1) have CI with a pipeline to run all our test scenarios in parallel and 2) have a stable CI that only fails builds because of issues with our code/tests and not for other reasons.

Parallel - we have a lot of tests, and many different environments that we want to run them in (see requested pipeline here https://github.com/ipfs/dev-team-enablement/issues/73). Running them in serial takes too long.
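As a small-scale illustration of the parallel requirement, here is a sketch that runs several named test scenarios concurrently and collects a pass/fail result for each. A real CI pipeline would spread these across machines; the scenario names and the `run_scenarios` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_scenarios(
    scenarios: Dict[str, Callable[[], None]], max_workers: int = 4
) -> Dict[str, bool]:
    """Run each scenario callable concurrently and map its name to passed?.

    A scenario counts as failed if it raises any exception.
    """

    def run_one(scenario: Callable[[], None]) -> bool:
        try:
            scenario()
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_one, scenario)
            for name, scenario in scenarios.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

With serial execution the total time is the sum of all scenarios; run this way it approaches the duration of the slowest one, given enough workers.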

Stable - failing for reasons outside of our own code/tests significantly impacts our productivity but, more importantly, affects the community. It is confusing for new contributors to see errors unrelated to the changes they're making, and worse still they don't have permissions to restart builds - they are stuck, and cannot know if their contribution is valuable or not. This is a really bad experience, and is a significant hurdle for people wanting to contribute to IPFS.

Right now, Jenkins falls short of these. The context that we're missing in this thread is that the js-ipfs team have come to the conclusion that Jenkins cannot address these issues. At least not right now, and it's not certain when it might be able to (if at all).

I believe it might be possible to address the stability issue (which has improved recently - we're really only seeing the disk space issue affect us frequently) but our pipeline requirements cannot be met because of a bug in jenkins.

What's the plan?

Here's a plan for the js-ipfs team. We have already made progress on some stages, but for the purposes of completeness and formalisation:

  1. Set up gitlab CI on js-ipfs (keeping Jenkins temporarily as the baseline)
  2. Set up a third CI on js-ipfs. Now that travis supports windows (thanks @daviddias for the link) I think it might be worth evaluating travis instead of circleci? (@achingbrain @hugomrdias 👍 👎 ?)
  3. Run them all for 1 month
  4. If CI results seem positive, prototype a continuous deployment configuration for the CI
  5. Evaluate the options and pick the best one
  6. Decommission the rest

Notes:

Any requirements for machines from infra will be requested here asap but will not affect us setting up these services - run without windows/macos initially if necessary, we can add them later.

Access to CI accounts will be requested here also asap.

hugomrdias commented 5 years ago

I can set up travis and circle easily. For Circle I need access, for travis I think I'm good.

Elexy commented 5 years ago

Is Azure Devops not an option? It supports Windows, is free for OSS and has manual gating, parallel builds etc. Small downside is that the code has to be hosted (mirrored) there as well.

I have quite some experience with it and automation around it.

alanshaw commented 5 years ago

hosted (mirrored)

Just to clarify, can our code still be on github but mirrored there? Is it automatically kept up to date?

hugomrdias commented 5 years ago

same as gitlab, all automatic

eefahy commented 5 years ago

@hugomrdias one of our current issues with Jenkins is needing expensive, long running vms for macos and windows testing. Does gitlab allow for on demand runners? If not, can they be autoscaled to grow to the size needed for the current load?

hugomrdias commented 5 years ago

Not sure. They use Docker Machine for autoscaling and on-demand runners:

https://docs.gitlab.com/runner/configuration/autoscale.html

Docker Machine supports lots of cloud providers.
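To make the autoscaling question concrete, here is a minimal sketch of a scaling rule that sizes the runner pool to the current queue. It is not GitLab Runner's actual algorithm; the function and all of its parameters are hypothetical.

```python
import math


def desired_runners(
    queued_jobs: int,
    jobs_per_runner: int = 1,
    min_runners: int = 0,
    max_runners: int = 10,
) -> int:
    """Return how many runners to keep alive for the current load.

    Assumed policy: enough runners to cover the queue, clamped between
    a floor (idle capacity) and a ceiling (cost limit).
    """
    if jobs_per_runner < 1:
        raise ValueError("jobs_per_runner must be at least 1")
    needed = math.ceil(queued_jobs / jobs_per_runner)
    return max(min_runners, min(max_runners, needed))
```

With `min_runners=0` the pool shrinks to nothing when the queue is empty, which is the on-demand behaviour that would avoid paying for long-running macOS and Windows VMs.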

Elexy commented 5 years ago

With Azure Devops the agents are fully managed and, for OSS, you don't pay a thing until you want more than 10 parallel jobs. Autoscaling is managed for you whether it's a Linux, Windows or macOS agent.

It even looks like it works natively with github repos: https://azure.microsoft.com/en-us/blog/announcing-azure-pipelines-with-unlimited-ci-cd-minutes-for-open-source/

Is there a requirements doc somewhere I could have a look at?

https://azure.microsoft.com/en-us/services/devops/pipelines/

scout commented 5 years ago

We moved to Travis for CI