zeromq / czmq

High-level C binding for ØMQ
czmq.zeromq.org
Mozilla Public License 2.0

Private servers linked in czmq upstream builds #1837

Closed vyskocilm closed 6 years ago

vyskocilm commented 6 years ago

Hi all,

I see the private server jenkins2.roz.lab.etn.com as part of the pull request flow. I am not convinced that adding private CI servers the community can't look at to diagnose and fix problems is the best setup. Can we remove it from the flow?

https://jenkins2.roz.lab.etn.com/job/CZMQ-upstream/job/PR-1836-head/1/display/redirect

cc @bluca @sappo @jimklimov

jimklimov commented 6 years ago

We discussed it with @bluca during the initial setup - indeed, a private hidden server is a problem; however, knowing that the tests passed in an alternate environment (or failed - even with no details, that is still a reason to look at one's committed changeset twice) is better than nothing.

Replacing this stopgap solution with a publicly available server would indeed be a bonus. The Jenkins team is evaluating a cloud offering like Travis; maybe this would pan out so the jobs could be rehosted there... not sure what their terms would be for code they adopt to host, Jenkinsfiles included. There will probably be a need to port a Travis-like solution to pre-build or otherwise ensure the presence of prerequisite packages (e.g. libzmq, libsodium, etc.?). There, Docker might be the right approach, if pre-built images with the latest dependency packages are already published.

So in short - I am not in favor of just dropping this part of the test suite, but I am in favor of rehosting it somewhere more open, and depending on availability of time I might help to make it happen :)

jimklimov commented 6 years ago

Note: setup-wise, this is a matter of setting up a recent Jenkins v2 instance with pipeline support and adding a GitHub Branch Source item to scan the zeromq organization, using an account that @bluca can add with appropriate permissions in the organization or individual projects (same as 42ity-ci today). This account should be able to write, in order to provide status updates. A Jenkins instance with unrestricted internet access could also use webhooks to process PRs as soon as they are posted (otherwise the lag to discover new ones via polling is quite long and so far does not seem configurable for organization folders with autodiscovery - it can be configured if you set up each project manually).

vyskocilm commented 6 years ago

Hi,

I don't believe that running tests in yet another environment is going to improve the czmq code :-) Right now the project uses three different environments (Ubuntu, OSX and Windows), which looks good enough. My main concern and complaint is that your CI marks some PRs as RED while I have no way to check the reason. The result is that CI is less reliable, because there are now completely opaque and magical errors.

Can't your Jenkins setup be made optional, so its failures won't be propagated?

jimklimov commented 6 years ago

Not sure really ;) One way might be to remove the write rights for 42ity-ci, hoping it won't block Jenkins from building those updates. Another would be to post the build logs somewhere...

jimklimov commented 6 years ago

So to sum up the discussions of options we had during FOSDEM with different teams:

1) Quick workaround:

A usable quick stop-gap solution is to set up the Jenkinsfiles for zeromq components to send notifications via e-mail (for builds of the upstream repo and PRs against it) to a dedicated mailing list hosted on listbox or a similar service. In case of failures, such emails would carry attachments with logs and other artifacts, so these can be inspected by anyone.

A downside may be usability (e.g. no links straight from PR builds), although if the mailing list archive uses predictable URLs, maybe the builder can also post a comment with a link into the PR.

Prerequisite: @bluca (or anyone else?) to set up the mailing list and its tracking by an HTTP-accessible archive, and make the address of the archive known. Then the Jenkinsfiles of ZMQ ecosystem components can be amended to use their post{} clauses with the same or very similar code blocks.
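
For illustration only, a minimal sketch of such a post{} clause, assuming the Email Extension (emailext) plugin is installed on the Jenkins instance; the list address and the `./ci_build.sh` step are placeholders standing in for the real build steps:

```groovy
// Sketch only: assumes the Email Extension plugin; the list address is a placeholder.
pipeline {
    agent any
    stages {
        stage('Build and test') {
            steps {
                // Placeholder for the component's usual build/test invocation.
                sh './ci_build.sh'
            }
        }
    }
    post {
        failure {
            // Mail the (compressed) console log to an archived list so anyone can inspect it.
            emailext to: 'zeromq-ci-logs@example.org',
                     subject: "FAILED: ${env.JOB_NAME} #${env.BUILD_NUMBER}",
                     body: "Build URL (may be private): ${env.BUILD_URL}",
                     attachLog: true,
                     compressLog: true
        }
    }
}
```

If the archive exposes predictable URLs, the same post{} block could additionally post a comment with the archive link into the PR, but that part is left out here.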

2) Building on an openly available Jenkins farm:

2a) Arrange with some opensource-sponsoring hosting project or company to provide a build farm for a Jenkins master and different-OS workers, to test the zeromq stack more extensively on more platforms than the current choice of Travis and Appveyor provides (e.g. more Linux distros, as well as some illumos and BSD distros, etc.).

Options here may include friendship with the Kubernetes folks who hung out in HSBXL too, or with the Jenkins efforts toward an opensource farm.

2b) Alternatively, some of the contributors might spin up their own farm and provide it - essentially the same as the current solution, except accessible from the Internet. For security-conscious folks, likely a farm dedicated to just building and testing this stack.

On a similar note, anyone can spin up a private Jenkins instance on their workstation or laptop anytime, and perhaps see it fail in different (constrained?) conditions and fix that fallout. This would benefit the project as well.

2c) Finally, someone individually (or with the community chipping in) could rent a farm of VMs. There are a lot of options nowadays, ranging from established shops like Amazon and Azure to the numerous hosting farms run as businesses by illumos community members (after all, the platform is optimal for such work).

2d) The ultimate solution may well be a mix of those, especially if people can provide resources to build on non-x86 systems.

Possibly, an approach like Jenkins Swarm could be used to have people set up dedicated build agents on whatever hardware they have and instruct the agents to connect to a common Jenkins master on the internet, to spread the load and/or provide uncommon platforms they are interested in seeing tested automatically - win-win.

3) Further development work needed:

3a) A likely prerequisite for the general case of testing on different platforms (starting from some basic build root) is to have the Jenkins jobs build and install the prerequisites - similar to what we do in Travis, except there is no single packaging tool to rule all platforms (per discussion with @bluca, it may suffice to have a shell script inspect pkg-config for the dependencies and build them if a dependency is missing or too old). As a speedup at a later stage, the job could also try downloading and installing a published build artifact (e.g. a built non-DRAFT workspace for the platform) from the last successful master build on the same Jenkins server, if it had tested that component anyway - and fall back to a complete git checkout+build+install if not.

This is a relatively simple bit, and useful anyway to simplify in-house tests with Jenkins, so someone should step up and carry this burden soon :)
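
As a rough fragment of what 3a could look like - the version threshold, install prefix and clone URL are placeholder assumptions, not an agreed policy:

```groovy
// Sketch only: pkg-config threshold and install prefix are placeholders.
stage('Ensure prerequisites') {
    steps {
        sh '''
            # Use the system libzmq if pkg-config reports a recent enough one...
            if pkg-config --atleast-version=4.2.0 libzmq; then
                echo "Using system libzmq $(pkg-config --modversion libzmq)"
            else
                # ...otherwise build and install it from source into a workspace-local prefix.
                git clone --depth 1 https://github.com/zeromq/libzmq.git
                cd libzmq
                ./autogen.sh
                ./configure --prefix="$WORKSPACE/tmp/deps"
                make -j4 install
            fi
        '''
    }
}
```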

3b) Currently Jenkins does not have an easy way to do "matrix-style" pipeline builds - such that we could provide generic Jenkinsfile recipes in the components, set up a Jenkins instance with its own choice of multi-platform workers, and fan out test jobs to those workers so they must all succeed as part of the PR requirements. This problem might be tackled by the Jenkins project; in the meanwhile we are free to invent our own model of the wheel :) At least there are quite a few possible ideas for solving this from different angles (keeping the per-component Jenkinsfiles generic and not tied to a particular Jenkins setup).

3c) One issue to tackle for multi-platform tests (and maybe a vector for a solution) is the use of agents with labels - these are static in the current Jenkinsfiles and cannot be set via e.g. build arguments (per Jenkins core nuances), but they can be set via Groovy variables in a scripted pipeline.

Maybe we can make the agent label in each common Jenkinsfile a variable (defaulting to the string from project.xml if not set already), and the "matrix-like" job would set that variable and load() the Jenkinsfile of the component to instantiate build runs against different agent labels.

Ideally a solution would be like (or an upstreamed extension of) the current MultiBranchPipeline, but with a further axis (via a folder layer or job-name suffix) for the platform/label it built the codebase against - so each such configuration would have its own history of ups and downs, as well as a history of jobs that dispatched a test of a particular source-code commit ID. For example, we might agree on a label format (e.g. zeromq-builder:<distro>:<cpuarch>) and instantiate runs against the currently registered label values (likely there is a way to query those from a plugin or Groovy script). This would still allow for a relatively hands-off configuration: point at a repo or GitHub org, somehow provide the list of interesting agent labels (static, or via swarm-like agent registration), and forget about it.
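
To make the fan-out idea in 3b/3c concrete, here is a scripted-pipeline sketch (not a drop-in replacement for MultiBranchPipeline): the label names are hypothetical examples of the zeromq-builder:<distro>:<cpuarch> scheme above and are hard-coded, since querying the registered labels dynamically would need extra plugin/Groovy work, and `./ci_build.sh` again stands in for the component's real build steps.

```groovy
// Sketch only: the labels are hypothetical examples of the naming scheme above.
def labels = ['zeromq-builder:debian9:amd64',
              'zeromq-builder:freebsd11:amd64',
              'zeromq-builder:omnios:amd64']

def branches = [:]
for (l in labels) {
    def label = l              // capture the loop variable for the closure below
    branches[label] = {
        node(label) {          // run this leg on an agent carrying the label
            checkout scm
            sh './ci_build.sh'
        }
    }
}
// One parallel leg per label; the run fails if any leg fails.
parallel branches
```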

sappo commented 6 years ago

I like the idea of having multiple test CI servers if they fulfill the following requirements:

* Everyone MUST be able to see the build results and logs, at the same level of comfort that e.g. Travis provides.
* Maintainers MUST have administrative rights to be able to create and configure CI jobs - at the same level that e.g. Travis provides, not administrative rights to the whole CI server.

Those requirements should be part of C4, IMO.

@Jim: I don't want to diminish your great efforts in setting this whole construct up. But IMO it is simply too complicated and not intuitive for drive-by committers.

jimklimov commented 6 years ago

@sappo: I can't really say "no" to making this (the first point at least) a part of the C4 requirements, as @karolhrdina advocated as well ;) The practical issue here is one of timeframe, resources and, to some extent, the readiness of technologies to make it slim and convenient.

The quickest, cheapest and dirtiest solution right now is to pull the plug and just not let further PRs and commits be tested (and confusingly reported back) by the unfortunately private CI instance. This is likely doable unilaterally, e.g. by @bluca removing the write(/admin?) rights of the 42ity-ci user that is currently used to interact between that Jenkins instance and the GitHub zeromq organization. This would remove the sore point of not seeing PR results, but it would also remove coverage of the ecosystem codebase on yet another CI system, as well as the sort of public self-testing of the Jenkinsfile pipeline recipes themselves (they do usually work before I submit the relevant PRs, but still - this is not a formally clean QA).

The point on "MUST be able to see..." is elaborated earlier and boils down to either an uncomfortable but quick interim workaround such as archived mailing lists (or equivalent), which at least lets people see the needed data, or to standing up a public Jenkins master instance so people can interact with the builds and their logs and satisfy "MUST be able to see ... at the same level of comfort" - which is certainly a worthy goal.

Note also that a publicly accessible Jenkins master instance has another benefit - it can receive webhook connections from GitHub, allowing instant configuration and start of PR jobs as relevant commits land in the repo. Also note that services like ngrok allow setting up a sort of VPN tunnel from their public IP address and DNS name (dynamic for free, static if paid) for HTTP connections and/or general TCP/IP services (forwarding your port to a random port for free, or a static one if paid). MAYBE someone can arrange with ngrok to provide a free service for the ZMQ community, in case someone can run a Jenkins server in a private LAN, at home, etc. but can't/won't expose it to the internet directly or via their own HTTP reverse proxy. The HTTP tunnel/reverse-proxy approach is known to suffice for the Jenkins web interface, the REST API and webhooks. The TCP tunnel may also work to connect willing external agents to do the builds, but may require a bit of special setup (static rather than dynamic agent connection ports, to ease the forwarding).

Keep in mind that people this decade are spoiled with fancy web stuff, but as I talked to the *BSD teams at the table, I found they still test commits AFTER they have landed on the relevant branch: everyone starts building it on their modern or weird-and-old systems and posts the results to the relevant mailing list. If something broke with the new feature (or new fix), people generally fix the thing to work on their box and commit an update; iterate until the wad of commits is pushed to the more-stable branch. Granted, they don't like it anymore compared to CI systems and PRs, and it might be improved by the approach of a swarm of workers from those weird systems subscribing to a common master to get the jobs, but nonetheless this clumsy approach is viable and has worked for decades (and some time ago it was not clumsy at all, but state of the art, better than most).

Finally, regarding the requirement that "Maintainers MUST have administrative rights to be able to create and configure CI jobs" - I am not quite sure how to process that :) The general approach with Jenkins so far is to provide Jenkinsfile recipes in each component that wants automatic testing. These files can be generated with zproject if applicable, or transplanted from another component and adapted (just like .travis.yml files). The Jenkins master server tracks the individual SCM repos, e.g. discovered from a GitHub organization with a "folder"-like setup, and automatically sets up jobs whenever it finds a branch with a Jenkinsfile in its root, including users' development branches (if they set up their local Jenkins to test their repos) and PRs against the monitored repo. Any new configuration, or change to an old one, for a particular job is done by committing updated valid Jenkinsfile content - and eventually merging it into the common master branch like any other (configuration-as-)code PR.

Given the clumsiness, non-replicability and "intimacy" (with respect to a particular deployment) of legacy jobs, I am not sure we really want to require support for those. Indeed, management of the Travis or Appveyor farms is not something ZeroMQ community maintainers do (or can do, unless they are part of those service-provider teams too) - so there is no point in requiring that ability in C4. It may be an added bonus of a CI server run by and for the project community, but it might as well not be there.

Finally (2): "But IMO it is simply too complicated and not intuitive for drive-by committers" - not sure about this one either: to use this feature from scratch, one just sets up a Jenkins instance on their system and configures either a "Multibranch Pipeline" pointed at a single repository (which can be a GitHub fork or a local workspace, tested as soon as you commit something) or a "GitHub Organization" pointed at e.g. https://github.com/zeromq - and that should be it. Code is discovered and tested, and jobs appear automatically wherever Jenkinsfiles are found.

vyskocilm commented 6 years ago

@sappo thanks, you've described my concerns well. I like the idea of proposing a C4 update to address CI/CD as well.

jimklimov commented 6 years ago

Pedantically, again +1 from me on having the C4 ruleset be definitive about this previously gray area, in whatever manner.

So far I see no one has stepped up with a constructive solution to actually provision and deploy a Jenkins farm accessible from the internet; the only voiced requests have been to get rid of the confusingly invisible one (makes sense), and there is no favor toward clumsy build-log publishing, e.g. via mailing-list archives?

For now I'll try to sever the link from our Jenkins to GitHub by using another account - so it will hopefully keep building the stack for our team's private enjoyment, and I can keep track of whether the Jenkinsfiles still work, but it will not (have rights to) post back into PRs with inaccessible links.

If anyone does get to deploying a solution, and my earlier comments are not exhaustive enough to set it up - feel free to contact me (perhaps by commenting in this issue?) and depending on available time I'd try to help.

sappo commented 6 years ago

Maintainers MUST have administrative rights to be able to create and configure CI jobs - at the same level that e.g. Travis provides, not administrative rights to the whole CI server.

This may be phrased a bit harshly. Toggling builds on and off, and maybe adding a custom fork to be checked, should be enough, given that we can configure the job using the Jenkinsfile.

We could try setting up a Jenkins server on a free OpenShift plan!?

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity for 90 days. It will be closed if no further activity occurs within 21 days. Thank you for your contributions.