myii opened 5 years ago
please also test cirrus-ci: https://cirrus-ci.org/#comparison-with-popular-ciaas
@aboe76 Would you mind bringing that up in the Slack room? More eyes will see it there. In the meantime, what do you think about limiting the number of runs per formula? Should it be one for each `os`; is that sufficient?
When switching to Cirrus CI, the concurrency limit isn't a factor any more: https://cirrus-ci.org/#key-highlights

Most formulas only need these different OSes:
- debian (this covers Ubuntu/Debian derivatives)
- centos (this covers RedHat/Fedora derivatives)
- opensuse (for SLES/SUSE Manager support)
- bsd
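A trimmed-down matrix along these lines might look like the following `kitchen.yml` fragment (a sketch only; the image names are illustrative placeholders, not necessarily the images the formulas actually pin):

```yaml
# Sketch of a reduced platform matrix: one platform per os_family
# instead of every available os variant. Image names are examples only.
platforms:
  - name: debian-9
    driver:
      image: debian:9          # covers Ubuntu/Debian derivatives
  - name: centos-7
    driver:
      image: centos:7          # covers RedHat/Fedora derivatives
  - name: opensuse-leap-15
    driver:
      image: opensuse/leap:15  # covers SLES/SUSE Manager
```

BSD is the odd one out, since the Docker driver can't provide it; that case is discussed separately in this thread.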
I agree with @aboe76: having 20+ images available does not mean that we need to test with all of them. I think that to check that a formula works, the rationale should be something like: tests for the required combinations of the above should be added to the test matrix.
I think that part of the review of PRs should check that the testing matrix is sufficient but not excessive. Testing is easy; knowing what and when to test is not :yum:
Also, cirrus-ci sounds interesting :+1:
My 2c
So after some discussion on Slack prompted by @aboe76, we've got Cirrus CI enabled for the whole org. Some specific notes about BSD:
> 21:07 gtmanfred: you would need to use vagrant for bsd
> 21:08 gtmanfred: realistically though, you could use kitchen passthrough, to run the commands on the bsd instances, instead of creating a new docker container to run the tests on
> 21:10 gtmanfred: when you get to it, see how we are using the proxy for windows on appveyor https://github.com/saltstack/salt-jenkins/blob/develop/.kitchen.appveyor.yml
> 21:10 gtmanfred: it should translate to bsd probably
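To make the passthrough idea concrete, a hypothetical `kitchen.yml` fragment using Test Kitchen's built-in `proxy` driver might look like this (the hostname, transport and commands below are placeholder assumptions, modelled loosely on the linked salt-jenkins AppVeyor config):

```yaml
# Hypothetical sketch: run the kitchen suite against an already-running
# BSD host instead of creating a Docker container. All values below are
# placeholders.
driver:
  name: proxy
  host: freebsd.example.com  # pre-provisioned (e.g. Vagrant) BSD instance
  reset_command: "exit 0"    # no-op: don't try to re-image the host

transport:
  name: ssh
  username: vagrant
```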
I've started work on testing Cirrus CI with `cron-formula` but I've not yet had success getting it to play nicely with `kitchen`. There'll be some special invocation but there aren't any examples out there in the wild that I could use as references.
> I've started work on testing Cirrus CI with `cron-formula` but I've not yet had success getting it to play nicely with `kitchen`. There'll be some special invocation but there aren't any examples out there in the wild that I could use as references.
https://cirrus-ci.org/guide/writing-tasks/#additional-containers
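For reference, the linked feature lets a task run sidecar containers next to the main one. Adapted from the Cirrus CI docs, a task along these lines looks roughly like the following (the image, port and env values are just the docs' example, not something from our formulas):

```yaml
# Adapted from the Cirrus CI docs: an additional container started alongside
# the main task container and reachable via localhost.
task:
  container:
    image: ruby:2.6
    additional_containers:
      - name: mysql
        image: mysql:8
        port: 3306
        env:
          MYSQL_ROOT_PASSWORD: ""
  test_script: bundle exec rake test
```

Whether this maps cleanly onto kitchen's Docker driver is exactly the open question here.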
So a first example of the difference between 15 and 5 instances, thanks to @daks:

| Instances | Running time | Total time |
|---|---|---|
| 15 | 16 min 24 sec | 37 min 20 sec |
| 5 | 7 min 17 sec | 10 min 47 sec |
@aboe76 @javierbertoli Are these the 5 images you're suggesting or did you mean stick with the `os_family` images and only use the `os` variants when necessary?
By the way, also note that we're leaving all of the platforms available in `kitchen.yml`, so that any/all can be tested locally.
@myii I would prefer to stick with the `os_family` variants, and be a little conservative, so:
- `RedHat` => use centos images
- `Debian` => use debian images
- `Suse` => use opensuse leap images if possible
- `Bsd` => ??
Having to wait 16 minutes to have a test run for a PR is a bit much...
@aboe76 Thanks for that. What do you think of my comment I've just made after you suggested that: https://github.com/saltstack-formulas/ufw-formula/pull/7#issuecomment-493707109?
Yes @myii, I think that is a good comment: just as we test that the formula works on each OS, the same approach can be applied to Salt versions.
The testing should be able to spot regressions and issues earlier, but not in a way that hinders or slows everything down. I know about edge cases that we might miss, but I think that most of the community will be better served if we stick to the broad road instead of the narrow paths.
So #121 is a suggested workaround to use for the time being. Please have a look so that this selection can be propagated throughout all of the formulas using 10+ instances (i.e. all `semantic-release` formulas to date).
I've just noticed that *Limit concurrent jobs* has been set to `1` in Travis. I'm not sure that's such a good idea now, since it reduces the benefit of having these run in parallel, to get releases done quickly. Very busy periods aren't that common, so at quiet times, it really slows things down. I've unset this again for the time being but feel free to discuss this if it is an issue.
So a little update. With some help from the Cirrus folk and working together with @javierbertoli, we've started making progress with Cirrus CI. Some parts of the developments are mentioned from this point forward, for anyone interested in more of the details. Or to get straight to the output, you can see it here: https://cirrus-ci.com/build/5647302937542656.
I'd like to pull in @alxwr here: Cirrus CI allows testing on *BSD as well. Is this something you'd be interested in helping out with? Or for whoever fancies taking this on, this comment above is pertinent to that.
We're also seriously considering keeping both Travis and Cirrus: Travis for the release process (i.e. `commitlint` and `semantic-release`), with Cirrus for the actual testing.
To gauge whether this is really going to make a significant difference, we need some "lambs to the slaughter" -- suggestions of formulas that can be converted to this hybrid system. Preferably formulas that involve *BSD, macOS and Windows, so that we can test those features as well. It's not really much of a risk since we can always re-enable the matrix in Travis again if it doesn't work out.
> We're also seriously considering keeping both Travis and Cirrus: Travis for the release process (i.e. `commitlint` and `semantic-release`), with Cirrus for the actual testing.
Why not... :) I didn't reply until now because I was thinking about the CI config file: it's cool to have a performant CI, but it's cool to have nice config files too ;) And IMHO, Azure's config file is not very sexy, so not very easy to understand and to maintain/evolve. On the other hand, Cirrus' config file is as easy to understand as Travis' one, so it's a good point for Cirrus.
@n-rodriguez You spoke too soon... wait until we start adding *BSD, macOS and Windows!
Just noting another benefit of keeping the release process in Travis: all of the GitHub tokens that have already been set across the repos. Having to do that again for Cirrus isn't appealing.
https://cirrus-ci.org/faq/#are-there-any-limits:
> **Are there any limits?**
>
> There are no limits on how many VMs or Containers you can run in parallel if you bring your own compute services or use Compute Credits for either private or public repositories.
>
> Cirrus CI has the following limitations on how many VMs or Containers a single user can run for free for public repositories:
>
> - 8 Linux Containers or VMs
> - 2 Windows Containers or VMs
> - 2 FreeBSD VMs
> - 1 macOS VM
>
> Which means that a single user can run at most 13 simultaneous tasks for free.
>
> **No per repository limits**
>
> Cirrus CI doesn't enforce any limits on repository or organization levels. All the limits are on a per user basis. For example, if you have 10 active contributors to a repository then you can end up with 130 tasks running in parallel for the repository.
In terms of ensuring all checks are passing before allowing merges, it's probably best to leave that to the GitHub UI rather than cross-checking from Travis and Cirrus:
If there really was a need to check, there's probably a way via the GitHub API:
Here's the first PR merged using Cirrus CI alongside Travis CI:
Note, we enabled all of the instances in the matrix for initial testing. We'll reduce that eventually, if all goes well.
The first negative points to report:

> Cirrus CI uses preemptible instances for cost optimization for free OSS builds, so instances sometimes get preempted, but Cirrus then automatically reschedules them on regular instances. This is a rare event.

Unfortunately, I've already seen it happen a few times.
Some more bad news, unfortunately.
Reduced the test matrix back to 6 (like our reduced Travis runs) but the first stage of the PR (before merge) hasn't completed yet: https://cirrus-ci.com/build/5778854061277184 -- over an hour.
Update: I went away from this comment for a few moments and it's just finished. Now to try the PR merge to see how long that takes...
@myii answered inline:
> - There's no (easy?) way to cancel and re-run jobs.

You can do that from the Cirrus CI web UI.

> - Concurrency per user can end up quite painful -- I'm still waiting for the merge of the first PR above to complete before the checks on the second PR can begin... it might delay me by 30 minutes (this is an extreme case).

Unfortunately, this is how the limits work for free OSS tasks. Travis, for example, has a limit of 5 for an organization. You can consider using compute credits to avoid the limits; a few OSS projects use them to avoid the concurrency issue for organization members.

> - Each instance runs slower than the Travis equivalent, maybe even half the speed (1.5 minutes vs. 3 minutes).

I see that `bundle install` takes quite a while in some tasks. You can consider adding caching as shown in the Ruby example.

> - Instances can take a long time to start up; we're talking about sometimes even 10 minutes before any start up.

There is currently an issue where we hit quotas. A few other organizations are way too active together, which led to the issue. Bad timing for you. 😅

> - The instances sometimes end up restarting without any clear reason why. This was the answer I got in response to that.

There should always be a link to the documentation about the restart issue. Instances sometimes get rescheduled for cost optimization on our end, to provide the free service.
@fkorotkov Thanks for your responses, that will give us some ideas about things we can try out.
@fkorotkov By the way, in terms of starting, stopping and restarting instances, do you have to open each one manually to do that? We're a bit spoilt by how Travis allows you to do everything from one page.
> Update: I went away from this comment for a few moments and it's just finished. Now to try the PR merge to see how long that takes...
Some positive news, that didn't work out so bad, around 15 minutes: https://cirrus-ci.com/build/5976877521436672.
@fkorotkov above:
> I see that `bundle install` takes quite a while in some tasks. You can consider adding caching as shown in the Ruby example.
I've tried various invocations but I can't work out how to get this to work within `docker_builder`. I didn't find any further documentation about how to take advantage of `bundle_cache`, except for this example. Can you suggest how to use it in this situation?
@myii you can simply put

```yaml
bundle_cache:
  folder: /usr/local/bundle
  fingerprint_script: echo $RUBY_VERSION && cat Gemfile && cat Gemfile.lock
  populate_script: bundle install
```

before `verify_script`.

Also, please feel free to create an issue for adding more buttons like the ones you'd want.
@fkorotkov That was the first thing I tried. I'll do it again now and then link you to the result. Can you confirm the amount of indentation to use?
@fkorotkov https://cirrus-ci.com/build/5757379426123776:
```
FAILED_PRECONDITION: Unknown field bundle_cache on 'Test $INSTANCE' YAML node!
```
@myii indeed, Docker Builder doesn't support any other instructions besides `script`s. I don't see a reason why it shouldn't support caches, artifacts, etc. Let me fix it. 🙌
@fkorotkov Thanks.
It should work now.
@fkorotkov Thanks for the fast turnaround. While the test run completes successfully, it doesn't appear to have made any difference to the time taken: https://cirrus-ci.com/build/5643032095883264. Anything else that needs to be done with the configuration?
@myii it seems bundler caches to a different folder, `/var/lib/gems`. I've tried to cache it, but then after the cache is restored, bundle can't find `kitchen`. Not sure what's going on there. 😪
@fkorotkov Thanks for digging into that, maybe someone else on this thread may be able to figure it out. Appreciate your help.
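For anyone revisiting this later, putting the pieces of this exchange together, the intended shape of the task would be roughly the following (a sketch only; the task and script names are illustrative, and as noted above the cache folder may not match where bundler actually installs the gems):

```yaml
# Sketch combining the suggestions above: a docker_builder task with the
# bundle cache declared before the verify step. Names are illustrative.
docker_builder:
  bundle_cache:
    folder: /usr/local/bundle
    fingerprint_script: echo $RUBY_VERSION && cat Gemfile && cat Gemfile.lock
    populate_script: bundle install
  verify_script: bundle exec kitchen verify "$INSTANCE"
```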
https://freenode.logbot.info/saltstack-formulas/20190516#c2180979:
https://freenode.logbot.info/saltstack-formulas/20190516#c2181226:
https://freenode.logbot.info/saltstack-formulas/20190516#c2181237: