Build Stages Part 2: Stages Order and Conditions

svenfuchs commented 7 years ago

We have shipped iteration 2 for the Build Stages feature.

Along with a lot of bug fixes and small improvements, this includes:

Build Stages section: Define the order of stages, optionally with a condition
Conditional builds, stages, jobs: Define conditions for accepting/rejecting a given build, stage, or job
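A minimal sketch of how the stages section combines ordering with a condition, assuming a project with hypothetical test and deploy stages and placeholder scripts. The list fixes the stage order, and a stage written as a map with name and if is only added to the build when its condition matches:

```yaml
# Stages run in the order of this list.
stages:
  - test
  # The deploy stage is skipped unless the condition matches.
  - name: deploy
    if: branch = master AND type = push

jobs:
  include:
    - stage: test
      script: ./run-tests.sh   # hypothetical script
    - stage: deploy
      script: ./deploy.sh      # hypothetical script
```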

Find out more about this on our blog and in the documentation here and here.

We would love to hear your feedback. Please leave all thoughts, comments, ideas related to this feature here.

Happy Testing!

FAQ

What feature requests have you received for improving Build Stages so far?

We are adding this list of feature requests (with a rough indication of how likely we are to prioritize each one) so you don't have to ask about these again :)

Pausing a build after a given stage, and proceeding only after an interactive confirmation. (Yes)
Add a native build artifacts feature that can be used for sharing artifacts across stages, or another way to more easily share storage or artifacts. (Yes)
Making the native cache feature more configurable by specifying cache keys per job. This would allow you to reuse caches in more flexible ways in subsequent stages. (Probably yes)
Allow script etc. on the new stages section, so they can be shared across all jobs in one stage. (Yes, but not before [1])
Allow specifying allow_failure: true per job on jobs.include. (Probably yes, but not before [1].)
Allow a stage key on the deploy section. Turn this into a stage and a job. (Not quite sure, not before [1].)
Add a name/description to jobs in order to reveal the intent, and show them in the UI. (Not quite sure.)
Consider skip: all, so one does not have to overwrite before_install, install, and script. (Sounds like a good idea? Not before [1])
Silence skipped commands: no log output. (Not quite sure.)
Consider "embedded matrix expansion", e.g. jobs.include.env: [FOO=foo, BAR=bar]. (Not quite sure)
Build config .travis.yml editor/web tool. (Yes, based on the specification produced in [1])
Allow grouping jobs into arbitrary groups, not depending on each other, like stages, just for visual presentation on the UI. (Probably not)
Update the GitHub commit status per stage, and add more detailed commit status information. (Unsure, definitely not before [2])
Consider automatically restarting jobs in later stages that were canceled because a job in a previous stage failed, once that failed job is restarted. (So far uncertain. There's a lot of complexity involved.)
In the running tab, consider grouping jobs per build (and possibly stage). (Interesting thought. We are working on improving this UI, and might consider this in a later iteration.)

Other improvements:

[1] A strict travis-yml parser has been shipped.
[2] The GitHub commit status API has the known limitation that new updates are being rejected after the 1000th update. They are working on improving this, and providing a way for us to post more updates. Until this is unblocked we are unlikely to make any changes to our commit status updates.

jrjohnson commented 7 years ago

I'm not sure why this condition isn't working. Seems right, but I must be doing something weird? if: (branch = master AND (type NOT pull_request)) I also tried if: branch = master AND (type NOT pull_request) as well as putting the condition into a separate stages key outside of jobs. You can see here where a fork branch deploy-conditions is running the deploy staging job and I think it should have been skipped: https://travis-ci.org/jrjohnson/frontend/builds/279259400

Any ideas?

TheSnoozer commented 7 years ago

Since most Travis users use GitHub, it would be interesting to know whether there will be a mechanism to provide more detailed feedback inside GitHub about which stage failed. Currently one sees only "fail"... I guess it would be awesome if this could also integrate detailed feedback on a per-job basis.

webknjaz commented 7 years ago

@jrjohnson what about branch = master AND NOT type = pull_request?

jrjohnson commented 7 years ago

Thanks @webknjaz that was it. I was doing something weird!
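For reference, the working form of the condition on a single job might look like the sketch below (the stage name and script are placeholders). NOT negates the comparison that follows it, so it goes before the attribute rather than between type and its value:

```yaml
jobs:
  include:
    - stage: deploy staging
      # NOT applies to the whole comparison "type = pull_request"
      if: branch = master AND NOT type = pull_request
      script: ./deploy-staging.sh   # hypothetical script
```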

bsipocz commented 7 years ago

@BanzaiMan - Thanks for the explanation. Here is what I found:

So, all in all, if you remove the top-level stage key, I believe you would get what you want. Please try it, and let us know how it works for you.

This did something completely different from what I tried to set up: https://travis-ci.org/bsipocz/astropy/builds/280056850

However changing the top level "stage" to be a scalar fixed the problem, and now perfectly does what I tried to achieve: https://travis-ci.org/bsipocz/astropy/builds/280057351

Thanks again, I'm sure this feature will prove to be super useful.

qbradq commented 7 years ago

Really loving this feature! It makes it very convenient to omit specific steps from the PR builds such as publishing documentation and artifacts.

bsipocz commented 7 years ago

@BanzaiMan - Something went wrong when using stages and trying to allow failure for one of the jobs: it simply doesn't show up in the matrix any more (previously I saw a footnote saying that one job is allowed to fail, but recently that footnote disappeared). Could you or someone else on the team please have a look?

https://travis-ci.org/astropy/astropy/builds/280521929?utm_source=github_status&utm_medium=notification

Edit: now it's being recognized again. Is it possible that the footnote only appears for the second stage, once the first stage has passed?

dra27 commented 7 years ago

Apologies if I've failed to RTFM, but it would be really useful if you could indicate in .travis.yml that it's OK for stages to proceed per operating system without waiting for all OSes to complete.

See, for example, https://travis-ci.org/janestreet/jbuilder/builds/281290480 where the Linux builds finished quickly (they were done within the first 2 minutes of the overall build) and could have moved on to 673.9, but had to wait for ages while the OSX builds caught up.

webknjaz commented 7 years ago

@dra27 I'm adding if: type in (api, cron) for OS X (same reason). Works like a charm, and you can manually trigger a build with OS X if needed.
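A sketch of that approach, assuming a hypothetical test script: the OS X job only enters the build for API-triggered and cron builds, so PR and push builds never wait on it.

```yaml
jobs:
  include:
    - stage: test
      os: osx
      # Only run the macOS job for API-triggered and cron builds
      if: type IN (api, cron)
      script: ./run-tests.sh   # hypothetical script
```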

bsipocz commented 7 years ago

@dra27 - Same situation here as for @webknjaz: we've just disabled the OS X runs for PRs and only run them for cron. Basically that was the main reason to use stages. So if you're in a situation where OS X almost always passes when Linux does, you may want to move those runs to a separate, last stage, or maybe even run them only for push and/or cron event types.

dra27 commented 7 years ago

@webknjaz, @bsipocz - thanks for the comments, though we definitely want the testing for macOS. I could add two more stages to push OSX later (that's a good workaround for now, thanks!), but that does reduce parallelism in the earlier stage. So it'd still be nice to have a feature which did it properly.

webknjaz commented 7 years ago

@svenfuchs @BanzaiMan

Look at the build: https://travis-ci.org/cherrypy/cherrypy/builds/281334209

Job 941.5 was stuck, so I stopped it, and it moved one line down (reordering? It was first in the stage and became second). Then I hit run again on this job.

While I was doing this, and 941.5 was in the canceled/created states, the Test stage started. Please note that 941.5 is not allowed to fail.

So based on the above, I expect the following stage not to start unless all the jobs in the previous stage are green (or allowed to fail). I also expect that cancelling a job (even one that is allowed to fail) should not trigger the start of the next stage.

Anything I'm missing?

djspiewak commented 7 years ago

I love stages, and the implementation is quite solid, but I would love it if I could apply matrices a little bit more liberally. The fact that the test stage is magically defaulted is really weird; I would rather there be no default stage at all, and that I would have the flexibility to apply a matrix within each stage (potentially separate matrices within each). Basically, just allow the matrix key to be valid within the env declaration inside of stage, and similarly with other matrix-like declarations (e.g. rvm: [...]).

On an unrelated note… I've noticed that the build doesn't actually proceed beyond a stage boundary if I restart a job that previously failed in a prior stage. For example, imagine I have a matrix of jobs in my test stage, followed by a single job in my deploy stage. If one of the test jobs fails due to some intermittent error, I could restart it, but the deploy stage will never run unless I fully restart the build. I would rather see stages "pick up where they left off", as it were, in the event of a job restart.

Stage conditions really don't seem to work very well at all. I tried a conditional which was if: branch = master OR branch =~ ^backport/.*, and that didn't seem to apply at all (a pull request build kept getting matched as true). I tried a couple variants of that, including parentheticals, but ultimately had to give up and use a simpler conditional just to get the build to work.

Oh, Cancel Build doesn't always work reliably. Canceling jobs seems to be fine.

keithamus commented 7 years ago

This feature is amazing, thanks Travis folks!

One thing I'd love to see is some mechanism to name jobs - we have multiple builds running with Node 8 but each does a different thing. Right now I'm cheating by using environment variables, like this:

It'd be awesome to get a customisable name column, though.

TheSnoozer commented 7 years ago

@keithamus AFAIK if you run a matrix build, you can define a custom script to be executed (example: https://github.com/TheSnoozer/maven-git-commit-id-plugin/blob/travis/.travis.yml#L26). See the job run https://travis-ci.org/TheSnoozer/maven-git-commit-id-plugin/builds/281127348, where the global script at the bottom got overridden by the one defined in the matrix build.

keithamus commented 7 years ago

@TheSnoozer custom scripts really aren't the problem; it's just the aesthetics of each build in the matrix. I want to give them nice names so it all looks readable. The environment variables aren't used for anything other than indicating which build is which on the matrix overview page.

jeking3 commented 7 years ago

This may not be exactly related to build stages, but I would like an easy way to tuck a Docker image into your cache. I am using build stages to build all the Docker images that Apache Thrift needs (about 20 languages) and then running with those images. It would be nice if I could tuck away the image locally to speed up subsequent downloads.

Furthermore, I would REALLY like a way to save off a docker image AFTER a build and then use it in subsequent builds. For example in the Apache Thrift build I need to spend about 20 minutes building 20 languages. THEN I would like to tuck it away and reuse it to run unit tests and cross-language integration tests in parallel in a number of jobs.
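Until a native artifacts feature exists, one workaround that may help is to serialize the image into a cached directory with docker save in one stage and restore it with docker load in a later one. This is a sketch only: the image name, cache directory and test script are hypothetical, and it assumes the later job computes the same cache key as the job that built the image (jobs that differ in env, os or language settings get separate caches).

```yaml
services:
  - docker

cache:
  directories:
    - $HOME/docker-images   # hypothetical cache directory

jobs:
  include:
    - stage: build image
      script:
        - docker build -t thrift-build:latest .
        - mkdir -p $HOME/docker-images
        # Serialize the image into the cached directory
        - docker save -o $HOME/docker-images/thrift-build.tar thrift-build:latest
    - stage: test
      # Restore the image saved by the previous stage (assumes the cache was shared)
      before_install:
        - docker load -i $HOME/docker-images/thrift-build.tar
      script: docker run --rm thrift-build:latest ./run-tests.sh   # hypothetical script
```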

elgalu commented 7 years ago

Hi, thanks for this great feature!

At Zalenium and docker-selenium we've noticed that there is an implicit limit of 5 parallel VMs.

Is it possible to get more, even by paying?

thanks

wright-development commented 7 years ago

Decided to use it for my open source library, works really well. Great job guys!

pkittenis commented 7 years ago

Great feature, thanks!

There may be a bug with matching multiple stage/job conditions.

This works - if: tag IS present

Neither of these does:

if: tag IS present OR branch = master
if: (tag IS present) OR (branch = master)

E.g., these two (1, 2) non-tag, non-master branch builds included the osx build, while this one with only the tag condition did not (no osx build).

BanzaiMan commented 7 years ago

@pkittenis https://travis-ci.org/ParallelSSH/parallel-ssh/builds/284838296 and https://travis-ci.org/ParallelSSH/parallel-ssh/builds/284841062 are PR builds for the master branch, so the branch = master condition is met.

The corresponding Push build for the libssh2 branch for https://travis-ci.org/ParallelSSH/parallel-ssh/builds/284838296 is https://travis-ci.org/ParallelSSH/parallel-ssh/builds/284838280, which does not include the osx build.

pkittenis commented 7 years ago

Oh, I see, so PRs against master branch count as well. Thanks for the feedback @BanzaiMan 👍
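Given that explanation, one way to keep PR builds that target master from matching is to require a push build for the branch check. A sketch, with a placeholder script:

```yaml
jobs:
  include:
    - os: osx
      # Tag builds, or direct pushes to master; PRs targeting master no longer match
      if: tag IS present OR (type = push AND branch = master)
      script: ./run-tests.sh   # hypothetical script
```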

sbrunner commented 7 years ago

I just tried:

if: env(TRAVIS_REPO_SLUG) = camptocamp/c2cgeoportal

but it doesn't seem to work... and it would be nice if I could specify it like this (as for deploy) :-) :

if: repo = camptocamp/c2cgeoportal

sbrunner commented 7 years ago

I just found another issue:

jobs:
  include:
    - stage: stage1
      ...
    - stage: stage2
      if: <something false>
    - ...

In this case the last job will be included in stage1 instead of stage2...
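A job without a stage key is assigned to the stage of the job listed before it, so when the only job that names stage2 is excluded by its condition, the following job apparently falls back into stage1. A workaround that may avoid this, sketched with the same placeholder stage names and hypothetical scripts, is to repeat the stage key on every job. If the condition is meant for the whole stage rather than one job, putting it on the stage in the stages section is another option.

```yaml
jobs:
  include:
    - stage: stage1
      script: ./step1.sh    # hypothetical script
    - stage: stage2
      if: type = cron       # stands in for "<something false>"
      script: ./step2a.sh   # hypothetical script
    # Repeat the stage key so this job cannot inherit stage1
    - stage: stage2
      script: ./step2b.sh   # hypothetical script
```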

softprops commented 7 years ago

Would it be possible to declare a name per job within a stage? My team is roughing out a translation of what we currently use multijob phases for in Jenkins, and it would be nice to show the name of a failing job rather than a script ref.

softprops commented 7 years ago

Found a workaround for this, but it's not ideal. I'm using a per-stage env var that represents what the job is, to get it reflected in the build UI.

[Screenshot of the build UI, 2017-10-10]

webknjaz commented 7 years ago

@softprops I went further today: it turns out you don't even need the JOB= prefix ^^

softprops commented 7 years ago

ha, env vars, sans values. nice
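A sketch of the label workaround described above, assuming (as these comments indicate) that an env entry without a value is accepted and shown in the job's row; the labels and scripts are hypothetical. As mentioned later in this thread, env entries also feed into the cache key, so each labelled job gets its own cache.

```yaml
jobs:
  include:
    - stage: test
      env: LINT                 # label only, no value
      script: ./lint.sh         # hypothetical script
    - stage: test
      env: UNIT_TESTS           # label only, no value
      script: ./unit-tests.sh   # hypothetical script
```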

purohit commented 7 years ago

Reiterating support for @keithamus's request for job names within stages that are reflected in the UI; we have multi-language tests within a stage, and they all show up in the UI as "Ruby" even though they're different, which is confusing.

We can't use the language: xxx construct because it's broken for the latest versions of Go (1.9.1) and for Clojure we don't want the lein stuff, so we custom install those, but would like to name them so it's easy to see what is being tested.

softprops commented 7 years ago

I had a question about how build stages interact with builder limits for premium travis pro accounts.

I'm looking at https://billing.travis-ci.com/subscriptions/myorg and we pay for a premium account which gets us up to 10 concurrent jobs.

Can you confirm whether "job" here means a per-PR job or a job within a multi-stage build for a PR? We're hoping to hear that it is a per-PR job, because otherwise this will back up our queue for our other Travis Pro repositories, as well as for single repositories that have a high volume of PRs.

For example, we're looking to understand whether the expectation should be: given a 5-job multi-stage build and 5 PR branches pushed to, does that mean we can only build 2 PRs at a time?

djspiewak commented 7 years ago

@softprops I can answer your question, since we (@slamdata) are basically in the same boat. "Job" means "job within a multi-stage build". Stages seem to be internally represented as just a sequencing thing, and each element of each stage (meaning: each matrix-expanded job within the stage) is a separate job. You can see this more visibly if you look at the TRAVIS_JOB_NUMBER variable.

The Quasar repo pretty directly confirms all of this as it relates to concurrency semantics as well: with multiple PRs out and master building, jobs in later stages sometimes end up waiting for a while before they can complete, simply because other PRs are filling up the workers. This was happening before as well, though (we also have a rather "wide" matrix build, and had it even before we staged the Travis configuration), so the only thing that has changed is that the individual jobs are shorter (since work that was previously all done in one job is now split across multiple stages), allowing more efficient work stealing. This is a benefit to us since we also pay for a limited number of higher-capacity Travis VMs (I highly recommend this account upgrade, btw, even though it is fairly pricey). So keeping jobs short allows us to make better use of these significantly faster workers, which more than compensates for the added overhead of having to repeatedly reprovision workers for tasks that would otherwise have been folded into a single job.

For example, we're looking to understand whether the expectation should be: given a 5-job multi-stage build and 5 PR branches pushed to, does that mean we can only build 2 PRs at a time?

Effectively yes, but the jobs will all interleave. So in the end it works out to be the same thing, but with better work-stealing semantics.

softprops commented 7 years ago

Thanks @djspiewak. This is very informative.

@svenfuchs is this the intended effect? For context, we have a lot of small private repos and are now looking to move the CI for our primary large repo from an internally hosted Jenkins to Travis. Multi-stage builds are kind of indispensable for sane turnaround times of CI checks on our large repo, but we're worried they could have a negative impact on the CI capacity available to our smaller Travis Pro repos.

agronholm commented 7 years ago

My issues with the build stages feature:

~~Stages are often displayed in a random order instead of the one defined in .travis.yml~~
~~Sometimes stages are executed in the wrong order too~~
~~The conditions are ignored~~

Here's a build which demonstrates all three problems: https://travis-ci.org/agronholm/sample-c-extension/builds/287677809

EDIT: the wrong order was my fault (had a typo in the stage name). EDIT 2: I had the tag matching operator wrong in the condition. It all works now!

agronholm commented 7 years ago

I should add that things started going wrong once I added the stages: section to .travis.yml. The stage order seemed fine before that, although I'm not 100% sure.

nurupo commented 7 years ago

I want to create 4 jobs under a single stage that differ only by env, so something like

matrix:
  fast_finish: true
  include:
    - stage: foo
      ...
    - stage: doesn't work
      os: linux
      env: 
        - FOO=bar
        - FOO=baar
        - FOO=baaar
        - FOO=baaaar
      services:
        - docker
      cache:
        directories: 
          - /opt/$FOO
    - stage: bar
      ...

script: ./$FOO.sh

but this doesn't work and I'm forced to repeat the same stage over and over but with a different env

matrix:
  fast_finish: true
  include:
    - stage: foo
      ...
    - stage: doesn't work
      os: linux
      env: FOO=bar
      services:
        - docker
      cache:
        directories: 
          - /opt/$FOO
    - stage: doesn't work
      os: linux
      env: FOO=baar
      services:
        - docker
      cache:
        directories: 
          - /opt/$FOO
    - stage: doesn't work
      os: linux
      env: FOO=baaar
      services:
        - docker
      cache:
        directories: 
          - /opt/$FOO
    - stage: doesn't work
      os: linux
      env: FOO=baaaar
      services:
        - docker
      cache:
        directories: 
          - /opt/$FOO
    - stage: bar
      ...

script: ./$FOO.sh

webknjaz commented 7 years ago

@nurupo matrix expansion does not work inside of stages, this is explicitly stated in the documentation.

nurupo commented 7 years ago

Yeah, and that makes me sad and my code wet (as opposed to DRY) :\

webknjaz commented 7 years ago

@nurupo you can use this feature of YAML called anchors
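A sketch of what that could look like for nurupo's config: the first Docker job defines the shared settings once under a hypothetical docker_job anchor, and the other jobs merge them with <<: *docker_job and override only env. The foo and bar scripts are placeholders for the elided jobs, and this assumes the Travis YAML parser honours merge keys (<<), which are a YAML 1.1 extension rather than core YAML syntax.

```yaml
matrix:
  fast_finish: true
  include:
    - stage: foo
      script: ./foo.sh   # placeholder for the elided job
    # Define the shared job settings once and anchor them
    - &docker_job
      stage: doesn't work
      os: linux
      env: FOO=bar
      services:
        - docker
      cache:
        directories:
          - /opt/$FOO
    # Merge the anchored settings and override only env
    - <<: *docker_job
      env: FOO=baar
    - <<: *docker_job
      env: FOO=baaar
    - <<: *docker_job
      env: FOO=baaaar
    - stage: bar
      script: ./bar.sh   # placeholder for the elided job

script: ./$FOO.sh
```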

webknjaz commented 7 years ago

Use case

There are several stages (say stage1, stage2 and stage3). I want to test only stage1, so I go to Travis's web UI and cancel the jobs from stage2 (in my case all of its jobs), so that it won't proceed with the execution of later stages (i.e. stage3).

Actual result

When I opened the build page after some time, it turned out that stage3 had been executed.

My expectation

I'd expect that if I cancel jobs from stage2 the overall status of stage2 won't be treated as successfully completed. This is expected to prevent stage3 from execution.

/cc: @svenfuchs @BanzaiMan

pschambacher commented 7 years ago

Thank you for the stages. I think that they're great. Here's an idea of something that might be useful:

I have a project, a Rails application with a React front-end. We need to run bundle install for Rails to work and npm install for React to work. At the moment, we have a stage that runs both and a stage that runs both unit and feature tests. Here's what would be nice: a stage that runs bundle install, another that runs npm install, another stage where unit tests run and a fourth one where feature specs run. My idea to achieve that would be to run stages not by order but by dependency.

Here's what it would look like:

stages:
  - bundle-install
  - npm-install
  - unit-tests
  - feature-tests

jobs:
  include:
    - stage: bundle-install
      ...
    - stage: npm-install
      ...
    - stage: unit-tests
      requires: bundle-install
      ...
    - stage: feature-tests
      requires:
        - bundle-install
        - npm-install
      ...

This would allow any stage without requirements to start in parallel.

agronholm commented 7 years ago

@pschambacher why don't you simply run npm-install and unit-test in the same stage and incorporate bundle-install in the same job as unit-test?

Are you caching your results? If not, then the dependencies make no sense anyway.

pschambacher commented 7 years ago

@agronholm We're caching but there's a lot of stuff waiting for npm install to complete when not needed. Trying to find a way to have the minimum waiting time.

agronholm commented 7 years ago

It's pretty hard to see from your example.

Narretz commented 7 years ago

Add a name/description to jobs in order to reveal the intent, and show them in the UI. (Not quite sure.)

That's a no-brainer, imo. Especially since devs are using environment variables to do this, which messes with the caching.

nurupo commented 7 years ago

I just put the description in the stage's name, like stage: "Windows stage 1: build all deps". In my case all the jobs in a stage do the same thing, so I don't need different descriptions per job, which might not be the case for you.

As far as env variables and the cache go, I have worked around it by passing them as script arguments, since script: doesn't affect caching:

# BUILD_TYPE=debug won't get the cache from Stage 1
- stage: "Windows stage 1: build all deps and cache them"
  env: ARCH=i686 BUILD_TYPE=release
  script: ./.travis/build-windows-deps.sh
- stage: "Windows stage 2: build our app"
  env: ARCH=i686 BUILD_TYPE=release
  script: ./.travis/build-windows-app.sh
- stage: "Windows stage 2: build our app"
  env: ARCH=i686 BUILD_TYPE=debug
  script: ./.travis/build-windows-app.sh

vs

# everything gets the cache from Stage 1
- stage: "Windows stage 1: build all deps and cache them"
  script: ./.travis/build-windows-deps.sh i686 release
- stage: "Windows stage 2: build our app"
  script: ./.travis/build-windows-app.sh i686 release
- stage: "Windows stage 2: build our app"
  script: ./.travis/build-windows-app.sh i686 debug

bbert commented 7 years ago

I have an issue with stage condition. I tried:

- stage: test
  if: env(TRAVIS_EVENT_TYPE) = cron
  script: ...

But the stage 'test' is always triggered, whatever the event type.

webknjaz commented 7 years ago

@bbert: if: type = cron

surli commented 7 years ago

First, thanks for introducing the stage concept in Travis, it's a really cool feature! But (there's always a "but"), stages provide new evidence of a general bug I reported in https://github.com/travis-ci/travis-ci/issues/8577

In short: if I have a Travis configuration with two stages, that I'm using for PR, I have no guarantee that the first and second stage will be executed on the same commit.

In fact, I just had the problem. I created a PR in my project to use stages with different bash scripts. I pushed the following commit (https://github.com/INRIA/spoon/pull/1655/commits/275974917dcce1c25966cbcb997972ea9bf1b0e2), which created the following build: https://travis-ci.org/INRIA/spoon/builds/293106159.

Before my first stage completed, I pushed another commit in which, among other changes, I renamed one of my scripts: https://github.com/INRIA/spoon/pull/1655/commits/b7d0d6882c24431bf8dddc81c7e854f69e5b4a72 After that commit, chore/travis-run-coverage.sh no longer existed and had been replaced by chore/travis-run-tests-and-coverage.sh. When looking at the second stage of the previous build, I saw the following error (in https://travis-ci.org/INRIA/spoon/jobs/293106164):

/home/travis/.travis/job_stages: line 57: ./chore/travis-run-coverage.sh: No such file or directory

This is because, when launching the second stage, Travis used the latest commit, not the same one as for the first stage. In my case it's not a big deal, but I can imagine it being really annoying if the stage is related to deployment or (more probably in a PR) to specific tests, since you cannot rely on the integrity of the build.

bbert commented 7 years ago

@webknjaz I tried what you suggested (if: type = cron), but no success

webknjaz commented 7 years ago

@bbert works for me. Consider sharing a link to your build + config.

travis-ci / beta-features