travis-ci / beta-features

The perfect place to leave feedback and comments on newly released Beta Features.

Build Stages: Flexible and practical Continuous Delivery pipelines #11

Closed joshk closed 7 years ago

joshk commented 7 years ago

From simple deployment pipelines, to complex testing groups, the world is your CI and CD oyster with Build Stages.

Build Stages allows you and your team to compose groups of Jobs which are only started once the previous Stage has finished.

You can mix Linux and Mac VMs together, or split them into different Stages. Since each Stage is configurable, there are endless Build pipeline possibilities!
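
To give a rough idea of the shape of a staged pipeline (a minimal sketch, not an official example; the stage names and scripts here are made up):

jobs:
  include:
    # all jobs in the "test" stage run in parallel
    - stage: test
      script: ./run_unit_tests.sh
    - stage: test
      script: ./run_integration_tests.sh
    # the "deploy" stage only starts once every job in "test" has passed
    - stage: deploy
      script: ./deploy.sh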

This feature will be available for general beta testing soon... watch this space 😄

We love to hear feedback; it's the best way for us to improve and shape Travis CI. Please leave all thoughts/comments/ideas related to this feature here.

Happy Testing!

EmilDafinov commented 7 years ago

@BanzaiMan In this build

https://travis-ci.org/EmilDafinov/scala-ad-sdk/builds/236804048

The "Verify project version" stage ran after the "Test" stage, even though it is defined first in the .travis.yml file. Why is this the case, is there a way to run a stage before test ?

BanzaiMan commented 7 years ago

@EmilDafinov There is a surprising behavior with matrix inclusion (which is equivalent to build stage definition) described in https://github.com/travis-ci/travis-ci/issues/4681. In other words, there is a job implicitly defined by your scala and jdk keys in your configuration, which belongs to the default "Test" stage. You notice that there are 3 jobs in the "Test" stage, even though you explicitly define 2. You also notice that the first job skips the script phase, which is defined in the "base" level of your configuration.
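
For illustration, a simplified sketch of what this looks like (placeholder values, not the actual configuration from that build):

# top-level expansion keys like these ...
scala: 2.12.2
jdk: oraclejdk8

jobs:
  include:
    # ... implicitly add one extra job to the default "Test" stage,
    # on top of the jobs listed here explicitly
    - stage: Verify project version
      script: ./verify_version.sh
    - stage: Test
      script: sbt test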

I tried a few things to remove the first job, so that the rest can go as you defined in jobs.include, but I have not been able to. I'll take another look at https://github.com/travis-ci/travis-ci/issues/4681 next week to really fix it.

catdad commented 7 years ago

Not sure if I am the only one annoyed by this, but in my deploy steps, I include an on section to deploy only on tags, etc. I sorta expected this to make the entire stage be skipped on commits that are not tags. But it doesn't. The stage VM is built, dependencies installed, etc. In my builds, it does about a minute of work only to accomplish nothing, because the deployment is being skipped.

Now, I understand that this is technically because the on condition is applied to the deploy action inside the stage, and not to the stage itself. Would it make sense to be able to add the conditions to the stage itself, though? Or is that already possible and I just missed it? I hate waiting that extra minute, and wasting both my time and Travis's time on a step that is meant to accomplish nothing.

BanzaiMan commented 7 years ago

@catdad Thanks for the feedback. This is the same issue that has been raised before: changing the jobs defined in the build configuration based on the build's properties:

https://github.com/travis-ci/beta-features/issues/11#issuecomment-302726204
https://github.com/travis-ci/travis-ci/issues/7149
https://github.com/travis-ci/travis-ci/issues/7181
https://github.com/travis-ci/travis-ci/issues/7758

We recognize that this is an important feature to implement, and have an internal issue to track it.

EmilDafinov commented 7 years ago

I'm sure it has been mentioned before, but having the option to name individual jobs and have the names appear in the UI instead of just a number would be very neat. Currently I fake it by defining an env variable NAME, because these appear in the UI :)

https://travis-ci.org/EmilDafinov/scala-ad-sdk/builds/237487140
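
For anyone looking for the same workaround, it amounts to something like this (a sketch; NAME has no special meaning to Travis, it is just an env variable whose value happens to be displayed in the job list):

jobs:
  include:
    - stage: test
      env: NAME="unit tests"
      script: sbt test
    - stage: test
      env: NAME="integration tests"
      script: sbt it:test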

BanzaiMan commented 7 years ago

@EmilDafinov I've fixed travis-ci/travis-ci#4681, so that your build matrix (https://github.com/travis-ci/beta-features/issues/11#issuecomment-304490133) should now behave as you intended. Please push another commit (restarting is not sufficient) to test it.

BanzaiMan commented 7 years ago

@chrisguitarguy re: https://github.com/travis-ci/beta-features/issues/11#issuecomment-304416434, a side effect of the fix for travis-ci/travis-ci#4681 is that this behavior changed. In the matrix.include jobs, the first value of the array in the expansion keys will be assumed. This should not affect the current way to do things (i.e., overriding the values).
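
In other words, roughly (a simplified sketch of the changed behavior; the versions are placeholders):

node_js:
  - '6'   # the first value is now assumed by matrix.include jobs that omit node_js
  - '8'
matrix:
  include:
    - env: FOO=bar     # implicitly node_js: '6'
    - env: FOO=baz
      node_js: '8'     # overriding still works as before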

See https://github.com/travis-ci/docs-travis-ci-com/pull/1215 for documentation changes.

lumag commented 7 years ago

It seems that with build stages I can not get after_failure to run even after main script entries. https://travis-ci.org/lumag/odp/jobs/238652158

lumag commented 7 years ago

Hmm. It looks like after_failure gets executed only if the last command of the script fails. If one of the previous commands fails, the build is marked as failed, but after_failure isn't executed.

alorma commented 7 years ago

Hi.

I have a deploy stage, with limitation to on tags.

  - stage: deploy
    jdk: oraclejdk8
    deploy:
      provider: script
      script: scripts/bintrayUpload.sh
      skip_cleanup: true
      on:
        tags: true

But in this job: https://travis-ci.org/SchibstedSpain/Barista/builds/238687258 the deploy stage is running even though it shouldn't.

I canceled the deploy job because there was another one from the main branch.

Is there a problem?

Travis script

PR: https://github.com/SchibstedSpain/Barista/pull/89

BanzaiMan commented 7 years ago

@alorma There is not. deploy.on.tags: true just means that the deploy phase in the job runs when the build is triggered by a tag. It does not mean "define and run this job when it is a tagged build".

BanzaiMan commented 7 years ago

@lumag after_failure is executed here https://travis-ci.org/lumag/odp/jobs/238652158#L4596

alorma commented 7 years ago

Hi @BanzaiMan

Does it mean I should update my build script with a tag check?

alorma commented 7 years ago

Because, @BanzaiMan, in my Travis config that phase is running even when the build is not tagged.

BanzaiMan commented 7 years ago

@alorma Currently it is not possible to alter the build configuration itself based on the build request's properties (e.g., branch, tag, event type ("push" vs "pull request" vs "cron")). There is an internal ticket to track this common request, but we have no ETA.

In this particular case, it means that all build stages will be configured, and each stage is executed if the build proceeds to it.
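
In the meantime, one way to avoid doing real work in that stage on non-tag builds is to guard the stage's own script on $TRAVIS_TAG (a sketch; the prepare_release.sh script is made up, and the job's VM still boots either way):

jobs:
  include:
    - stage: deploy
      # only do the expensive part when the build was triggered by a tag
      script: if [ -n "$TRAVIS_TAG" ]; then ./scripts/prepare_release.sh; else echo "not a tag, skipping"; fi
      deploy:
        provider: script
        script: scripts/bintrayUpload.sh
        skip_cleanup: true
        on:
          tags: true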

alorma commented 7 years ago

Ok, so I will add an if (tag) check in my script; that will do the trick.

Am I right, @BanzaiMan?

Jacobkg commented 7 years ago

How are you deciding where to allocate paid Travis CI parallel build capacity? If I am paying for, say, 10 workers, will you allocate some of those workers to jobs in later build stages that aren't running yet (thus blocking earlier stages of subsequent builds), or will you not use up capacity for later stages until the earlier ones complete successfully?

webknjaz commented 7 years ago

@svenfuchs @BanzaiMan

Look at "Test under os x (last chance to fail before deploy available)" here: https://travis-ci.org/cherrypy/cherrypy/builds/239295349

It failed (because one of the required jobs failed), but it states "Your build matrix was set to allow the failure of job 778.24 so we continued this build to the next stage.", which is not really true.

Jobs 778.24 and 778.23 both failed. The first one is allowed to fail, and the second is not. This means the whole stage failed and the next stage is not run. However, I think the message should not state that the build continued to the next stage. I would rephrase it to something like "but some other jobs failed, so we won't proceed to the next stage".

This feature has been added via travis-ci/travis-ci#7789.

BanzaiMan commented 7 years ago

@webknjaz I think this is a matter of text, correct? 778.24 is allowed to fail, and 778.23 is not (and the text shows this). And precisely because 778.23 failed but was not allowed to, the build 778 was marked a failure and 778.25 was cancelled automatically.

webknjaz commented 7 years ago

@BanzaiMan yes, correct, it's only about text. Just wanted to report this as it might be confusing to someone.

ghedamat commented 7 years ago

Hi,

When a job fails on a particular stage all subsequent stages are canceled, which is intended.

If I later manually retry that failed job and it succeeds, the later stages also have to be manually retried. Is that intended as well, or is there a way for me to get the following stages to run automatically after a previous one becomes green?

Apologies if this was already asked

Thanks a lot for the feature!

webknjaz commented 7 years ago

@ghedamat I've noticed this too. It would be interesting to know why it works this way.

alorma commented 7 years ago

Hi traviers!

It would be awesome if, instead of just naming a stage like this:

(screenshot of the current UI, showing only the stage name)

we could see something like:

Check

277.1: Lint
277.2: PMD
277.3: FindBugs
277.4: Checkstyle

Deploy

alorma commented 7 years ago

Hi.

This build from this PR is doing something weird:

PR: https://github.com/SchibstedSpain/Barista/pull/93 Build: https://travis-ci.org/SchibstedSpain/Barista/builds/239925305

The deploy stage is running ./gradlew build connectedCheck on this line: https://travis-ci.org/SchibstedSpain/Barista/jobs/239925311#L1628

But as you can see in the travis yml file: https://github.com/SchibstedSpain/Barista/blob/travis_build_stages/.travis.yml there's no such command.

What could be happening?

cotsog commented 7 years ago

@alorma: Thanks for trying out our new Build Stages feature!

I believe this command is the default step for Android builds and it's executed since there isn't a script: key in your deploy stage.

You can add script: true to override the default step and do nothing instead.
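
Concretely, that would look something like this (a sketch based on the deploy stage posted earlier in this thread):

  - stage: deploy
    jdk: oraclejdk8
    script: true          # overrides the default Android build step and does nothing
    deploy:
      provider: script
      script: scripts/bintrayUpload.sh
      skip_cleanup: true
      on:
        tags: true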

I've created a PR to show you what I mean.

Please let us know if it does what you want.

alorma commented 7 years ago

@cotsog thaaaaaaanks! It's working now!

aontas commented 7 years ago

Is there a way of mixing matrix expansion and common config? Scenario: For my tests, I want to have a matrix of versions, set up (say) a postgres database then run the tests. For my deploy stage, I just want to have a single version (version override currently supported) but I have no need for a database to be set up.

I have tried (some config removed for brevity):


dist: trusty
addons:
  postgresql: '9.5'
services:
  - postgresql
before_script:
  - psql -c 'create database test_database;' -U postgres
node_js:
- '4.6'
- '6.1'
jobs:
  include:
    - stage: deploy
      script: skip
      node_js: '6.1'
      deploy:
        - provider: etc etc

This results in a test stage with matrix expansion, but Postgres is installed in both the test and deploy stages.


dist: trusty
node_js:
- '4.6'
- '6.1'
jobs:
  include:
    - addons:
        postgresql: '9.5'
      services:
        - postgresql
      before_script:
        - psql -c 'create database test_database;' -U postgres
    - stage: deploy
      script: skip
      node_js: '6.1'
      deploy:
        - provider: etc etc

This results in 3 jobs in the test stage (two from the matrix expansion, both of which fail because Postgres is not set up, and one that includes Postgres), and a deploy stage that does not include Postgres.

I have tried matrix expansion inside the test stage, which (as commented above) does not work. While I could create two complete test configs in the test stage, that is a lot of repeated config.

Is there something to help me do this sort of thing (i.e., share config between the test-stage jobs without putting that config in subsequent stages)?
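
Perhaps plain YAML anchors and merge keys could reduce the repetition; they are a YAML feature rather than anything Travis-specific, so treat this as an untested sketch:

_test_defaults: &test_defaults
  addons:
    postgresql: '9.5'
  services:
    - postgresql
  before_script:
    - psql -c 'create database test_database;' -U postgres

jobs:
  include:
    - <<: *test_defaults
      node_js: '4.6'
    - <<: *test_defaults
      node_js: '6.1'
    - stage: deploy
      script: skip
      node_js: '6.1'
      deploy:
        - provider: etc etc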

ELD commented 7 years ago

I'm toying around with Rust and build stages, and it looks like it wraps the version (i.e. beta) in quotes, which causes the Rust installer to fail because it can't find the version.

Here's the relevant portion of my .travis.yml:

jobs:
  rust:
    - stable
    - beta
    - nightly
  include:
    - stage: test
      script: cargo test -j8
      rust:
        - stable
        - beta
        - nightly
      allow_failures:
        - rust: nightly

    - stage: code_coverage
      script: bash ./ci/check_code_coverage.sh
      rust: stable

    - stage: release
      script: cargo build -j8 --release
      rust: stable

Here's the log output from Travis-CI

$ curl -sSf https://build.travis-ci.org/files/rustup-init.sh | sh -s -- --default-toolchain=$TRAVIS_RUST_VERSION -y
info: downloading installer
error: Found argument '"beta",' which wasn't expected, or isn't valid in this context
USAGE:
    rustup-init --default-toolchain <default-toolchain>
For more information try --help

You can pretty clearly see that the $TRAVIS_RUST_VERSION variable is being incorrectly wrapped in quotes.

Let me know how else I can help you debug/troubleshoot this issue.

BanzaiMan commented 7 years ago

@ELD You can't use matrix expansion inside jobs and jobs.include. https://docs.travis-ci.com/user/build-stages#Build-Stages-and-Build-Matrix-Expansion
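
The configuration above would instead need each Rust version spelled out as its own jobs.include entry, roughly like this (a sketch; allow_failures also moves out of the include list):

jobs:
  include:
    - stage: test
      script: cargo test -j8
      rust: stable
    - stage: test
      script: cargo test -j8
      rust: beta
    - stage: test
      script: cargo test -j8
      rust: nightly
    - stage: code_coverage
      script: bash ./ci/check_code_coverage.sh
      rust: stable
    - stage: release
      script: cargo build -j8 --release
      rust: stable
  allow_failures:
    - rust: nightly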

ELD commented 7 years ago

@BanzaiMan Oh. That's quite unfortunate. I misunderstood that section and then thought I saw examples of .travis.yml files that used what I thought was matrix expansion in their jobs section.

I'm assuming there's no way to test the given tasks in the jobs stanza against the desired versions of Rust, then?

phadej commented 7 years ago

Does jobs support matrix? We (the Haskell community) have a lot of .travis.yml files with an explicit matrix section:


matrix:
  fast_finish: true
  include:
  - env: GHCVER=7.10.3 STACK_YAML=stack-lts-6.yaml
    addons:
      apt:
        sources:
        - hvr-ghc
        packages:
        - libfftw3-dev
        - alex-3.1.7
        - happy-1.19.5
        - ghc-7.10.3
        - libgmp-dev
  - env: GHCVER=8.0.2 STACK_YAML=stack-ghc-8.0.yaml
    addons:
      apt:
        sources:
        - hvr-ghc
        packages:
        - libfftw3-dev
        - alex-3.1.7
        - happy-1.19.5
        - ghc-7.10.3
        - ghc-8.0.2
        - libgmp-dev

We need to use an explicit matrix to keep the env and apt contents in sync. Adding a top-level jobs definition doesn't work, nor does adding jobs per matrix entry like:

matrix:
  include:
  - jobs:
     - ...

Is this a supported combination? I cannot find any example in the docs.

Additionally: it would be very cool to have a (web) tool where you can paste your .travis.yml and see how it's expanded (matrix expansion etc.); it would help debugging a lot.

23Skidoo commented 7 years ago

I tried a bunch of different things to enable stages in my existing configuration, but no matter what I do, it doesn't seem to work: https://github.com/haskell/cabal/pull/4560.

BanzaiMan commented 7 years ago

@23Skidoo Your .travis.yml has both matrix and jobs, and matrix takes precedence. (This needs to be explicitly stated in the docs.)

23Skidoo commented 7 years ago

@BanzaiMan Is there/will there be a way to combine matrix and build stages? The announcement said that "build stages work as expected when combined with build matrix expansion".

BanzaiMan commented 7 years ago

@23Skidoo "Matrix expansion" means creating a large build matrix with multiple values in matrix keys (env, ghc, etc.). That is not happening in your case. You are just adding jobs with matrix.include.

Depending on your goal, it might be sufficient to concatenate the matrix.include array with the jobs.include array. It is entirely up to you. If I am to guess, though, you'd want to replace the existing matrix.include with:

jobs:
  include:
    - stage: prepare cache
      script: if [ "x$SCRIPT" != "xmeta" ]; then ./travis-precache.sh; fi
      after_success: skip
    - stage: test
      env: GHCVER=8.0.2 SCRIPT=meta BUILDER=none
      os: linux
      sudo: required
⋮ # rest of `matrix.include` jobs

I should mention, however, that the cache generated in the "prepare cache" stage is not going to be shared by the jobs in the test stage, because the test jobs do not have the same env values, which are used in computing the cache name.

23Skidoo commented 7 years ago

@BanzaiMan Thanks! What would be the best way to implement a "prepare cache" stage that'd actually populate a shared cache for each job in the test stage? Something like this:

jobs:
  include:
    # meta doesn't need a pre-populated cache
    - stage: test
      env: GHCVER=8.0.2 SCRIPT=meta BUILDER=none
      os: linux
      sudo: required
    # but everything else does
    - stage: prepare cache
      env: GHCVER=8.0.2 SCRIPT=script
      script: ./travis-precache.sh
      after_success: skip
    - stage: test
      env: GHCVER=8.0.2 SCRIPT=script
      os: linux
      sudo: required
    - stage: prepare cache
      env: GHCVER=8.2.1 SCRIPT=script
      script: ./travis-precache.sh
      after_success: skip
    - stage: test
      env: GHCVER=8.2.1 SCRIPT=script
      os: linux
      sudo: required
      # and so on, and so on

?

23Skidoo commented 7 years ago

@BanzaiMan BTW, any chance the build stages feature could be extended in the future to allow expressing a job graph with more parallelism? In my use case, test jobs don't actually need to wait for all precache jobs to complete, each test job only depends on at most one precache job that populates the cache for that configuration.

Another feature that'd be nice to have is the ability to have named caches so that a cache could be shared between several jobs with different env parameters. That'd allow splitting some long-running jobs into several chunks, exposing more parallelism.

simonvanderveldt commented 7 years ago

Is it possible to let Travis give status feedback to PRs on GitHub per stage or only for the whole build?

joshwiens commented 7 years ago

Small visual bug.

(screenshot of the job list, with the Node.js versions not displayed)

That should be displaying NodeJS versions 4.3 & 6 respectively.

Note: This is a visual bug only; the build is executing with the Node versions above.

.travis.yml - https://github.com/webpack-contrib/i18n-webpack-plugin/blob/master/.travis.yml

BanzaiMan commented 7 years ago

@d3viant0ne Your .travis.yml specifies nodejs, which is deprecated. Please use node_js going forward.

joshwiens commented 7 years ago

Doh! I will admit that is 100% a cmd + c fail moment :)

Doesn't have any effect on the missing version in the UI btw.

envygeeks commented 7 years ago

It would be nice if test didn't always have to be the first stage. Sometimes we would like to split installing things into its own stage that can fail on its own, especially when building Docker images: we update to the latest version of Docker, and we want that stage to fail on its own.

envygeeks commented 7 years ago

It would be nice if global matrices were respected when it comes to env.

skeggse commented 7 years ago

@envygeeks: see here for an example of a build stage that precedes the default test stage.
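
For reference, the general shape of such a setup is roughly the following (a sketch with made-up stage names and scripts; stages run in the order in which they first appear in jobs.include):

jobs:
  include:
    - stage: prepare
      script: ./build_docker_image.sh
    - stage: test
      script: ./run_tests.sh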

envygeeks commented 7 years ago

@skeggse we did try that, it didn't work.

timofurrer commented 7 years ago

Looks great! I'd also love to see conditional stages; I've seen that others have already mentioned it. @svenfuchs, has anything been implemented in that direction that we could test?

An issue I have at the moment is that a build with an allowed failure is not indicated as such; it just shows as passed even though the job actually failed. See: https://travis-ci.org/timofurrer/w1thermsensor/builds/243258727

The .travis.yml looks something like this:

jobs:
    include:
        - stage: Raspbian integration tests
          python: 3.6
          install: make prepare-integration
          script: make integration-tests
          # we don't need this env variable
          env: W1THERMSENSOR_NO_KERNEL_MODULE=None
          deploy:
              provider: releases
              api_key:
                secure: key
              file: "*.deb"
              skip_cleanup: true
              on:
                tags: true
                repo: timofurrer/w1thermsensor
    allow_failures:
        - stage: Raspbian integration tests

SkySkimmer commented 7 years ago

At coq/coq#802 I have an example of using build stages and build artifacts (with an 3rd party storage provider) to avoid redoing build steps.

Context: Coq is a proof assistant, i.e. an interpreter for a language which builds proofs of mathematical theorems. Parts of this language are underspecified, so in order to reduce compatibility issues we test every PR on a number of 3rd party developments. This takes a while (https://travis-ci.org/coq/coq/builds/242472062 : Ran for 3 hrs 27 min, Total time 9 hrs 45 min 59 sec) so travis gets filled up (currently 20ish PRs waiting, and a bunch of builds of the merges got cancelled to speed things up).

Previous attempts: For every job we have to rebuild the interpreter and its stdlib. This takes around 10 min per job, multiplied by over 20 jobs. I made a GitLab CI setup which uses their build artifact feature to share these builds (e.g. https://gitlab.com/SkySkimmer/coq/pipelines/8974087/builds; note that build times are not comparable with Travis, as GitLab has unlimited parallel jobs but slower machines). Then I hacked the travis.yml to share builds through the cache (https://travis-ci.org/SkySkimmer/coq/builds/234566593 : Ran for 1 hr 40 min 21 sec, Total time 6 hrs 12 min 28 sec. There are a few fewer jobs than in more recent versions, but it was roughly a 30 min clock-time and 2 h CPU-time speedup IIRC). However, this was unusable: in order to share the cache I needed to make the environment variables the same, so we can't easily tell which build is which.

More recently I tried using mega.nz (free accounts with a decent amount of storage, and a CLI via megatools) to store builds. It seems to work: https://travis-ci.org/SkySkimmer/coq/builds/243429238 Total time 5 hrs 39 min 33 sec (clock time is broken for whatever reason, reporting only 18 min) (there are still some bugs to be ironed out, like that failing OSX build). However, it is entirely useless, because the important and time-consuming builds are for PRs, which don't get login credentials. So it falls back to rebuilding everything every job, plus some overhead compared to not trying to share builds at all.

seivan commented 7 years ago

Is there a way to name the scripts under a stage somehow?

(screenshot of jobs listed under a stage by build number only)

So instead of the build number (still keeping those) we could have a human-friendly name.

I'm leaning towards using environment variables, but I'm checking here first.

keradus commented 7 years ago

No way to do it, at least not yet.

molovo commented 7 years ago

This might be my configuration, but I'm seeing some strange ordering issues with my build stages. (See build 243822386)

Stages in .travis.yml:

node_js:
- 6
- 7
- 8
jobs:
  include:
  - stage: lint
    script: npm run lint
    node_js: 8
  - stage: test
    script: npm test
  - stage: coverage
    script: npm test && npm run coverage
    node_js: 8

Order of build:

(screenshot of the actual stage order in the UI)

Based on my config I'd expect the order to be lint, then test, then coverage.

Have I configured something wrong?