harness / gitness

Gitness is an Open Source developer platform with Source Control management, Continuous Integration and Continuous Delivery.
https://gitness.com
Apache License 2.0

Matrix Builds / Sub-Builds / Parallel Builds #6

Closed: jpadilla closed this issue 8 years ago

jpadilla commented 10 years ago

I'm wondering how a .drone.yml would look if you wanted to test against multiple versions of Python, for example using Tox?

bradrydzewski commented 10 years ago

good question. what do you think the .drone.yml should look like?

Here are the use cases I'd like to cover with sub builds:

Here are some design questions:

Although I think we can draw some inspiration from Travis I do not want to just copy their approach as I think it would limit our capabilities. With Travis your build is heavily tied to a language:

language: go

With Drone your build is tied to a Docker image. The image defines the environment. This may seem like a nuance, but it is really important. Drone doesn't care about language. Drone will never dictate which languages you can or cannot use. We need a yaml file that is re-imagined for Docker.

I consider this a high priority feature. Hopefully we can get a discussion started here and come up with some options.

jpadilla commented 10 years ago

@bradrydzewski great use cases here. I wasn't even thinking about those. One thing that comes to mind is having Docker images that contain all possible versions, for example a Python Docker image that contains 2.6-3.3 and PyPy. The user could reference which version to use:

script:
  - pip2.7 install -r requirements.txt
  - python2.7 setup.py test
  - pip3.3 install -r requirements.txt
  - python3.3 setup.py test

I'm pretty sure this isn't the best way. You mentioned sub builds and that got me thinking about:

image: base-image
env:
  - DEBUG=true
builds:
    build:
        image: python
        env:
            - SECRET_KEY=123
        script:
          - pip install -r requirements.txt
          - python setup.py test
        services:
          - redis
notify:
  email:
    recipients:
      - brad@drone.io
      - burke@drone.io

But building matrices, like on Travis, with this approach could end up producing a massive .drone.yml. Travis makes that pretty easy: I just set the versions of the language and additional environment variables, and every item in the env array triggers an individual build. I think we might be able to find a way to do that with Drone's philosophy.

image:
    name: python
    config:
        versions:
            - 2.7
            - 3.3
env:
  - DEBUG=true SECRET_KEY=123
script:
  - pip install -r requirements.txt
  - python setup.py test
services:
  - redis:
      config:
        versions:
          - 2.6
          - 2.8
notify:
  email:
    recipients:
      - brad@drone.io
      - burke@drone.io

This example would trigger 4 sub builds, one for each version of Python with each version of the Redis service. Hope this is somewhat useful.

bradrydzewski commented 10 years ago

I really like your suggestion. Instead of "versions" we could call them "tags", which is consistent with Docker terminology:

image:
    name: python
    tags:
      - 2.7
      - 3.3

Do you have any suggestions for a notation that would split a build into parallel tasks? For example, I only want to test against Python2.7, but my tests take a long time, so I want to break them up into suites and run in parallel.

We anticipated this change, so our database already supports sub-builds / matrix builds. The real challenge here is the yaml :)

wilmoore commented 10 years ago

Good discussion. Just a suggestion though: perhaps rename the issue, because this problem isn't exclusively a Python concern. It is pertinent to other environments such as NodeJS, Ruby, Erlang/Elixir, etc.

Also, :+1: - with Travis (as much as I love it), trying to build https://github.com/exercism/exercism.io/blob/master/.travis.yml is difficult. You have to circumvent the magic with multiple bootstrap files or the equivalent.

By leaning on Docker images (or Dockerfiles), it seems like single-language builds would be less magical in general and multi-language builds would be less obtuse.

electrical commented 10 years ago

Hi all.

Got directed to this thread by @bradrydzewski. Giving my £0.02, I think the Travis matrix would be a good starting point (see https://github.com/elasticsearch/puppet-elasticsearch/blob/master/.travis.yml as an example).

The most important parts, I think, are:

For deployments it's hard to choose when it's allowed to run (when a certain test passed, or all of them).

That's all I can think of at the moment.

ewr commented 10 years ago

For another example of a tool that is thinking along these lines, check out Test Kitchen's platforms and suites:

http://kitchen.ci/docs/getting-started/adding-platform http://kitchen.ci/docs/getting-started/adding-suite

While Kitchen is really thinking in terms of OS versions, the issue here in terms of Rubies or Pythons is really the same thing a level up the stack.

benallard commented 10 years ago

I don't think the ability to parallelize one build is linked to this issue. If you want to do that, you need to define (independent) sub-units of your build, which is pretty orthogonal to the idea of running the build multiple times on different environments ...

Try not to bloat this issue too much by adding every possible future feature to it. I think it's better to focus on one aspect at a time.

I think you got pretty far there by identifying a few more aspects: we need the ability to select different 'tags' of an image, and/or the ability to select different versions of a service, and/or the ability to run the build with a different set of environment variables, and so on ...

I believe all of them could be implemented independently from each other ...

bradrydzewski commented 10 years ago

@benallard I definitely agree

I had a great discussion with an Ops lead who suggested adding a matrix section, where the axes could be defined. What does everyone think of this proposal?

image: python:$$python_version
env:
  - DEBUG=true
  - SECRET_KEY=123
  - DJANGO=$$django_version
script:
  - pip install -r requirements.txt
  - python setup.py test
services:
  - redis:$$redis_version

matrix:
  python_version:
    - 2.7
    - 3.2
  redis_version:
    - 2.6
    - 2.8
  django_version:
    - 3.0
    - 4.0

This would end up producing 8 different sub builds. I think it is probably the most flexible design, but I'd love to hear what others think.

Note that the matrix parameters should be handled in a similar manner to private environment variables. They can be injected directly into the script (using find / replace) using the $$ convention. They would also be injected directly into the build as environment variables.
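
To make the expansion concrete, here is a rough sketch in Go (not actual Drone code) of how a matrix section like the one above could be expanded into the Cartesian product of its axes and injected into the yaml with the $$ find/replace convention. The expand and inject helpers are illustrative assumptions, not the project's implementation.

package main

import (
	"fmt"
	"strings"
)

// expand builds the Cartesian product of the matrix axes, e.g.
// python_version x redis_version x django_version -> 8 combinations.
func expand(matrix map[string][]string) []map[string]string {
	combos := []map[string]string{{}}
	for axis, values := range matrix {
		var next []map[string]string
		for _, combo := range combos {
			for _, value := range values {
				c := map[string]string{axis: value}
				for k, v := range combo {
					c[k] = v
				}
				next = append(next, c)
			}
		}
		combos = next
	}
	return combos
}

// inject applies the $$ find/replace convention to the raw yaml for one
// combination, mirroring how private variables are substituted.
func inject(yaml string, params map[string]string) string {
	for k, v := range params {
		yaml = strings.ReplaceAll(yaml, "$$"+k, v)
	}
	return yaml
}

func main() {
	matrix := map[string][]string{
		"python_version": {"2.7", "3.2"},
		"redis_version":  {"2.6", "2.8"},
		"django_version": {"3.0", "4.0"},
	}
	for _, combo := range expand(matrix) {
		fmt.Println(inject("image: python:$$python_version", combo), combo)
	}
}

With the three two-value axes above, expand returns exactly the 8 combinations mentioned.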

electrical commented 10 years ago

Looks good to me @bradrydzewski :-) I would go for that.

jpadilla commented 10 years ago

@bradrydzewski that's pretty interesting right there. It took me a moment to figure out that the matrix defines the variables and their values, but I think it definitely covers all the cases we previously discussed.

mdshw5 commented 10 years ago

Looks perfect to me. This is the only feature keeping me from using drone right now!

electrical commented 10 years ago

Another important addition is to be able to tell which combos are allowed to fail. In my case I'm running multiple Puppet versions against different Ruby versions. Some earlier Puppet versions don't work against Ruby 2.0.0 and fail.

I was thinking of the following:

Allow all ruby 2.0.0 cases to fail:

allowed_fail:
    ruby:
      - 2.0.0

Allow Ruby 2.0.0 with Puppet 2.7.0 or 3.0.0 to fail

allow_fail:
  ruby_version:
    - 2.0.0
  puppet_version:
    - 2.7.0
    - 3.0.0

Any thoughts about it?

benallard commented 10 years ago

You should pay attention not to mix a notification issue with a fundamental architecture one ...

Do you not want those tests to run at all, or do you just not care about their result? If the former, this should be analysed here; if the latter, we should figure out the right way to handle it later.

Anyway, to extend your idea, it should be possible to define sub-matrices where the build should not be performed.

I suggest the following syntax:

matrix:
  python_version:
    - 2.7
    - 3.2
  redis_version:
    - 2.6
    - 2.8
  django_version:
    - 3.0
    - 4.0
  except:
    -
      python_version: [2.7]
      django_version: [3.0, 4.0]
    -
      redis_version: [2.6]
      django_version: [4.0]

This would run all the builds except the 6 excluded ones ...
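
One possible reading of except (an assumption, the thread doesn't settle the semantics): an entry excludes a combination when every axis the entry names lists that combination's value, and axes the entry omits match anything. A sketch that would sit on top of the expand helper shown earlier:

// applyExcept drops every expanded combination (the map[string]string
// combos from the expand sketch above) that matches an except entry.
func applyExcept(combos []map[string]string, except []map[string][]string) []map[string]string {
	var kept []map[string]string
	for _, combo := range combos {
		excluded := false
		for _, entry := range except {
			if entryMatches(combo, entry) {
				excluded = true
				break
			}
		}
		if !excluded {
			kept = append(kept, combo)
		}
	}
	return kept
}

// entryMatches reports whether every axis named by the entry lists the
// combination's value; axes the entry omits are treated as wildcards.
func entryMatches(combo map[string]string, entry map[string][]string) bool {
	for axis, values := range entry {
		hit := false
		for _, v := range values {
			if combo[axis] == v {
				hit = true
				break
			}
		}
		if !hit {
			return false
		}
	}
	return true
}

How many builds that leaves depends on how overlapping entries are counted; the same matching shape would also work for the allow_fail idea above, with matched builds marked non-fatal instead of skipped.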

gonzojive commented 10 years ago

This was merged with #159, so I'm continuing discussion here. My use case is as follows:

It seems like the proposals so far don't solve the problem of having multiple projects per git repository. They do deal with the problem of multiple builds per project.

Personally I'm skeptical the ideal solution involves sticking purely with yaml files. As a user, I'd prefer if Drone stayed out of the way as much as possible and allowed me to script the configuration if I wanted:

for python_version in [2.7, 3.2]:
  for redis_version in [2.6, 2.8]:
    for django_version in [3.0, 4.0]:
      addBuild(
         variantName = "py=%f, redis=%f, django=%f" %
           (python_version, redis_version, django_version),
         image = "....")

Once again, this does not solve the multiple-projects-per-repository problem (maybe we should continue that discussion in #159), but hopefully it helps with the discussion at hand.

mdshw5 commented 10 years ago

@gonzojive I don't think this works too well, because then you are just picking a scripting language (looks like Python), which creates two problems:

  1. You shouldn't use code to describe data when the actual data (a markup file) will do fine.
  2. Non-Python users will be left wondering why Python was chosen over their favorite language.

As a side note, if this were a viable solution then something other than nested for-loops would be more readable:

import itertools

python_version = (2.7, 3.2)
redis_version = (2.6, 2.8)
django_version = (3.0, 4.0)
for python, redis, django in itertools.product(python_version, redis_version, django_version):
  add_build(python, redis, django)

gonzojive commented 10 years ago

I think you can stick with a configuration language, but it'd be good to allow scripting if desired. You can do it in a way that leaves the choice of programming language up to the user.

Perhaps the .drone.yml file can include a line like

drone-config-script:
- ruby gen-build-plan.rb > $OUT

bradrydzewski commented 10 years ago

@gonzojive this is definitely a more advanced use case you are proposing. This project is still very young (0.1 alpha release) and the immediate focus is on the simpler use cases that serve 80+% of users. I'm happy to revisit this request in a few months once the project is further along.

justone commented 10 years ago

How about the use case of builds with separate deployments? For instance, it might be convenient to keep a project's source and website in the same repo. Then, when the build happens, the website is built and deployed to its server and the source is compiled and that result is uploaded to s3.

fudanchii commented 10 years ago

@justone If both the source and the website sit on the same branch, you can use the deploy and publish plugins together in a single .drone.yml file.

justone commented 10 years ago

@fudanchii Interesting idea. I didn't realize you could have a deploy and a publish in the same file. I wonder if it's possible to have multiple deploys of different types in the same .drone.yml file. Like a git and an ssh deploy, each going to different places. I suppose that if #201 is merged, you can have one bash deploy that sends application artifacts to one location and the generated website to another.

If multiple builds can be specified, I think it would be good for there to be an environment variable injected into the build so that any deploy or publish can know which one it's working on.

Linuturk commented 10 years ago

Wanted to add my +1 to this. I'd like to have a situation where I can define multiple images (Ubuntu versions) to test my software. Something like this makes sense to me:

image:
  - ubuntu:14.04
  - ubuntu:12.04
bradrydzewski commented 10 years ago

@justone yes, you can have multiple deployment entries in the yaml (i.e. ssh and git). We loop through each entry and execute it.

drewvanstone commented 10 years ago

@bradrydzewski Adding my +1 to this. I'd also like to see this support parallelization. I think it would be a subsection of 'script', where you define which container to run a test in. I've modified your example above to illustrate it:

image: python:{{ python_version }}
env:
  - DEBUG=true
  - SECRET_KEY=123
  - DJANGO={{ django_version }}
script:
  container1:
    - pip install -r requirements.txt
    - python setup.py test
  container2:
    - pip install -r requirements.txt
    - python setup.py test
services:
  - redis:{{ redis_version }}
matrix:
  python_version:
    - 2.7
    - 3.2
  redis_version:
    - 2.6
    - 2.8
  django_version:
    - 3.0
    - 4.0

Love to hear others' thoughts on this too.

Linuturk commented 10 years ago

I was speaking with someone yesterday about this, and he suggested we might approach this with multiple YAML docs in a single .drone.yml

Something like this:

---
image: mischief/docker-golang
env:
  - GOPATH=/var/cache/drone
script:
  - go build
  - go test -v
services:
  - redis
notify:
  email:
    recipients:
      - brad@drone.io
      - burke@drone.io
---
image: mischief/docker-golang
env:
  - GOPATH=/var/cache/drone
script:
  - go build
  - go test -v
services:
  - redis
notify:
  email:
    recipients:
      - brad@drone.io
      - burke@drone.io

Obviously, all the options could be different between the two docs, and it would probably be easier to implement a second build using the existing code rather than restructure into a matrix style.

drewvanstone commented 10 years ago

@Linuturk I like that duplicating gives you more flexibility, but I feel 90% of use cases would just be duplicate configuration. For instance, if I wanted to run the build in 5 containers, I now have 5 portions of the YAML file where only the script section changes.

Linuturk commented 10 years ago

Maybe we could have the secondary documents inherit all the values of the previous document, except for the values the secondary documents define themselves.

To be clear:

---
image: ubuntu
script:
  - go build
  - go test -v
services:
  - redis
notify:
  email:
    recipients:
      - brad@drone.io
      - burke@drone.io
---
image: rhel
notify:
  email:
    recipients:
      - joe@drone.io
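
For what it's worth, a rough sketch of how the multi-doc idea plus inheritance could be interpreted: split the file on "---" separators, then overlay each secondary document onto the first so only redefined keys change. The helper names and the shallow-merge semantics are assumptions for illustration (Go, not actual Drone code).

package config

import "strings"

// splitDocs breaks a multi-document .drone.yml into individual yaml
// documents on the standard "---" separator lines.
func splitDocs(raw string) []string {
	var docs []string
	for _, doc := range strings.Split("\n"+raw, "\n---\n") {
		if doc = strings.TrimSpace(doc); doc != "" {
			docs = append(docs, doc)
		}
	}
	return docs
}

// inherit overlays one secondary document (already decoded into a map by
// a yaml library) onto the base document: keys the secondary defines win,
// everything else is inherited from the base unchanged.
func inherit(base, secondary map[string]interface{}) map[string]interface{} {
	merged := make(map[string]interface{}, len(base))
	for k, v := range base {
		merged[k] = v
	}
	for k, v := range secondary {
		merged[k] = v
	}
	return merged
}

Under that scheme the rhel document above would inherit the script and services sections and only replace image and notify.
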
0xcaff commented 10 years ago

@bradrydzewski, In your first comment, you stated:

Although I think we can draw some inspiration from Travis I do not want to just copy their approach as I think it would limit our capabilities. With Travis your build is heavily tied to a language:

language: go

When you specify a language with Travis, all that changes is the default build steps. The environment is more or less consistent across all builds.

bmorton commented 10 years ago

For parallelization, could you just set the number of nodes you want to use and pass an environment variable into each and let the script deal with how to parallelize it? Use this config:

image: bmorton/ruby-2.1.2
nodes: 20
script:
  - bundle install
  - bundle exec rake ci
services:
  - postgres

And then each node gets run with the respective ENV vars passed in: DRONE_TOTAL_NODES=20 and DRONE_NODE=1.

From there, you'd just need a way to aggregate the results.
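
A minimal sketch of what the script side could do with those variables: deterministically pick every Nth test file for the current node. The 1-based DRONE_NODE numbering and the test/*_test.rb glob are assumptions for illustration.

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
)

func main() {
	node, _ := strconv.Atoi(os.Getenv("DRONE_NODE"))
	total, _ := strconv.Atoi(os.Getenv("DRONE_TOTAL_NODES"))
	if node < 1 {
		node = 1
	}
	if total < 1 {
		total = 1
	}

	// Sort so every node sees the same ordering, then take every
	// total-th file starting at this node's offset.
	files, _ := filepath.Glob("test/*_test.rb")
	sort.Strings(files)

	for i, f := range files {
		if i%total == node-1 {
			fmt.Println(f) // feed these to the test runner on this node
		}
	}
}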

drewvanstone commented 10 years ago

I like this idea, but each node would also need to be able to run a different script. My example wasn't clear on that. But one container might execute RSpec tests and another might execute the Jasmine tests, for example.

bmorton commented 10 years ago

Yeah, I think for some people, there will be multiple build steps and you want those multiple steps parallelized. For others, you'll want to parallelize a single build step for test suites that take a long time to run serially.

drewvanstone commented 10 years ago

Fair enough. That would work for my use case.

bmorton commented 10 years ago

Maybe even something like this? This would allow you to outline different tasks you want to run and whether you want to run any given task on more than 1 node. Further, you could specify wait or depends_on_previous or something if you want the next task to wait for the previous task to complete.

image: bmorton/ruby-2.1.2
tasks:
  - nodes: 20
    script:
      - bundle install
      - bundle exec rake ci
  - wait: false
    script:
      - bundle install
      - ruby test/xunit/runner.rb
services:
  - postgres

beefsack commented 10 years ago

I currently have a use case which I'd love to use Drone for; the fit and integration with Docker and GitHub are ideal, but I'm being held up by the lack of multi-environment builds.

Would love to see a feature like this sometime soon.

bradrydzewski commented 10 years ago

@beefsack can you provide some more details around your use case?

beefsack commented 10 years ago

@bradrydzewski, I'm currently building a CI system for Unreal Engine builds under different Linux distributions. There would be multiple Docker images, one for each distribution with the relevant build dependencies installed, and I'd run the build commands inside each image to see which distributions successfully build.

Having Docker images is perfect for running under multiple distributions, and the GitHub integration is great because Unreal Engine is already on GitHub (behind a paywall).

bradrydzewski commented 10 years ago

Thanks. This is definitely the goal: to be able to test a single commit against multiple Docker images. There are some pre-requisite features, namely #162, that we'll need to implement first. We'll also need to figure out the following:

  • How do the above use cases impact deployments?
  • If we have 3 sub builds, which one should be responsible for executing the deployment?
  • If we have 1 sub build responsible for deployment, should it wait for the other 2 to pass?
  • How should this be represented in the yaml?
beefsack commented 10 years ago

@bradrydzewski, it makes most sense for those to be configurable, I think. In my case the following would be ideal (but not necessary for my adoption):

jloh commented 10 years ago

Is the ability to specify multiple images available yet? Or, if not that, multiple YAML docs?

drewvanstone commented 10 years ago

Any update on this? It's the only blocking issue for our company switching over to drone.io.

steve-salmond commented 10 years ago

Also very keen on some kind of multi/sub-build capability. I have a number of build products I'd like to generate from one repo, and they are all quite different. Arguably the repo should just be split into components, but it would be very convenient if Drone supported multiple YAML files per repo. What if you could optionally specify the name of the .yml file to use when defining a build in the Drone web UI, and default to .drone.yml if none is given?

Alternatively, sub-build files could be declared in the YAML itself (apologies if this is invalid syntax):

builds:
  - drone/client.yml
  - drone/server.yml

Drone would execute the main .drone.yml, discover these sub-builds, then execute them after whatever else was in the main build file (possibly nothing). This could potentially be a recursive process if the sub-build YAML files also had builds sections.

Not too sure how this fits in with the matrix proposal. Perhaps the YAML could support both, and the matrix section, if present, would apply only to the .yml file it appeared in. Anyway, great work on Drone - it's a pleasure to work with so far!

kenberland commented 10 years ago

Can I help with this or #162 ?

bradrydzewski commented 10 years ago

@kenberland yes, I replied on #162

benben commented 9 years ago

Hi party people! I totally want to see this in drone! Do you have an ETA on this? What's the plan here? Thanks!

TheNeikos commented 9 years ago

:+1: For an ETA

bradrydzewski commented 9 years ago

The plan is to get 0.3 released (and the exp branch merged into master) as soon as humanly possible. Once 0.3 is released I'd like to focus on matrix builds and pipelines, see https://github.com/drone/drone/issues/470#issuecomment-56309626

This will be important because these two features will conflict. Matrix builds will complicate pipelines and vice versa. Both are really important and probably need to be architected in parallel.

benben commented 9 years ago

Great! Let me know if I can help in any case. I'm not very familiar with Go, but there is probably other stuff I can do to help get this going fast.

bjodah commented 9 years ago

Thought I should share my 2 cents:

Many of my build scripts spend the longest time setting up the environment. Drone + Docker could easily circumvent that by letting the image be a matrix-like parameter:

image:
  - bjodah/trusty-python2
  - bjodah/trusty-python3

(EDIT:) or maybe something like:

matrix:
  - image=ubuntu:precise GCC_VERSION=4.6
  - image=ubuntu:precise GCC_VERSION=4.8
  - image=ubuntu:trusty GCC_VERSION=4.8
  - image=ubuntu:trusty GCC_VERSION=4.9

where image would be interpreted specially, but GCC_VERSION is just an environment variable.
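
A small sketch of how those rows could be parsed, treating image specially and passing everything else through as plain environment variables. The axis struct and parseRow helper are illustrative assumptions, not an existing API.

package main

import (
	"fmt"
	"strings"
)

// axis is one parsed matrix row: the image is pulled out for special
// handling, everything else stays an ordinary env var.
type axis struct {
	Image string
	Env   []string
}

func parseRow(row string) axis {
	var a axis
	for _, pair := range strings.Fields(row) {
		parts := strings.SplitN(pair, "=", 2)
		if len(parts) != 2 {
			continue
		}
		if parts[0] == "image" {
			a.Image = parts[1]
			continue
		}
		a.Env = append(a.Env, pair)
	}
	return a
}

func main() {
	rows := []string{
		"image=ubuntu:precise GCC_VERSION=4.6",
		"image=ubuntu:trusty GCC_VERSION=4.9",
	}
	for _, row := range rows {
		a := parseRow(row)
		fmt.Printf("build in %s with env %v\n", a.Image, a.Env)
	}
}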

JeanMertz commented 9 years ago

@bmorton wrote:

For parallelization, could you just set the number of nodes you want to use and pass an environment variable into each and let the script deal with how to parallelize it? [...] And then each node gets run with the respective ENV vars passed in: DRONE_TOTAL_NODES=20 and DRONE_NODE=1. From there, you'd just need a way to aggregate the results.

This seems like a really nice idea, combined with something like https://github.com/ArturT/knapsack

kenberland commented 9 years ago

We couldn't wait and rolled another solution. It's based on Integrity and leverages CoreOS for parallelism. We're elastically scaling workers into AWS to supplement the workers we have on site.

drahnr commented 9 years ago

Did anybody step up to the plate yet? (Just asking to prevent duplicate work.) If not, I will put this on my plate round about 2 weeks from now (going for the matrix section style).