Repository Addition Request: rust-vmm-ci

andreeaflorescu commented 5 years ago

Repository Name: buildkite-ci

Short Description

The buikdkite-ci repository contains the buildkite pipeline that is going to be used by all other crates to run the CI. The buildkite pipeline is an yaml file with steps. Each step corresponds to one command running as part of the CI.

With buildkite you can either configure the steps in the web interface or using a .buildkite/pipeline.yaml file in your repository. The proposal is to have this repository as a git submodule for all crates. In the buildkite configuration for each crate we would then only need to specify the pipeline upload path which would end up being something like buildkite-ci/pipeline.yaml. For details about configuring the pipeline path, please check this article. Should we name the submodule common-buildkite-ci since we are most likely going to have another pipeline per repository?

Why is this repository relevant to the rust-vmm project?

This is needed so we do not have to duplicate the pipeline for all crates. All crates will end up using the same pipeline for the CI and implicitly we will have the same quality expectations from all crates. If some repository needs additional, specialized tests these can be added in a different, per crate configuration file (pipeline).

petrutlucian94 commented 5 years ago

Having a centralized place for those pipeline definitions sounds great, we avoid duplication and simplify CI management.

I guess that in the future, the test scenarios may become more complex, so this repo may hold other files as well (e.g. other job definition files, scripts, simple tools, etc).

andreeaflorescu commented 5 years ago

Renamed the proposed named to rust-vmm-ci as I was thinking to use it for other common tests as well. For example we can have test_coverage.py there so other projects can reuse it.

andreeaflorescu commented 5 years ago

Additional proposal: we can have one pipeline for each platform and operating system we want to support. And let me tell you why:

Some crates (like vm-memory) will also have support for Windows so you want to the integration tests to pass on windows as well. But you can't have it in the default pipeline because some crates cannot have it (like kvm-ioctls).
Ideally we would have most of the crates work on arm and x86. That means that in our pipelines we would have same tests on x86 and arm as well. But some tools are not available on arm, so the steps are not exactly the same. Examples of things that aren't available (yet) on arm are clippy and kcov. Now, when we start development we might not want to add support from both arm and x86 from the first PR because it might turn out to be very complex and hard to review. So we can start with x86 and only use the x86 pipeline till support for arm is added.

With our current configuration of crates, platforms and operating systems we would end up having 3 pipelines:

x86_linux_pipeline.yml (used by kvm-ioctls, vm-memory, vmm-sys-utils)
x86_windows_pipeline.yml (used by vm-memory)
arm_linux_pipeline.yml (used by kvm-ioctls, vmm-sys-utils)

rbradford commented 5 years ago

Some repos might need different pipelines, perhaps to prepare something: https://github.com/rust-vmm/linux-loader/pull/2#discussion_r282446524

For this specific example it might make sense to just have a ci-prepare.sh in that repository and then have the common script [test -a ci-prepare.sh] && ci-prepare.sh

andreeaflorescu commented 5 years ago

@rbradford you can have custom hooks executed before running a command or before checking out the repository code. I think the prepare.sh script should be called in the hook and not part of the pipeline step so that the steps can be independent of the repository being tested. What do you say?

If you want to learn more about hooks you can check the Buildkite hooks documentation.

andreeaflorescu commented 5 years ago

This is a summary of the discussion points from the email thread. @sboeuf @petrutlucian94 please correct me if there is anything I misunderstood.

Sharing the CI pipelines or other related tools

There are currently two proposals:

use git submodules
script that pulls the latest CI. @sboeuf : "You could have a dedicated "ci" directory (for every crate) in which you could have the same script that pulls the content of the "ci" repo before to call into one pipeline or another (that's the part specific to each crate)."

Webhooks for the CI

Also two proposals:

one webhook per repository
one webhook/pipeline

Parallelizing the Jobs

@petrutlucian94 "each buildkite agent runs one step/job at a time, so I think we should run multiple agents per host (e.g. by passing “—spawn ”). One idea would be to just run one agent per host cpu core."

andreeaflorescu commented 5 years ago

* script that pulls the latest CI. @sboeuf : "You could have a dedicated "ci" directory (for every crate) in which you could have the same script that pulls the content of the "ci" repo before to call into one pipeline or another (that's the part specific to each crate)."

@sboeuf I don't see a big advantage here with using this instead of regular git submodule. I like having a fixed version for the pipelines because this implies you are only updating to a new version of the CI intentionally. If you are in the middle of a release or in the middle of a bug fix you don't want the CI to break for something that is unrelated to what you are doing.

I don't necessarily want to use submodules either, it just felt like the most natural way of consuming the CI.

Webhooks for the CI

Also two proposals:
* one webhook per repository

* one webhook/pipeline

@petrutlucian94 the steps can run in parallel without having multiple steps. So the only thing that would be serial here is the download of the pipelines which should be sub-second. For simplicity I would keep it as one webhook/repository until we have a good reason to complicate ourselves with multiple webhooks. From my understanding, having multiple webhooks also means having multiple Buildkite projects because one project supports only one webhook. By project I mean kvm-ioctl-ci which where we define the pipeline. Any other reason for having multiple webhooks other than speed of CI?

Parallelizing the Jobs

@petrutlucian94 "each buildkite agent runs one step/job at a time, so I think we should run multiple agents per host (e.g. by passing “—spawn ”). One idea would be to just run one agent per host cpu core."

Right now the load is very low. Once we add more repositories and the load increases we can scale up the number of agents. I wouldn't necessarily have one/core because it's a waste of resources since they aren't actually going to do anything most of the time. Maybe we can use auto-scaling for this particular issue.

petrutlucian94 commented 5 years ago

By project I mean kvm-ioctl-ci which where we define the pipeline.

Probably we got the terms mixed up. So kvm-ioctl-ci itself is a buildkite pipeline, by splitting it into multiple pipelines I thought we'd end up with something like kvm-ioctl-ci.arm_linux, kvm-ioctl-ci.x86_linux, etc, each requiring its own webhook.

So the only thing that would be serial here is the download of the pipelines which should be sub-second.

From what I can tell, those build steps wait for agents to become available. Each agent runs one step at a time and there's a single agent per platform defined at the moment. For example, as we can see here https://buildkite.com/rust-vmm/kvm-ioctls-ci/builds/198, "check-warnings-arm" ran in 20s and waited 1m 29s. It's really not that much, but if we get projects with time consuming tests (e.g. that may take more than 10-15 minutes) and multiple simultaneous builds (maybe for different crates), this becomes a bottlenck, while the CI host remains underutilized.

I did a sandbox Windows CI for vm-memory and the test duration reduced from 6m31s to 2m18s after simply running more instances of the same agent on the same host (I have just a 6 core vm atm): https://buildkite.com/lpetrut/vm-memory-ci/builds/54 https://buildkite.com/lpetrut/vm-memory-ci/builds/53.

I wouldn't necessarily have one/core because it's a waste of resources since they aren't actually going to do anything most of the time.

If there are no pending tasks, those agents will do nothing (other than running a heartbeat), consuming a negligible amount of resources. Ofc one agent per core would be too much when having 32 logical cores though :). Anyway, that was just a suggestion.

rbradford commented 5 years ago

@rbradford you can have custom hooks executed before running a command or before checking out the repository code. I think the prepare.sh script should be called in the hook and not part of the pipeline step so that the steps can be independent of the repository being tested. What do you say?

And that's still going to work if everything is centralised into a common repo? If so, sounds good!

sboeuf commented 5 years ago

@andreeaflorescu

I like having a fixed version for the pipelines because this implies you are only updating to a new version of the CI intentionally. If you are in the middle of a release or in the middle of a bug fix you don't want the CI to break for something that is unrelated to what you are doing.

Fair enough, let's use git submodules then!

andreeaflorescu commented 5 years ago

@rbradford the hooks are on the machines where the builkite-agent is running. So yes. In the hook you have information about the repository being tested so you can have commands that are executed only for some of them.

andreeaflorescu commented 5 years ago

Probably we got the terms mixed up. So kvm-ioctl-ci itself is a buildkite pipeline, by splitting it into multiple pipelines I thought we'd end up with something like kvm-ioctl-ci.arm_linux, kvm-ioctl-ci.x86_linux, etc, each requiring its own webhook.

@petrutlucian94 I am not sure if this is how it works and I will have to check. My proposal would be to have one pipeline/project with one associated webhook. This is what kvm-ioctls-ci is today. Then, in kvm-ioctls we would have 2 steps:

one for uploading x86_linux_pipeline.yaml
one for uploading arm_linux_pipeline.yaml

The two pipelines definitions would have the tests specific for each platform, but they will both be triggered by the same webhook. We can add multiple agents so the tests are run in parallel. Does that make sense?

From what I can tell, those build steps wait for agents to become available. Each agent runs one step at a time and there's a single agent per platform defined at the moment. For example, as we can see here https://buildkite.com/rust-vmm/kvm-ioctls-ci/builds/198, "check-warnings-arm" ran in 20s and waited 1m 29s. It's really not that much, but if we get projects with time consuming tests (e.g. that may take more than 10-15 minutes) and multiple simultaneous builds (maybe for different crates), this becomes a bottlenck, while the CI host remains underutilized.

Yap, I agree. Once we have multiple repository and those repositories have multiple PRs and so we can add more agents.

sameo commented 5 years ago

@rbradford the hooks are on the machines where the builkite-agent is running.

AFAIU that's for the agent hooks. There are also repository scripts that live in the .buildkite/hooks/, I think this is what we'd want in that case.

andreeaflorescu commented 5 years ago

@sameo yap, we can have per repository hooks. I missed that one!

For future reference: https://buildkite.com/docs/agent/v3/hooks#repository-hooks

sameo commented 5 years ago

@andreeaflorescu Right, but how would we do that if each repo's .buildkite is a git submodule?

sameo commented 5 years ago

@andreeaflorescu Let's go an create the repo anyway. We'll find the best way to use it across repos once it's there and filled.

andreeaflorescu commented 5 years ago

@sameo as discussed offline, I would have rust-vmm-ci as a git submodule in every repository. The rust-vmm-ci directory will have all the things that are shareable by all repository like a common-pipeline.yml and the coverage test. To run the common pipeline as part of the CI, you would need to add a step like: buildkite-agent pipeline upload rust-vmm-ci/common-pipeline.yml

Then each repository will have a .buildkite folder which contains custom tests in custom-pipeline.yml and a .buildkite/hooks/ which contains the custom per repository hooks. To run the custom pipeline, you need to add an additional step to buildkite CI: buildkite-agent pipeline upload .buildkite/custom-pipeline.yml

If the crate doesn't need something fancy for the hooks I think we can use the defaults which will be hosted on the machine that's running the agent. I would expect the default hooks to be overwritten by the ones in .buildkite/hooks.

I haven't tried out any of these things though, I really hope this is how it works :))

andreeaflorescu commented 5 years ago

Created the repo: https://github.com/rust-vmm/rust-vmm-ci