weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0
6.62k stars 670 forks

Epic - Improve development & testing infrastructure #2647

Closed marccarre closed 7 years ago

marccarre commented 7 years ago

Summary

Re-factor the current smoke tests in order to:

This should hopefully:

+------------------+ +-----------------+ +-----------------------+ +---------------------+
| make stress-test | | make cross-test | | make integration-test | | Weave Net developer |
+-------+----------+ +--------+--------+ +--------+--------------+ +--+--------------+---+
        |                     |                   |                   |              |
        | +-------------------+-----------------+ |                   |              |
        | | Cartesian-product of deps. versions | |                   |              |
        | +-------------------+-----------------+ |                   |              |
        |                      \                  |                   |              |
+-------+------+                \        +--------+----------+        |              |
| Stress Tests |                 `-------+ Integration Tests |        |              |
+-------+------+                         +--------+----------+        |              |
        |                                         |                   |              |
+-------+-----------------------------------------+-------------------+---+          |
|          Configuration (Go, Docker, K8S, etc. via Ansible)              |          |
+-------+-----------------------------------------+-------------------+---+          |
        |                                         |                   |              |
+-------+-----------------------------------------+-------------------+--------------+---+
|                Provisioning (VMs via Vagrant, AWS/GCP/DO via Terraform)                |
+----------------------------------------------------------------------------------------+

Progress

Details

Current state

Weave Net is currently integration-tested in two main ways:

The software currently used for building and testing is:

Limitations & Potential Issues

  1. We only test against one version of Docker, so we will not catch issues with the other versions our users run in production.
  2. We do not systematically test against Docker release candidates (and when we do, we do so manually), so we cannot consistently react to issues, or feed them back to Docker, before Docker releases.
  3. We only test against one operating system (Ubuntu), so we will not catch issues with the other operating systems our users run in production.
  4. We only test against one version of Ubuntu, so we will not catch issues with the other Ubuntu versions our users run in production.
  5. We only test against one cloud provider (GCP), so we will not catch issues with the other providers our users run in production.
  6. Some of these versions are hardcoded, while others depend on what is available in the OS's package manager at build time, which may change over time, i.e. their version is a "moving target".
  7. Vagrant and GCP do not use the same versions of Docker and Ubuntu, so results may differ between the two environments.
  8. When using Vagrant, different versions of Docker end up being used, which could cause issues.
  9. Setting up testing environments for arbitrary configurations is a very manual process, even though such environments are useful for quickly reproducing various issues.

Proposed Target State

  1. Decouple dependency definition, provisioning, configuration and testing:
    i. It should be easy to list all available versions of the software we rely upon (OS, Docker, Go, etc.).
    ii. It should be easy to define a combination of these versions to use for development/testing.
    iii. It should be easy to provision a local or remote environment. For remote environments, it should be easy to leverage any cloud provider.
    iv. It should be easy to configure the environment created in 1.iii. using the versions defined in 1.ii.
    v. It should then be easy to perform any kind of testing on this environment: manual, automated, integration tests, cross-version tests, stress tests, etc.

  2. "Dependency-inject" versions into the build: OS, Docker and Go versions should all be configurable in a single place, then passed "top-down" to the provisioning and configuration layers, and NOT hardcoded anywhere else below (other layers, actual tests, etc.).

  3. Testing:
    i. Integration testing should simply be one instance of 1.ii/iii/iv., with, e.g., the latest versions, or a hardcoded combination of versions of all dependencies. This would continue to run on every git push.
    ii. Cross-version testing: it should be easy to repeat 3.i. for multiple combinations of versions, or even all combinations (the Cartesian product of the versions of all dependencies), making it easy to:

      • spot incompatibilities between versions
      • document compatible versions (e.g. via a "compatibility matrix")

      This would typically be run weekly and/or manually before releases.

    iii. Cross-version testing should ideally be done concurrently, for faster feedback.
    iv. Stress testing and more advanced testing scenarios could also re-use 1.ii/iii/iv.

Proposed Implementation

  1. Dependency listing: a set of shell scripts would list all available versions of all dependencies (OSes/kernels, Go, Docker, K8S, etc.).
  2. Dependency definition: depending on the target, make would either:
    i. call 1. and select versions using pre-defined rules (e.g. latest production version, latest RC version, etc.), or
    ii. use hardcoded versions.
  3. Provisioning:
    i. Locally: a set of Vagrantfiles (see Vagrant) could be used to provision local VMs.
    ii. Remotely: a set of .tf files (see Terraform) could be used to provision machines on AWS, GCP and DigitalOcean.
    Docker InfraKit may also be a good tool to implement the above; see also this article.
  4. Configuration: a set of Ansible scripts, which are potentially more portable and idempotent than Bash scripts, could be used to automatically configure the newly provisioned environments.
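For item 2.i, "select versions using pre-defined rules" could look like the Go sketch below, which picks the latest production release and the latest release candidate from a list such as the listing scripts might print. As an assumption, the parser only understands Docker's `X.Y.Z`, `X.Y.Z-ce` and `X.Y.Z-ce-rcN` forms; the helper names and the version list are illustrative.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a parsed Docker-style version string such as "1.13.1",
// "17.03.1-ce" or "17.05.0-ce-rc3" (format assumed from Docker's naming).
type version struct {
	raw  string
	nums [3]int // major, minor, patch
	rc   int    // 0 for a final release, N for "-rcN"
}

func parse(s string) (version, error) {
	v := version{raw: s}
	core := s
	if i := strings.Index(s, "-rc"); i >= 0 {
		n, err := strconv.Atoi(s[i+len("-rc"):])
		if err != nil || n < 1 {
			return v, fmt.Errorf("bad release-candidate suffix in %q", s)
		}
		v.rc = n
		core = s[:i]
	}
	parts := strings.Split(strings.TrimSuffix(core, "-ce"), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("expected X.Y.Z in %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("non-numeric component %q in %q", p, s)
		}
		v.nums[i] = n
	}
	return v, nil
}

// less orders versions numerically; an RC sorts before its final release.
func less(a, b version) bool {
	for i := range a.nums {
		if a.nums[i] != b.nums[i] {
			return a.nums[i] < b.nums[i]
		}
	}
	if (a.rc == 0) != (b.rc == 0) {
		return a.rc != 0
	}
	return a.rc < b.rc
}

// latest returns the highest version accepted by keep, or "" if none is.
func latest(available []string, keep func(version) bool) (string, error) {
	var best version
	found := false
	for _, s := range available {
		v, err := parse(s)
		if err != nil {
			return "", err
		}
		if keep(v) && (!found || less(best, v)) {
			best, found = v, true
		}
	}
	if !found {
		return "", nil
	}
	return best.raw, nil
}

func main() {
	available := []string{"1.12.6", "1.13.1", "17.03.1-ce", "17.04.0-ce", "17.05.0-ce-rc1", "17.05.0-ce-rc3"}
	stable, err := latest(available, func(v version) bool { return v.rc == 0 })
	if err != nil {
		panic(err)
	}
	rc, err := latest(available, func(v version) bool { return v.rc > 0 })
	if err != nil {
		panic(err)
	}
	fmt.Println("latest production:", stable, "latest RC:", rc)
}
```

Each rule is just a `keep` predicate, so a stricter rule such as "latest RC newer than the latest production release" would be one more small function rather than a change to the selection logic.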

rade commented 7 years ago

Does this replace #229?

rade commented 7 years ago

Does this encompass #1747?

rade commented 7 years ago

Does this replace #2018?

rade commented 7 years ago

Does this encompass #2620?

marccarre commented 7 years ago

Let's keep #1747, #2018 and #2620 as individual issues, though. #2647 is more of an "epic" or meta-issue.

marccarre commented 7 years ago

Next: cross-version testing.

Note that cross-version testing of Kubernetes depends on kubernetes/release/issues/234.

brb commented 7 years ago

Fixed by #2694