vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

Improving CI/CD test coverage #4989

Open rafael opened 4 years ago

rafael commented 4 years ago

Expanding Vitess Test Coverage

Currently Vitess has a rich set of functional tests that are run as part of every commit to catch regressions early. However, they are not sufficient to assess the quality of the product for production rollout.

Some of the problems with the current approach are:

Proposed Testing environments:

Continuous Integration

Frequency: For every commit

Scope: Unit tests and most functionality/integration testing

Currently most tests are run in Travis CI. As tests are converted to Golang, we will move them to use GitHub Actions.

Known Tasks:

We will try to run as many tests in the Continuous Integration suite as possible. GitHub Actions allows 20 tests to run concurrently, and we may be able to upgrade for more.
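A minimal sketch of how a test list could be split across a fixed number of concurrent jobs, assuming a simple round-robin split. The function name `shardTests` and the test names are hypothetical and do not come from the Vitess repository.

```go
package main

import (
	"fmt"
	"sort"
)

// shardTests distributes test names round-robin across n shards.
// The input is copied and sorted first so the assignment is
// deterministic from one run to the next.
func shardTests(tests []string, n int) [][]string {
	if n <= 0 {
		return nil
	}
	sorted := append([]string(nil), tests...)
	sort.Strings(sorted)

	shards := make([][]string, n)
	for i, t := range sorted {
		shards[i%n] = append(shards[i%n], t)
	}
	return shards
}

func main() {
	// Hypothetical test names, for illustration only.
	tests := []string{"backup", "reparent", "resharding", "tabletmanager", "vtgate_query"}

	// Two shards keeps the output short; the limit quoted above would be 20.
	for i, shard := range shardTests(tests, 2) {
		fmt.Printf("shard %d: %v\n", i, shard)
	}
}
```

Round-robin ignores how long each test takes, so a real split would probably weight by recorded runtimes.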

Upgrade/Downgrade Testing

Frequency: Nightly

Scope: Reliability, Backward/Forward compatibility

We currently do not test for incompatibilities introduced by an upgrade, nor do we verify that users can downgrade one level if they need to back out of a failed upgrade. The scenarios will need to be written down, and then tests can be written using GitHub Actions:

Production Readiness

Frequency: Pick a new build every 2 weeks

Scope: Performance, Stress

Longer term, we should track performance regressions as part of automated testing. I suggest we scope this out after we have started upgrade/downgrade tests, as we might learn which scenarios we would like to test against. I am a bit nervous about skew in the results: besides being virtualized, GitHub Actions makes no promises about exactly what hardware it provides. We may be best served by using physical hardware.

morgo commented 4 years ago

I like it! I just have one suggestion: under regression testing, we should test all MySQL flavors and versions Vitess claims to support.

rafael commented 4 years ago

Oh that's a great idea. Just added that!

derekperkins commented 4 years ago

This is great. I think we should also add release tagging as a part of the production readiness cadence.

morgo commented 4 years ago

If everyone is okay with it, I would like to take a look at this after I've finished the release cycle documentation + small documentation refactoring tasks I'm working on (~1-2 weeks time)

rafael commented 4 years ago

@morgo yes! I think that makes sense. I see the release cycle/doc as a pre-req for the improvements in this issue.

morgo commented 4 years ago

I am going to include CircleCI and AWS CodeBuild in scope as well. @dkhenry suggested that if we parallelize the tests more, we can run the regression suite on every commit (versus nightly).

That makes sense to me. We can revert the plan to nightly if the cost/time is prohibitive.

morgo commented 4 years ago

I am planning to loop back on this. I just want to merge a couple of PRs that change the build/testing environment:

Work is also underway to remove Python from the tests. @arindamnayak, @ajeetj and @saurabh408 are all working on it :-)

What I plan to do first is move local_example to GitHub Actions as a true matrix build on supported flavors, since it no longer depends on Python. Assuming we can run tests on the new infrastructure much faster, we can go with the "everything on commit" plan, which saves us from having to decide which tests run when.
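A rough sketch of what a matrix build expands to: every test suite is paired with every flavor. The names `matrixJob` and `buildMatrix` are hypothetical, and the flavor names are placeholders rather than a statement of which flavors Vitess supports.

```go
package main

import "fmt"

// matrixJob is one cell of the build matrix: a single test suite
// run against a single database flavor.
type matrixJob struct {
	Flavor string
	Suite  string
}

// buildMatrix returns the cross product of flavors and suites,
// grouped by flavor in the order given.
func buildMatrix(flavors, suites []string) []matrixJob {
	jobs := make([]matrixJob, 0, len(flavors)*len(suites))
	for _, f := range flavors {
		for _, s := range suites {
			jobs = append(jobs, matrixJob{Flavor: f, Suite: s})
		}
	}
	return jobs
}

func main() {
	// Placeholder flavor names; substitute the flavors actually supported.
	flavors := []string{"flavor-a", "flavor-b"}
	suites := []string{"local_example"}

	for _, j := range buildMatrix(flavors, suites) {
		fmt.Printf("run %s on %s\n", j.Suite, j.Flavor)
	}
}
```

The job count grows as flavors times suites, which is why the concurrency limit matters for the "everything on commit" plan.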

aribalam commented 2 years ago

@GuptaManan100 I am willing to work on this project. As I understand it, there are 2 tasks.

  1. All the functional tests that run in Travis CI need to be migrated to GitHub Actions.
  2. We need to create scenarios that require users to downgrade one level after a failed upgrade. We also need to create GitHub Actions workflows that let us check out 2 versions for testing.

Is there anything else I should know to better understand the project?

GuptaManan100 commented 2 years ago

The mentor for this project is going to be @harshit-gangal. He will be best able to answer your queries.

harshit-gangal commented 2 years ago

The Vitess test suite runs on GitHub Actions today; Travis CI is no longer used. We need to test the real-world scenario of upgrading to a newer version in any order (vttablet or vtgate first) and then downgrading while serving traffic.
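A minimal sketch of enumerating these scenarios as step lists, one per upgrade order. The names `step` and `rollingPlan` are hypothetical. Downgrading in the reverse of the upgrade order is an assumption made for this example; the comment above does not specify a downgrade order.

```go
package main

import "fmt"

// step is a single action applied to one component while the
// cluster keeps serving traffic.
type step struct {
	Action    string // "upgrade" or "downgrade"
	Component string
}

// rollingPlan upgrades the components in the given order and then
// downgrades them in reverse order (an assumption of this sketch).
func rollingPlan(order []string) []step {
	steps := make([]step, 0, 2*len(order))
	for _, c := range order {
		steps = append(steps, step{Action: "upgrade", Component: c})
	}
	for i := len(order) - 1; i >= 0; i-- {
		steps = append(steps, step{Action: "downgrade", Component: order[i]})
	}
	return steps
}

func main() {
	// Both upgrade orders mentioned above.
	orders := [][]string{
		{"vttablet", "vtgate"},
		{"vtgate", "vttablet"},
	}
	for _, order := range orders {
		fmt.Println(rollingPlan(order))
	}
}
```

A test harness could run its traffic checks after each step, so a failure points at the exact mixed-version state that caused it.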

aribalam commented 2 years ago

@harshit-gangal Great! :) Could you point me to a good starting place so I can understand the project better?

deepthi commented 2 years ago

This might help: https://github.com/vitessio/vitess/issues/7344 Upgrade/downgrade scenarios that need testing are documented in that issue.

aribalam commented 2 years ago

Okay, so I went through #7344 and observed that only 2 of the scenarios still remain to be tested:

  1. Upgrade/downgrade subset of vttablets.
  2. Upgrade/downgrade vtgate, vtctld with 1.

Also, it mentions that there needs to be more explicit testing for vtgate and vtctld in the end-to-end tests themselves. Is there anything else that I missed? @deepthi @harshit-gangal