Patterns for managing expensive spec setup

onsi commented 1 year ago

On #130 @nabbas-ca wrote:

I want to write k8s controller/operator tests within 2 suites, one with envtest controlller manager initialized and listening, and one without. So I can have finegrained negative path testing. For example, I want to to be able to setup a mock k8s api with bad CRs(before the webhook is alive or registered), and see how my controller reacts to correct it , as unit tests.

This way, I can setup a set of suites, within the same package like this:

1- clean controller, no CRs preloaded 2- preloaded good CRs, setup controller after the CRs are created 3- preloaded bad CRs, setup controller after the CRs are created .....

Is that possible? remember that envTest is expensive to start. making a BeforeEach at top level of tests that includes manager start, could be possiblle, but it means starting and stopping manager for every test.

onsi commented 1 year ago

hey @nabbas-ca I wanted to pull this out into a separate issue so we can dig into it a bit more. this comes up from time to time - particularly for kubernetes suites with complex setups. If you're up for it I'd love to explore the problem more with a few goals in mind:

i'd like to better understand some of the nuances around kubernetes and CRs when it comes to testing (I have some understanding of k8s but not enough to reason about this alone).
I'd like to explore what patterns are possible with Ginkgo as it exists today to accomplish what you want to accomplish and what the various trade-offs are.
I'd like to explore what it might look like to expand on Ginkgo to better support this usecase.

Would you be up for digging into things at a deeper level with me? If so I'll write up some thoughts/questions on this issue.

(If that sounds really abstract/too-much-work-just-give-me-the-answer-please allow me to clarify: I've bumped into this problem space a few times with k8s e2e users and often times the question is framed in terms of a preferred solution - e.g. "do you have BeforeAll" or "can I have multiple suites in one package" and what i'd like to do is better understand the problem so I can figure out the right thing to suggest and/or build. V2 actually introduces an implementation of BeforeAll that can kind of help with cases like this - but I think it also misses the mark for folks in a few ways. so I'd like to move away from just talking about solutions and more into understanding the problem a bit more.)

nabbas-ca commented 1 year ago

@onsi I'm up for this. I've designed it with BeforeAll at the end. But would like to discuss if it makes more sense in Gingko.

onsi commented 1 year ago

Great!

Sorry for the length, but here goes!

Ginkgo is, currently, tailored towards suites where specs are independent from one another. Independent specs are usually easier to reason about and can, usually, be trivially parallelized which can yield substantial performance gains. For example: Ginkgo's integration suite takes ~4 minutes when I run it serially but only ~45 seconds when I run ginkgo -p on my M1 Max laptop.

Of course, when you're building integration suites performance considerations are definitely in play. If the state under test is very expensive to set up (either in terms of time orcomputed resources) it can be hard to write independent specs. There are three common patterns that are in use today (these aren't the only things out there - you can mix and match and get more elaborate - but they provide a good frame of reference to dig into the topic).

Per-Spec Setup and Teardown

In some integration contexts you can simply set up the entire world under test before each spec (typically in a BeforeEach) and then tear it all down after each spec (typically in an AfterEach or DeferCleanup). I'll often do this if I'm testing a client/server pair. Go is super fast so I can just spin up the server (internally, in-process) for each spec, do stuff with the client, then tear it all down afterwards. I can then trivially parallelize this suite by ensuring that the server ports on different parallel processes don't conflict (which is easy to do).

Dedicated Resources Per Parallel Process

In some integration contexts it can be too expensive to set up and tear down the resources for each spec. But you can afford to give each parallel process its own dedicated resource (setup in a BeforeSuite and cleaned up in an AfterSuite) and then have the specs running on that process use that resource.

This makes parallelization fairly straightforward. You just need to make sure you correctly reset the resource after each spec (typically done in an AfterEach).

I'll sometimes do this if I'm testing code that talks to an external database. I usually have enough compute resources to spin up multiple databases and just point each parallel process as its own dedicated database. Cleanup then is as simple as dropping the entire database between test runs - which can usually be done pretty efficiently.

Shared Resources for the Suite

Finally, in some contexts, it isn't feasible to have dedicated resources and so a single shared resource must be used. This is usually set up in a SynchronizedBeforeSuite and then cleaned up in a SynchronizedAfterSuite. Here, care needs to be taken to ensure that specs don't step on each other when running in parallel. Typically the shared resource has some notion of tenancy so it is possible to give each parallel process a dedicated tenant (e.g. a namespace, or a shard, or an application user).

As long as the specs are exercising things that only affect the single tenancy they inhabit - these sorts of specs are trivially parallelizable. However sometimes there are destructive specs that need to run that could affect other specs running in parallel. For these, Ginkgo provides the Serial decorator which basically instructs Ginkgo to run the spec on a single process at the end after all the parallel specs have run.

If you've gotten this far - thanks for hanging in there!

When it comes to tests on a system like Kubernetes there are multiple layers of resources. And we can reason about each one separately.

There's:

The k8s cluster
Controllers/Operators
Workloads/Custom resources

Which kind of approach works for each of these? And what are the tenancy capabilities available for each of them? I don't have a lot of direct k8s experience but I'm pretty confident that a per-spec setup and teardown of a fresh k8s cluster is out of the question. My sense, also, is that k8s clusters are sufficiently large and expensive that most users don't spin up a dedicated cluster per process. Instead, most opt for option 3: a shared k8s cluster for the entire suite + k8s tenancy constructs (namespaces?) to ensure specs can play without influencing each other.

I don't know enough about controllers/operators and workloads/CRs to know if it is feasible for them to be set-up/torn-down on a per-spec basis or if they need to be done in a dedicated or shared way. I don't know enough about how tenancy works in that situation.

I would assume that k8s is efficient enough that for your case:

1- clean controller, no CRs preloaded 2- preloaded good CRs, setup controller after the CRs are created 3- preloaded bad CRs, setup controller after the CRs are created

that you could simply write specs for 1 that setup and tear down a clean controller with no CRs preloaded in a BeforeEach/AfterEach within a dedicated namespace. You would then be able to parallelize these and the cost of starting/stopping the controller would be low and mitigated by the fact that there is enough compute to run all the specs in parallel.

Same with 2 and 3 - could you preload good/bad CRs into separate namespaces then set up the controllers after they are created - all in a BeforeEach and then run the spec? What is the cost of spinning up and tearing down the controller for each spec? Even if it is a little high, do you still get a win by being able to run specs with greater parallelism?

Often times folks will reach for BeforeAll and Ordered specs to solve this. But Ordered specs are explicitly not parallelized internally. (To be specific: one set of Ordered specs can run in parallel with another set of Ordered specs - however the specs within the Ordered container run serially with respect to each other). This might be fine and you might end up implementing 1, 2, and 3 as distinct ordered containers. However if you do, you'll at most be able to parallelize up to 3 workers. That might be fast enough. But you might actually get a faster suite if you set up and tear down the controllers for each spec and parallelize more aggressively (8 workers?). I don't know! It depends on the performance characteristics of k8s and the controllers.

So, some questions:

What are the performance characteristics of CRs/operators/controllers?
What's your sense of the cost/benefit of always setting-up/tearing-down and increasing the number of parallel workers vs using Ordered containers and reducing the amount of potential parallelizability?
Does this conversation give you any new ideas for how to approach this problem given k8s tenancy support?
Do you even care about parallelizability? Would you rather just be able to have specs in a Describe/Context container share resources and run in series and not have to worry about parallization?

Airblader commented 1 year ago

I wanted to chime in that I have exactly the same issue, in the same context of Kubernetes operators. For me, the situation is basically this:

I want to test my operator using envtest, so I set up a BeforeSuite that spins that up.
I also want to have some basic unit tests for some functions (think smaller utilities) that do not depend on any clusters.

Currently, there seems to be no good way of achieving this other than moving things into different packages, but there's a certain taste to being forced to organize code differently because you can't separate the test setup well enough.

One thought that crossed my mind was that it'd be kinda neat if the suite could select only tests that carry some label. We want to label tests using envtest with some "integration test" label anyway, so if we could have two suites with one selecting on the label and one selecting on not having the label, that problem would be solved (just spitballing).

onsi commented 1 year ago

hey @Airblader

First a quick comment:

other than moving things into different packages, but there's a certain taste to being forced to organize code differently because you can't separate the test setup well enough.

I do (personally, subjectively) tend to prefer separating unit tests from integration tests (admittedly the terms can be squishy - I've written some very integrationy unit tests and vice versa). Two reasons for this preference:

The setup for unit tests is usually very different than for integration tests
Unit tests are "thing I run a lot while I'm developing a new thing" and then integration tests are "things I add to/run to ensure all is well when i'm close to done" with the former being very fast (O(second)) and the latter being less fast (O(minute)).

Given that I've (again, personally, subjectively) preferred to keep them separate and in Go that means separate packages. That opinion certainly informs Ginkgo's design and default usage. But Ginkgo isn't particularly opinionated about this at all, and has evolved over the years to get pretty flexible. So...

One thought that crossed my mind was that it'd be kinda neat if the suite could select only tests that carry some label. We want to label tests using envtest with some "integration test" label anyway, so if we could have two suites with one selecting on the label and one selecting on not having the label, that problem would be solved (just spitballing).

You can do precisely this. And some relatively new stuff that landed in 2.8 makes it even more straightforward.

You can use Labels to annotate your specs with metadata and then run ginkgo --label-filter=X to pick out just a subset of tests. You can also inspect the label filter at runtime to determine whether or not to run certian setup or not. Here's an example that maps onto your usecase (all squished into one file, but obviously that's not a requirement). I've tried to flesh this out a fair bit so that it's realistic enough to be useful. In particular, this would provide a parallelizable suite in which each Ginkgo process shares the cluster and uses k8s namespaces for isolation from the other processes running in parallel:

// same entry point as always
func TestOperator(t *testing.T) {
    RegisterFailHandler(Fail)
    RunSpecs(t, "Operator Suite")
}

// pseudocode - but the idea is `client` is how you interact with the cluster
var client *k8s.Client

// we use SynchronizedBeforeSuite to only spin up the e2e cluster on one Ginkgo process
// we then pass the connection details to all processes so they can connect to the cluster
var _ = SynchronizedBeforeSuite(func() []byte {
    var connectionDetailsJSON []byte = nil
    // this tests to see if the current configured `-label-filter` will select tests labeled integration
    // if so we spin up the cluster
    if Label("integration").MatchesLabelFilter(GinkgoLabelFilter()) {
        cluster := envtest.SpinUpCluster() // pseudo-code
        DeferCleanup(cluster.Teardown)     // this will clean up once all the tests complete
        connectionDetailsJSON, _ = json.Marshal(cluster.ConnectionDetails)
    }
    return connectionDetailsJSON
}, func(connectionDetailsJSON []byte) {
    //connectionDetailsJSON will be non-nil only if the integration tests are enabled 
    if connectionDetailsJSON != nil {
        var connectionDetails k8s.ConnectionDetails
        json.Unmarshal(connectionDetailsJSON, &connectionDetails)
        //client will be non-nil
        client = k8s.NewClient(connectionDetails)
    }
})

var namespacedClient *k8s.NamespacedClient //pseudocode, obviously
// each integration test needs its own namespace, we accomplish that with a BeforeEach...
var _ = BeforeEach(func() {
    // ...but we only want to set up the namespace if this is an integration test.
    // so we check and if the current test has the integration label. 
    matches, _ := CurrentSpecReport().MatchesLabelFilter("integration"); 
    if matches && client != nil {
        namespacedClient = client.CreateRandomNamespace()
        DeferCleanup(namespacedClient.Teardown)
    }
}, OncePerOrdered) // we want the namespace to exist throughout the duration of an Ordered container if we use any

var _ = Describe("When a resource is requested", func() {
    Context("these are integration tests", Label("integration"), func() {
        It("makes a resource", func() {
            client.MakeResource(...)
            Eventually(http.Get).WithArguments("http://localhost:8080/resource").Should(HaveField("StatusCode", http.StatusOK))
        })

        It("lists existing resources", func() {
            client.MakeResource(...)            
            Eventually(client.ListResources).Should(HaveLen(1))
        })
    })

    Context("these are unit tests", func() {
        It("handles the case where the resource request is malformed", func() {
            Expect(operator.New(...)).To(MatchError(operator.SYNTAX_ERROR))
        })

        It("uses sane defaults when not specified", func() {
            o := operater.New(...)          
            Expect(o.WidgetCount).To(BeNumerically("<", 50))
        })
    })

    It("this is a standalone integration tests", Label("integration") func() {
        //...
    })
})

Now you can run all the tests (in parallel) with:

ginkgo -p

You can run just the unit tests with:

ginkgo -p -label-filter="\!integration"

and all the integration tests with:

ginkgo -p -label-filter="integration"

(a simple makefile or some other scripting solution will help you not have to retype all those things every time). The cluster and namespace will only be made when the integration tests will run.

nabbas-ca commented 1 year ago

@onsi I like where this is going. However, I just want to have multiple suites running with different setups:

Suite1 has envtest with openshift for example
Suite2 is vanilla k8s envtest
Suite3 is envtest with operator webhook setup only
Suite4 is envtest with operator , with both controller and webhook setup

All of these residing in the same package as BeforeSuite (or SynchronizeBeforeSuite)

and each spec , at any level of the node , to specify which suite to use as setup. default to Suite1 for example.

Can that be possible somehow?

nabbas-ca commented 1 year ago

I've been trying to create a public repo to demonstrate this, not yet done.

onsi commented 1 year ago

hey @nabbas-ca - can i ask you to step back and frame this less in terms of a potential solution and more in terms of the problem you are trying to solve. (eg "i'd like to have the same tests point at different kinds of k8s clusters so i can validate that things work regardless of environment")

that might help me brainstorm potential solutions for you.

onsi commented 1 year ago

(i'm just concerned that i'm misunderstanding you and want to check the actual underlying problem statement)

nabbas-ca commented 1 year ago

Yeah, I like to write unit tests for the operator (trying to shift the testing as left as possible, since we can mock a lot in golang and ginkgo)

Here is how I would like to structure my unit tests in a sample operator (initialized and scaffolded by operator-sdk):

1- in package api/v1alpha1, webhooks are defined and created(operator-sdk scaffolding). Test Strategy:

a- Any method that doesn't require k8s client/config, then use native golang unit tests to run those tests
b- Any method that does require k8s client, then use ginkgo to test. However, here is a dilemma/problem i'm facing:
           i- BeforeSuite runs before the whole package is unit tested. and initialized envTest and the webhook is registered. I would like to run some unit tests where:
              - some "old CR" is mocked, before webhook is fixed in latest code. Means , that it needs to be created after envtest, before webhook registration
              - register the webhook in envTest
              - update the old CR, so that update webhook is called.
            ii- I would like to solve this by having multiple Suites in the same api/v1alpha1 package, with different setups:
                 - Suite1 , with setup of k8s envtest with webhook registered
                 - Suite2 , with setup of k8s envtest , with some preloaded CRs that didn't go through webhook , then register webhook (some negative path unit tests)

2- in package controllers, controllers are defined. Test Strategy: a- Any method that doesn't require k8s client/config, then use native golang unit tests to run those tests b- Any method that does require k8s client, then use ginkgo to test. However, I have another dilemma similar to above. i- BeforeSuite runs before the whole package is unit tested. and initialized envTest and the controller is registered. I would like to run some unit tests where:

some "old CR" is mocked, before controller is fixed in latest code. Means , that it needs to be created after envtest, before controller registration
register the controller in envTest

update the old CR, so that controller is called.

  ii- I would like to solve this by having multiple Suites in the same api/v1alpha1 package, with different setups:
             - Suite1 , with setup of k8s envtest with webhook and controller registered
             - Suite2 , with setup of k8s envtest , with some preloaded CRs that didn't go through webhook/controller , then register webhook and controller(some negative path unit tests)
             - Suite3, with setup of k8s envtest with only controller registered , to unit test controller without webhook interfering for some negative paths of the tests

All of this, while structuring the unit test files to be adjacent to the code being unit tested. I like this because the code that runs the unit tests is adjacent, and can serve as code sample for usage of these methods at the same time.

nabbas-ca commented 1 year ago

I apologize for the misformatting, i don't know how it got that way.

onsi commented 1 year ago

OK - that's a fair bit. I'll try to help but first I need to better understand k8's envtest and its isolation semantics. I assume envtest spins up a k8s cluster. Setting the testing stdlib unit tests aside, and focusing just on the controller tests for now... it sounds like you have integration tests that need to point at a live k8s cluster and that there are three categories of tests:

some need a k8s cluster with the webhook and controller registered
some need a k8s cluster with some preloaded CRs. you would then register the webhook and controller and see them do some magic
some need a k8s cluster with only the controller registered and no webhook

What I'm left wondering is: can we do all this, in parallel, with one cluster. Or do we need to have multiple clusters, one for each class of test. This is where my lack of knowledge of k8s isolation semantics comes into play.

Let me be super concrete and specific:

Can you register a controller/webhook with a particular k8s namespace? Can different namespaces with different configurations of controller/webhook exist on the same cluster? And will k8s ensure they are isolated?

Moreover, is the registering of a controller/webhook reasonably efficient?

If the answer to all this is "basically, yes" then I would structure the Ginkgo integration tests such that:

a SynchronizedBeforeSuite spins up a single shared envtest cluster.
each spec gets a fresh namespace in a BeforeEach.
the spec is then responsible for instantiating the controller/webhook in that namespace in the way that it needs.
the namespace is then torn down in an AfterEach or DeferCleanup.

If the answer to these questions is "the isolation works, but setting up controllers is expensive" then I would do something like:

a SynchronizedBeforeSuite spins up a single shared envTest cluster
the SynchronizedBeforeSuite then sets up a namespace that has the controller+webhook registered - specs that need that kind of cluster use that namespace.
it also sets up a namespace with only the controller registered and no webhook - specs that need that kind of cluster use that namespace
probably your other scenario (preloaded, then register) won't work to be set up globally and those specs will need to run a bit slowly.

If the answer to these questions is "the isolation will not work" then I suspect the simplest solution will be to have multiple Ginkgo suites, each one spinning up its own envtest as needed. Alternatively a single Ginkgo suite that spins up three kinds of envtest environments.

We can dig into the rest of it further - but lets pause there and stress test some of these answers. What are your thoughts?

onsi commented 1 year ago

also - if it would help - i'd be happy to find time for a zoom call since there's a lot of complexity here!

nabbas-ca commented 1 year ago

I could definitely setup a zoom call for this (or webex if it is ok). I'm in Toronto, which is 2 hours ahead of you probably. How does your calendar look?

I can answer some of the k8s questions here as well:

envtest spins up a mock k8s cluster for unit testing. you can spin many of them in parallel (if you have multiple packages for example) and they use different ports on the localhost. They act as if the other doesn't exist.
Can you register a controller/webhook with a particular k8s namespace? That actually depends on the operator itself. Some operators are namespace scoped, some are cluster scoped. We would like to devise a testing strategy that works on both types of operators.
What I'm left wondering is: can we do all this, in parallel, with one cluster? Or do we need to have multiple clusters, one for each class of test? multiple envtests is equivalent to multiple clusters and you can have them in parallel without major issues. There might be port conflicts, but that is solvable by picking free ports when starting. envtests are not super expensive to spin up (~1m) . however, spinning one up per test makes a unit test suite run in hours instead of minutes.

I like the idea of multiple envtests in the setup. I think that would solve all my problems. I could have a wrapper around each envTest as a test Suite.

onsi commented 1 year ago

sweet - this is starting to make more sense to me.

based on what you’re saying it does sound like you can get pretty far with one (or perhaps a couple of well-defined) shared cluster(s) and then have a few tests that need their own cluster on demand.

I like the idea of multiple envtests in the setup. I think that would solve all my problems.

yes, give it a try and see how far you can push it!

I could have a wrapper around each envTest as a test Suite.

I’m not following what this means, though. I’m wondering if what I mean by “test suite” (which is a very specific notion in Ginkgo) and what you mean might be different.

Would be happy to do zoom/webex. I’m in Denver so, yeah - 2 hours behind you. I’m free much of tomorrow between 9 and 2 Denver time (so 11 and 4 your time). i’m on both the kubernetes and gophers slack instances so you can dm me there (handle is @onsi on both) and we can figure out a time and exchange calendar invites.

onsi commented 1 year ago

hey @Airblader - in case you ended up building off of that example. I had forgotten that you can do:

CurrentSpecReport().MatchesLabelFilter("integration")

I've updated the example with that.

@nabbas-ca let me know if you'd like to connect or if you've been able to build of some of the idea we discussed.

onsi commented 1 year ago

In case there's interest, please take a look at the new proposal for managing shared resources and having finer-grained control over parallelism here: https://github.com/onsi/ginkgo/issues/1292

onsi / ginkgo