regro / cf-scripts

Flagship repo for cf-regro-autotick-bot

true integration testing #261

Open CJ-Wright opened 6 years ago

CJ-Wright commented 6 years ago

We may need to run true integration testing (with a dev graph to boot) so we don't blow up the graph by accident and can run CI with close to 100% coverage.

CJ-Wright commented 6 years ago

It would be good to separate the actual operation (the pulling of data and updating of files) from the pushing back to repos.

viniciusdc commented 4 years ago

@CJ-Wright Indeed, we should really rethink this! Are there any new developments? I can look into this if possible (I could add it to my GSoC program as a form of debugging/test code).

CJ-Wright commented 4 years ago

There haven't been any recent developments on this front, to the best of my knowledge.

ytausch commented 2 months ago

I want to revive this issue and came up with the following concept:

GitHub Accounts and Repositories

For a proper integration test strategy, we must mimic the relevant GitHub accounts and repositories with which the bot interacts. I propose the following accounts ("test accounts") and repositories ("test repositories"): conda-forge-bot-staging, regro-cf-autotick-bot-staging, and regro-staging (see "Integration Test Workflow" below).

I am aware this requires us to manage three additional GitHub entities. However, since production also uses three accounts this way, we should stick to this architecture and mirror it as closely as possible to prevent future headaches.

Integration Test Definition

To define test cases, we use the following directory structure:

definitions/
├── pydantic/
│   ├── resources/
│   │   ├── recipe_before.yml
│   │   └── ... (entirely custom)
│   ├── version_update.py
│   ├── aarch_migration.py
│   ├── some_other_test_case.py
│   └── ...
├── llvmdev/
│   ├── resources/
│   └── test_case.py
└── ...

As shown, there are different test cases for different feedstocks. Each test case is represented by a Python module (file). Each test case module must define a prepare() and a check_after() method.

The prepare method uses our yet-to-be-implemented integration test library (see below) to set up the test case by defining what the feedstock repo, the bot account's forked repo (does it even exist?), the cf-graph data relevant to the feedstock, and any required HTTP mocks (see below) look like. Setting up the repositories includes preparing PRs that might already be open.

The check_after method is called after the bot run and can run several assertions against the resulting state (e.g., files present in the forked repository, a specific git history, cf-graph data). Helper functions provided by our integration test helper library make writing those assertions easy.
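
As an illustration only, a test case module might look like the sketch below. Everything in it is hypothetical: the helper object, its method names, and the resource files other than recipe_before.yml are assumptions about the yet-to-be-written helper library.

```python
# definitions/pydantic/version_update.py -- hypothetical sketch, not an existing API
from pathlib import Path

RESOURCES = Path(__file__).parent / "resources"


def prepare(helper):
    # Push a known feedstock state to the staging feedstock repository.
    helper.setup_from_resources(RESOURCES)
    # Seed the cf-graph data relevant to this feedstock (file name is an assumption).
    helper.set_cf_graph_node(RESOURCES / "node_attrs.json")
    # Mock the upstream version lookup so the bot sees a new release
    # (URL and payload file are assumptions).
    helper.mock_http_response(
        url="https://pypi.org/pypi/pydantic/json",
        body_file=RESOURCES / "pypi_response.json",
    )


def check_after(helper):
    # The bot run should have opened a version-bump PR on the staging feedstock.
    helper.assert_pr_opened(title_contains="v2.0.0")
    # The bot's fork should contain the bumped recipe.
    helper.assert_fork_file_contains("recipe/meta.yaml", "2.0.0")
```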

Integration Test Workflow

We run the integration tests via a single GitHub workflow in this repo. It consists of the following steps:

  1. Make sure the relevant test repositories (in conda-forge-bot-staging, regro-cf-autotick-bot-staging, and regro-staging) exist and have the correct configuration. The required test repositories are read from the test definitions (see above).
  2. Run all test scenarios. A test scenario randomly selects one test case for each feedstock appearing in our test definitions (see above). Of course, we ensure that, across all test scenarios, every test case of every test feedstock is run (see the sketch after this list). In each test scenario:
    1. We force-push all relevant test data to the test repositories. Since each test case defines its portion of cf-graph data separately, the combined test graph is generated by merging with jq. If needed, old branches generated by previous test runs are deleted. As pointed out above, the test data is generated by each test case's prepare method. HTTP mocks are also set up.
    2. Run all steps of the autotick-bot in sequence. We intentionally do not test individual jobs separately. GitHub reusable workflows let us reuse as much of the production configuration as possible in the tests.
    3. Run each test case's check_after method to validate the state after the bot run.
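
To make the coverage guarantee from step 2 concrete, here is a rough Python sketch (all names are invented) of how scenarios could be assembled so that every test case of every feedstock runs at least once across the scenarios, while the pairing of test cases between feedstocks stays random:

```python
import random
from pathlib import Path


def build_scenarios(definitions_dir: Path) -> list[dict[str, Path]]:
    """Combine per-feedstock test cases into scenarios.

    Each scenario maps every feedstock to one of its test case modules;
    taken together, the scenarios cover every test case of every feedstock.
    """
    cases: dict[str, list[Path]] = {}
    for feedstock_dir in sorted(definitions_dir.iterdir()):
        if not feedstock_dir.is_dir():
            continue
        modules = sorted(feedstock_dir.glob("*.py"))
        if not modules:
            continue
        random.shuffle(modules)  # randomize which cases get paired across feedstocks
        cases[feedstock_dir.name] = modules

    n_scenarios = max(len(modules) for modules in cases.values())
    scenarios = []
    for i in range(n_scenarios):
        # Cycle through each feedstock's cases so that feedstocks with fewer
        # cases still take part in every scenario and no case is skipped.
        scenarios.append(
            {name: modules[i % len(modules)] for name, modules in cases.items()}
        )
    return scenarios
```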

To emphasize, we test multiple feedstocks together in each test scenario. This speeds up test execution (because the bot works in a batch job fashion) and might uncover some bugs that only occur when multiple feedstocks or their cf-graph metadata interact with each other in some way.

HTTP Mocks

The version update tests, especially, will require us to mock HTTP responses to return the latest versions we want them to return. ~To accomplish this, we use VCR.py cassettes that have been modified accordingly. If possible, we might use the pytest-recording pytest plugin on top of that.~ We cannot use VCR.py because we want to reuse the bot's workflows, and VCR.py is not a true web proxy; it only instruments Python HTTP calls in-process. So I propose something like MockServer instead.
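
For illustration, a minimal sketch of how a test could register an expectation with a MockServer 5.x instance via its REST API; the local URL, the mocked path, and the payload are assumptions:

```python
import json

import requests

MOCKSERVER_URL = "http://localhost:1080"  # assumed local MockServer instance


def mock_latest_version(package: str, fake_index_response: dict) -> None:
    """Register an expectation so version lookups for `package` are answered
    with `fake_index_response` instead of hitting the real package index."""
    expectation = {
        "httpRequest": {"method": "GET", "path": f"/pypi/{package}/json"},
        "httpResponse": {
            "statusCode": 200,
            "body": json.dumps(fake_index_response),
        },
    }
    # MockServer 5.x exposes expectation creation via PUT /mockserver/expectation.
    response = requests.put(
        f"{MOCKSERVER_URL}/mockserver/expectation", json=expectation
    )
    response.raise_for_status()
```

The bot's outgoing HTTP traffic would then need to be routed through MockServer in some way, e.g. by pointing the standard HTTP_PROXY/HTTPS_PROXY environment variables at it.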

Pytest Integration

The test scenarios are generated by dynamically parametrizing a pytest test case. This pytest test case runs once per test scenario, dynamically importing the correct Python modules (test cases) for each feedstock that is part of the test scenario and then executing them.
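
Roughly, the pytest side could look like the following sketch. build_scenarios refers to the sketch above; the imported module names, IntegrationHelper, and run_bot_workflow are invented placeholders:

```python
import importlib.util
from pathlib import Path

import pytest

from scenarios import build_scenarios  # hypothetical module containing the sketch above
from integration_helpers import IntegrationHelper  # hypothetical helper library (see below)
from bot_runner import run_bot_workflow  # hypothetical: runs all bot steps in sequence

DEFINITIONS_DIR = Path(__file__).parent / "definitions"


def _load_module(path: Path):
    # Dynamically import a test case module from its file path.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


@pytest.mark.parametrize("scenario", build_scenarios(DEFINITIONS_DIR))
def test_scenario(scenario):
    modules = {
        feedstock: _load_module(path) for feedstock, path in scenario.items()
    }

    for feedstock, module in modules.items():
        module.prepare(IntegrationHelper(feedstock))

    run_bot_workflow()

    for feedstock, module in modules.items():
        module.check_after(IntegrationHelper(feedstock))
```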

Integration Test Helper Library

The integration test helper library provides helper functions for the prepare() and check_after() functions. For example, we might provide a function setup_from_resources that copies a pre-defined feedstock from a resources folder (see "Integration Test Definition" above) into the test feedstock repository.

For check_after, we could provide a helper function for checking that a GitHub PR with the correct title has been opened on the test feedstock repository, or another function for checking that the bot's fork has the expected contents.
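
For illustration, the helper library's surface for a single feedstock could look roughly like this; setup_from_resources is the only name taken from the text above, everything else is invented:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class IntegrationHelper:
    """Hypothetical facade over the staging accounts, the bot's fork, and cf-graph data."""

    feedstock: str  # e.g. "pydantic"

    def setup_from_resources(self, resources_dir: Path) -> None:
        """Copy a pre-defined feedstock layout from `resources_dir` into the
        staging feedstock repository and force-push it."""
        raise NotImplementedError

    def set_cf_graph_node(self, node_attrs_file: Path) -> None:
        """Write this feedstock's portion of the cf-graph test data."""
        raise NotImplementedError

    def mock_http_response(self, url: str, body_file: Path) -> None:
        """Register an HTTP mock (e.g. a MockServer expectation) for `url`."""
        raise NotImplementedError

    def assert_pr_opened(self, title_contains: str) -> None:
        """Assert that a PR whose title contains `title_contains` was opened
        against the staging feedstock repository."""
        raise NotImplementedError

    def assert_fork_file_contains(self, path: str, expected: str) -> None:
        """Assert that `path` in the bot account's fork contains `expected`."""
        raise NotImplementedError
```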

The integration test library must offer an option to run conda-smithy rerenders. The results of these operations can be cached using GitHub Actions caches, with cache keys derived from conda-forge.yml, the recipe/ contents, and the conda-smithy version.

Practice will show which exact helper functions are necessary.

Let me know what you think!

@beckermr

Cc @0xbe7a

beckermr commented 2 months ago

I need to read more, but the idea of a custom integration test library sounds painful. It is entirely possible to do this within pytest and we should do so.

ytausch commented 2 months ago

We might have misunderstood each other here. The goal of the "integration test helper library" is not to replace pytest or parts of it. It is simply a collection of practical helper functions for setting up the test environment, including configuring the external git repos. It also offers custom functions for assertions we need in our use case, e.g., validating the contents of a pull request. Since this is very domain-specific, it cannot be covered by an already-available public library.