opentensor / subtensor

Bittensor Blockchain Layer
The Unlicense

Integration Testing #331

Closed distributedstatemachine closed 5 months ago

distributedstatemachine commented 6 months ago

Description

To ensure continuous reliability between our Subtensor and the Bittensor package, we need to implement a comprehensive GitHub Actions workflow. This workflow will automate the entire testing process, from building the blockchain node using the localnet.sh script, to installing the Bittensor package from a configurable branch, and finally running the test_subtensor_integration.py integration test.

The primary objective of this setup is to verify that any changes introduced to the subtensor codebase do not break or introduce regressions in the Bittensor Python code. By parameterizing the Bittensor repository branch, we can test against various development stages and release candidates, ensuring compatibility and robustness across different versions.

Acceptance Criteria

Tasks

Additional Considerations

Related Links

distributedstatemachine commented 5 months ago

TODO:

sam0x17 commented 5 months ago

@distributedstatemachine I think you mentioned you had started some of this somewhere? Any code that can be re-used or naw?

distributedstatemachine commented 5 months ago

@sam0x17 @orriin https://github.com/opentensor/subtensor/pull/332. Although, that description is probably outdated, since we won't be using https://github.com/opentensor/bittensor/blob/stao/tests/integration_tests/test_subtensor_integration.py and would have to write an integration test that doesn't use mocks, i.e. we will have to spin up the local node in the CI.

orriin commented 5 months ago

Another case to test https://discord.com/channels/799672011265015819/1176889736636407808/1236057424134144152

sam0x17 commented 5 months ago

yeah, worth mentioning: as long as you specify you are using SubtensorCI like the other GitHub workflows do, it will be a very beefy node with 128 GB of RAM

orriin commented 5 months ago

I've been exploring the bittensor codebase, and have some questions about how to proceed with this issue.

1. What to test?

After speaking with @distributedstatemachine, it has become apparent that test_subtensor_integration.py is designed only for working with a mocked version of substrate, making it unsuitable for the integration tests described in this issue.

Therefore, an entirely new test suite will need to be written for these e2e tests.

For basic tests, I could work through each file one-by-one in bittensor/commands and write e2e tests for all the combinations of logic in each subcommand. Is this something we want to do? Are any commands higher priority than others? Writing tests for every subcommand will take quite a long time; maybe we only want to write them for some?

For multi-step tests, there was already a scenario described here: https://discord.com/channels/799672011265015819/1176889736636407808/1236057424134144152 . Are there any other complex e2e cases we should test?

2. How do I call into the CLI?

I suggest directly calling `run` on the commands exported from bittensor/commands. That way, it's easier to mock the command args compared to directly calling the CLI binary.
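A rough sketch of that approach (all class and argument names below are stand-ins for illustration, not bittensor's actual API): the test builds the same argparse namespace the CLI would produce and calls the command's `run` directly, never touching the binary.

```python
import argparse

# Stand-in for a class in bittensor/commands; the real commands expose a
# `run` entry point driven by parsed CLI arguments, but their names and
# argument sets differ from this toy example.
class FakeListSubnetsCommand:
    @staticmethod
    def run(args: argparse.Namespace) -> list:
        # A real command would query the chain here; we just echo the netuid.
        return [args.netuid]

def invoke(argv: list) -> list:
    # Build the same parser the CLI would use, then call run() directly,
    # skipping the `btcli` binary entirely.
    parser = argparse.ArgumentParser(prog="btcli")
    parser.add_argument("--netuid", type=int, default=1)
    return FakeListSubnetsCommand.run(parser.parse_args(argv))
```

Driving `run` this way keeps argument mocking in plain Python, which is the advantage over shelling out to the binary.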

3. How to clear state between each e2e test?

Since from inside the bittensor script we have no way to restart the chain (necessary between e2e tests to prevent polluting state and potential weird race conditions), we will need a test harness which runs a new ./localnet.sh instance for every test.

I'm thinking of creating an orchestrator file, which will spin up a localnet.sh instance, run a test, close the instance, and repeat for each e2e test.

There may also be some voodoo possible with pytest's setup/teardown hooks (the equivalent of beforeEach and afterEach in other frameworks), but I'm not sure if it would be worth the extra effort to get those working.
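A minimal sketch of such a harness (the script path and the fixed readiness wait are assumptions) could wrap the node lifecycle in a context manager so teardown always runs, even when a test fails:

```python
import subprocess
import time
from contextlib import contextmanager

@contextmanager
def managed_node(cmd=("bash", "scripts/localnet.sh"), ready_wait=5.0):
    """Start a fresh node process, yield it, and always tear it down."""
    proc = subprocess.Popen(
        list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        time.sleep(ready_wait)  # naive; a real harness would poll the RPC port
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=15)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
```

The orchestrator then reduces to a loop that enters `with managed_node():` once per e2e test.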

4. How to structure the test files?

I'm thinking of creating a new dir tests/e2e_tests for these as they are more e2e than integration, and there's already a dir tests/integration_tests which is used for mocked substrate testing.

My proposed structure of the new testing dir is

```
.
└── tests/
    └── e2e_tests/
        ├── subcommands/
        │   ├── subnets/
        │   │   ├── list.py
        │   │   └── ... # other subnets commands here
        │   └── ... # other subcommands here
        ├── multistep/
        │   ├── tx_rate_limit_exceeded.py
        │   └── ... # other multi-step e2e tests here
        ├── common.py # common utils
        └── run.py # test orchestrator which will spin up a new `localnet.sh`, run an e2e test, repeat for each e2e test defined
```

distributedstatemachine commented 5 months ago
> 1. How do I call into the CLI? I suggest directly calling `run` on the commands exported from bittensor/commands. That way, it's easier to mock the command args compared to directly calling the CLI binary.

I think we can call the CLI the same way it's currently done:

https://github.com/opentensor/bittensor/blob/master/tests/integration_tests/test_cli.py#L445-L5

> 1. How to clear state between each e2e test? Since from inside the bittensor script we have no way to restart the chain (necessary between e2e tests to prevent polluting state and potential weird race conditions) we will need a test harness which runs a new ./localnet.sh instance for every test. I'm thinking to create an orchestrator file, which will spin up a localnet.sh instance, run a test, close the instance, repeat for each e2e test. There may also be some voodoo possible with beforeEach and afterEach pytest hooks, but I'm not sure if it would be worth the extra effort to get those working.

I like this. I think we can call `purge-chain` on the binary. Alternatively, we can write one long test that covers all the happy paths and not have to purge the chain at all.
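For the purge route, Substrate-based node binaries ship a `purge-chain` subcommand that wipes the on-disk chain database. A sketch of invoking it between tests (the flag names are Substrate defaults and may need adjusting for the subtensor binary):

```python
import subprocess

def purge_chain(node_binary: str, base_path: str) -> None:
    # `purge-chain` deletes the chain database under base_path; `-y` skips
    # the confirmation prompt. check=True raises if the binary exits nonzero.
    subprocess.run(
        [node_binary, "purge-chain", "-y",
         "--base-path", base_path, "--chain", "local"],
        check=True,
    )
```

This is cheaper than a full restart of the process tree, but it still requires stopping the node before purging and starting it again afterwards.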

orriin commented 5 months ago

@sam0x17 DM'd me about 1., saying that it's more important for this PR to get a nice process/structure/examples in place for writing e2e tests than to get full test coverage.

So I'll start with two examples:

1. a very basic one (single CLI call), and
2. a more complex one involving multiple CLI calls (described here https://discord.com/channels/799672011265015819/1176889736636407808/1236057424134144152 )

sam0x17 commented 5 months ago

also some key requirements I would like to meet with this if possible:

orriin commented 5 months ago

I have a PoC using pytest fixtures (https://docs.pytest.org/en/6.2.x/fixture.html) to spin up and spin down localnet nodes between tests.

The initial adaptation will need to run serially, but with some additional logic to find free ports it should be possible to upgrade it in the future to run in parallel.

sam0x17 commented 5 months ago

> I have a PoC using pytest fixtures (https://docs.pytest.org/en/6.2.x/fixture.html) to spin up and spin down localnet nodes between tests.
>
> The initial adaptation will need to run in serial, but with some additional logic to find free ports it should be possible to upgrade in the future with the ability to run in parallel.

awesome, this is a great first stab at this 💯

can you link the PR to this with a `fixes #331`?