the-chaingang / ethereal

Dockerized Ethereum testnets
MIT License

ethereal testing interface #20

Open zomglings opened 6 years ago

zomglings commented 6 years ago

Unit tests and user tests

This is an attempt to specify the testing framework in ethereal. There is one guiding principle: The mechanism used to unit test ethereal commits will be the same as the one users use to test their code on ethereal networks.

Beyond simple health checks and the like, ethereal unit tests will require execution of code on the nodes comprising an ethereal network. This applies equally well to the use cases of testing network topology definitions, testing dapps and smart contracts, and testing alternative node implementations, which we think will attract users to ethereal in the first place. It doesn't make sense to maintain the unit testing framework independently of the testing interface provided to ethereal users.

Additionally, the usability of the user testing interface will be a primary concern with ethereal, and a lot of work will go in that direction. Given that we want ethereal to be as incontrovertible a source of validation for user code as possible, our unit testing framework calls for an equivalent intensity of focus. Coupling the two knocks down two mangoes with one stone.

Henceforth, a "test" could refer either to an ethereal unit test or to a user-written test.

Framework specification

Each test is defined by:

  1. A network context in which the test is intended to run -- this context will be represented by a YAML file defining the network topology (which can be run using docker-compose)

  2. Test code, provided as an image such that any container run from that image exits with code 0 if the test(s) succeeded and with code 1 if they failed.
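As a sketch of point 2, a test image's entrypoint only needs to translate the test result into the container's exit code. Everything here is illustrative: run_tests stands in for the user's real test suite, and the final exit is shown as a comment so the sketch can be run standalone.

```shell
#!/bin/sh
# Sketch of a test image entrypoint. The only contract with the framework
# is the container's exit code: 0 means the tests passed, 1 means they failed.
run_tests() {
    # Stand-in assertion so the sketch is self-contained; a real image would
    # invoke its actual test suite here.
    test "$(echo hello)" = "hello"
}

if run_tests; then
    status=0
    echo "tests passed"
else
    status=1
    echo "tests failed"
fi
# A real entrypoint would finish with: exit "$status"
echo "exit code would be $status"
```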

When a test is run, the framework must:

  1. Spin up a test network with the topology specified in the test's network context

  2. Build the image with the test code (if a Dockerfile was provided).

  3. (Once the network is ready) Run a container built from the test code image.

  4. (Once the test code container has exited) Tear down the test network.

As docker-compose doesn't (and shouldn't) solve the problem introduced by the "once x has happened" conditions, we will have to provide the appropriate signalling semantics for test containers. This involves:

  1. Waiting for signals from the nodes in the network before running the test container

  2. Bringing down the network and cleaning up after the test container has exited

Finally, there is an important usability requirement: it should be possible for any user to run this code from any environment that satisfies the requirements of ethereal itself. So far, those requirements are minimal: access to docker and docker-compose.

Implementation considerations

The biggest decision to make here is whether or not the entire testing framework itself should run within a container. Since the framework is intended to heavily interact with docker and docker-compose, this would require having access to these tools from within a docker container. This seems to be a particularly bad idea: http://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/

Docker Engine does expose a RESTful API and there are client libraries available for both Go and Python: https://docs.docker.com/develop/sdk/

(Note: docker-compose is currently implemented using this API.)

However, there is no equivalent for docker-compose. If we chose that route, implementing the teardown behaviour against the Docker Engine API would therefore mean duplicating docker-compose down in our own code.

At this early stage of development, it is preferable to maintain some simple scripts, targeting our most common environments, which implement the desired behaviour directly using the docker and docker-compose command line tools. To start out, we would maintain only a bash script but we could expand our repertoire if we see adoption in environments in which bash is not available or convenient.

Proposal 1: The test runner will start out as a single bash script.

As the docker and docker-compose command line tools will be available to our test runner, we should, for the sake of usability, consolidate as much of the test specification as possible into a docker-compose YAML manifest. This maintains consistency with how a user would bring up an ethereal network to experiment with manually.

Proposal 2: Tests will be specified by docker-compose manifests, just as network topologies are. The test container will be specified as the test entry under the services section of the manifest, alongside the node specifications.

An example test manifest:

version: '2.1'
services:
    bootnode:
        <bootnode service definition>
    miningnode-1:
        <miningnode-1 definition>
    miningnode-2:
        <miningnode-2 definition>
    fullnode-1:
        <fullnode-1 definition>
    test:
        build:
            context: ${CONTEXT_DIR:-./}
            dockerfile: ${TEST_DOCKERFILE:-Dockerfile}
        depends_on:
            - bootnode
            - miningnode-1
            - miningnode-2
            - fullnode-1
...

As ethereal is currently targeting network simulations on a single machine as opposed to a multi-node cluster, and as a test network could be pretty sizable, the responsibility of parallelizing test runs should be offloaded as much as possible to the framework the user specifies in their testing container. For now, we impose the restriction that users can run tests on only one network at a time unless they invoke a native parallelization primitive (e.g. xargs -P in bash) on the test runner itself.

Proposal 3: For now, we will implement no test parallelization as part of our framework. This behaviour can be replicated, if necessary, by running many test runner processes concurrently.
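The xargs -P workaround mentioned above might look like the following. run_test.sh is a hypothetical runner script (it does not exist yet); the demonstration substitutes a stand-in command so the fan-out pattern itself can be exercised without Docker.

```shell
# In practice (run_test.sh being the hypothetical runner script):
#   ls tests/*.yml | xargs -P 4 -n 1 ./run_test.sh
# Demonstration of the fan-out with a stand-in command:
printf '%s\n' net-a.yml net-b.yml net-c.yml \
    | xargs -P 2 -n 1 echo running > runs.log
sort runs.log
```

Each concurrent run would also need a distinct project name (see the COMPOSE_PROJECT_NAME note below in this thread) so that docker-compose resources from parallel runs do not collide.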

All this amounts to:

Proposed implementation

The test runner will be a bash script which:

  1. Accepts a test manifest as specified above

  2. Runs docker-compose up with the appropriate environment variables

  3. Polls a bind mounted testing directory for a signal that the test has completed

  4. Runs docker-compose down with the appropriate arguments
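The four steps above might look roughly like this. This is a sketch, not the actual script: the manifest name and signal paths are assumptions, and COMPOSE is stubbed with echo so the control flow can be exercised without a Docker daemon (a real runner would set COMPOSE=docker-compose and the test container, not the runner, would write the completion signal).

```shell
#!/bin/bash
set -eu

# Stub so the sketch runs without Docker; a real runner would use
# COMPOSE=docker-compose.
COMPOSE=${COMPOSE:-echo docker-compose}
MANIFEST=${1:-test-manifest.yml}        # 1. the test manifest
SIGNAL_DIR=${SIGNAL_DIR:-./signals}     # the bind-mounted testing directory

mkdir -p "$SIGNAL_DIR"

# 2. Bring up the network and the test container.
$COMPOSE -f "$MANIFEST" up -d

# Stub: in a real run the test container writes this signal file.
touch "$SIGNAL_DIR/test-complete"

# 3. Poll the bind-mounted directory until the test signals completion.
until [ -e "$SIGNAL_DIR/test-complete" ]; do
    sleep 1
done

# 4. Tear down the network and its resources.
$COMPOSE -f "$MANIFEST" down --volumes
echo "runner finished"
```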

The test container will be responsible for:

  1. Polling the network's shared volume for signals from each of the nodes that they are ready for the tests to run

  2. Signalling in the testing volume that the test has completed
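The container side of this signalling protocol could be sketched as follows. All file names, the shared-volume path, and the node list are assumptions taken from the example manifest; node readiness is stubbed so the loops terminate when run standalone.

```shell
#!/bin/sh
SHARED=${SHARED:-./shared}   # the network's shared volume
NODES="bootnode miningnode-1 miningnode-2 fullnode-1"

mkdir -p "$SHARED"
# Stub: in a real network, each node writes its own ready file.
for node in $NODES; do touch "$SHARED/$node.ready"; done

# 1. Wait for every node to signal readiness.
for node in $NODES; do
    until [ -e "$SHARED/$node.ready" ]; do
        sleep 1
    done
done

# ... the actual tests would run here ...

# 2. Signal completion for the test runner to observe.
touch "$SHARED/test-complete"
echo "test container done"
```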

Finally, we may have to implement the appropriate signalling behaviour into our node image.

zomglings commented 6 years ago

Proposal for https://github.com/the-chaingang/ethereal/issues/6

zomglings commented 6 years ago

Note

docker-compose up can be run with the --abort-on-container-exit flag, which stops all containers if any container stops.

This suits our purposes particularly well, and may spare us from having to write our own scripts at all.

In fact, the --exit-code-from argument implies --abort-on-container-exit and would also make the docker-compose process exit with the same exit code as the test container.
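With the manifest layout from Proposal 2 (the service name test and the manifest file name are assumptions from the example above), the run-and-wait logic would then collapse into a single invocation. The sketch builds the command rather than running it, since it needs a Docker daemon:

```shell
# --exit-code-from implies --abort-on-container-exit, so the compose
# process tears the network down when the test service stops and exits
# with that service's exit code.
cmd="docker-compose -f test-manifest.yml up --exit-code-from test"
echo "$cmd"
```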

zomglings commented 6 years ago

Note

docker-compose by default uses the current directory name as a prefix on its containers, volumes, etc. It does allow customization of this prefix through:

  1. The -p (--project-name) command-line argument to docker-compose

  2. COMPOSE_PROJECT_NAME environment variable

(Unclear which one has priority - will test later)

The COMPOSE_PROJECT_NAME variable can be defined in the .env file in the same directory as the manifest, and docker-compose will use it automatically (see the docker-compose environment variable documentation). Will this work on Windows?
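For illustration (the project name here is an arbitrary example), either form would isolate a run's containers and volumes under its own prefix:

```shell
# Option 1: environment variable at invocation time (shown as a comment,
# since it needs a Docker daemon):
#   COMPOSE_PROJECT_NAME=ethereal-test-1 docker-compose up -d
# Option 2: an .env file next to the manifest, picked up automatically:
printf 'COMPOSE_PROJECT_NAME=ethereal-test-1\n' > .env
cat .env
```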

zomglings commented 6 years ago

Accepted: https://github.com/the-chaingang/ethereal/pull/23