informalsystems / modelator

Model-based testing tool
https://mbt.informal.systems
Apache License 2.0

MBT-core: architecture #2

Closed andrey-kuprianov closed 1 year ago

andrey-kuprianov commented 3 years ago

MBT-core implements the core, target-language-independent functionality for model-based testing.

Figure out the architecture for the MBT core:

andrey-kuprianov commented 3 years ago

User interaction

There should be a binary (modelator) that the user can invoke in order to create / check / process / run the tests. This is the main target-language-independent point of entry into the toolchain. The tool should provide access to these functions:

For many purposes, modelator can be considered an Apalache wrapper, offering users MBT-specific ways to interact with the tool.

andrey-kuprianov commented 3 years ago

Use case 1: Generate example execution

Focuses on the simplest complete user interaction, involving test configuration, instantiation, checking, and running Apalache.

Scenario: The user has a TLA+ model and a test assertion for which they want to produce an example execution. They provide the tool with a config file containing the test assertion, and possibly instantiation parameters, and ask it to execute the test and produce an example (abstract) execution.
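
As a rough illustration only (the exact config format is not specified here), such a config file might look like the following; all field names and values are hypothetical:

{
  // hypothetical config sketch; field names are not an actual modelator format
  "model": "LightClient.tla",      // TLA+ model to check
  "test": "TestFailure",           // test assertion to produce an example execution for
  "constants": {                   // optional instantiation parameters
    "MAX_HEIGHT": 5,
    "VALIDATORS": 3
  }
}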

Happy path:

Possible failures (to become unit tests):

andrey-kuprianov commented 3 years ago

Use case 2: conformance test suite generation

Focuses on the Cosmos ecosystem user; besides UC1, it involves the weights system and generating/running multiple tests.

A conformance test suite includes a (large) set of tests (~hundreds), which are generated from:

Scenario: The Cosmos developer wants to test their Tendermint/IBC/Cosmos component, for which there exists a TLA+ specification. They run the MBT-core toolchain with the given TLA+ spec and test configuration, and ask it to store the generated test executions in the specified location.

Happy path: as for UC1, and additionally

Possible failures (to become unit tests): as for UC1, and additionally

andrey-kuprianov commented 3 years ago

Test vectors, modules, artifacts, and metadata

This is an attempt to describe test processing in MBT as I see it now.

Design goals

To satisfy these goals, I propose to employ the following notions:

Test artifacts

Let's consider some examples of test artifacts:

Test vectors

A test vector is a compact description of how a particular test artifact was produced, i.e. of the input data and the sequence of modules that processed it. It is not the artifact itself; it is only enough information to reproduce it precisely. Test vectors are extensible: if the type of some test artifact is compatible with the input type of some test module, this artifact can be submitted to the module, and the test vector of any of the resulting artifacts will be an extension of the current test vector.
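
For illustration only (the concrete encoding is not fixed here), a test vector could be recorded roughly as follows, listing the original inputs and the chain of module methods that produced the artifact; module and method names are taken from the examples further below:

{
  // hypothetical test-vector sketch; the encoding is an assumption
  "inputs": {
    "model": "LightClient.tla",
    "tla-config": "MC_LightClient.cfg"
  },
  "modules": [
    { "module": "apalache", "method": "tla-to-json", "version": "0.1.0" },
    { "module": "apalache", "method": "test-json", "version": "0.1.0" }
  ]
}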

Test metadata

Test metadata is a key-value map associated with a test artifact. In particular, each test artifact should include its type in the metadata. Any additional information is up to the module that produces the artifact; it should be compact enough to be stored if needed, but useful either for automatic artifact processing or for user inspection.
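
As a sketch, metadata for an artifact produced by the Apalache test-json method might look like the following; only the type key is implied by the description above, the remaining keys are made up for illustration:

{
  // hypothetical metadata sketch; only "type" is required by the description above
  "type": "json-tla-counterexample",
  "produced-by": "apalache/test-json",
  "modelator-version": "0.1.0",
  "description": "example execution for the test assertion"
}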

Test modules

These are the processing units that accept artifacts as input, and produce artifacts as output. Here are some examples:

andrey-kuprianov commented 3 years ago

Test modules

A test module is a binary component that satisfies a certain input-output relation, allowing it to be integrated into the general MBT framework. Test modules are meant as primitive processing steps, transforming test artifacts into other artifacts.

Each module can provide a number of methods. The manifest method should always be present. When called without arguments, it should provide the manifest for the module; when called with the name of a method, it should provide the manifest for that method.
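
A minimal sketch of the two kinds of calls, using the JSON-RPC framing described in the Input and output section below; the shape of params is an assumption:

// manifest of the whole module (sketch)
{ "jsonrpc": "2.0", "id": 1, "method": "manifest" }

// manifest of a single method (sketch; the params shape is an assumption)
{ "jsonrpc": "2.0", "id": 2, "method": "manifest", "params": { "method": "test-json" } }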

Module manifest

Each test module has an associated manifest that includes:

E.g. for the module that runs Apalache, the manifest could look like this:

{
  "name": "apalache",
  "description": "This module provides functions for interacting with the Apalache model checker",
  "version": "0.1.0",
  "components": {
    "modelator": "0.1.0",
    "apalache": "0.10.0",
    "java": "openjdk-11.0.10"
  },
  "methods": [
    {
      "name": "check-tla",
      ....
    },
    // same for "check-json", "test-tla", "test-json", "tla-to-json", "json-to-tla"
  ]
}

Method manifest

Each module method has an associated manifest that includes:

Each element of inputs, results, or errors should come with its type. The types are just strings, with an associated meaning that is unified across the MBT toolchain.

E.g. for the Apalache test-json method, the manifest could look like this:

{
  "name": "test-json",
  "description": "This method runs Apalache on the given model and configuration parameters in JSON format, and produces an example execution", 
  "version": "0.1.0",
  "inputs": {
      "model": "json-tla",
      "tla-config": "json-tla-config",
      "apalache-config": "json-apalache-config"
  },
  "results": {
      "execution": "json-tla-counterexample",
      "output": "string"
  },
  "errors": {
      "output": "string",
      "log": "string"
  }
}

Input and output

For the input-output behaviour of test modules, we adopt the JSON-RPC protocol to guarantee future extensibility and the possibility of application in wider contexts.

Each module needs to accept JSON-RPC requests via STDIN. Each module should accept at least the manifest method.

As a result of an invocation, the module should produce a JSON-RPC response on STDOUT, with either the result or the error set, containing the corresponding components outlined in the manifest. STDERR output can carry arbitrary information, e.g. for debugging; it can be captured by the framework, but formally speaking it is ignored.
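
To make this concrete, here is a sketch of one test-json invocation and its successful response; the exact shapes of params and result are assumptions, modelled on the method manifest above:

// request on STDIN (sketch; params mirror the method manifest's inputs)
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "test-json",
  "params": {
    "model": {},              // json-tla model goes here
    "tla-config": {},         // json-tla-config
    "apalache-config": {}     // json-apalache-config
  }
}

// response on STDOUT (sketch; result mirrors the method manifest's results)
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "execution": {},          // json-tla-counterexample
    "output": "..."
  }
}

On failure, a JSON-RPC error would be returned instead, carrying the output and log components listed under errors in the manifest.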

andrey-kuprianov commented 3 years ago

User interaction

We want the test system's interactions with the user to be fun. Therefore we will organise the system in a way that allows the user to explore and experiment. Imagine the testing system as a quest game, or a universe, which the user explores in order to find the bugs hiding in their code. To implement this view we will have the following:

How this is going to be implemented:

andrey-kuprianov commented 3 years ago

Communication of core with plugins

In the current instantiations of MBT, one of the main limitations is the need to store and maintain a large number of static test files in the repo, as in the LightClient MBT tests. We want to eliminate that and generate the files on the fly.

For that, MBT-core will take care of configuring the toolchain and generating the tests. The language plugins, such as MBT-Rust, on the other hand, will only execute the tests in the target language. The interaction between the core and the plugins should be done via a simple communication protocol, executed e.g. via a socket connection:
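
Purely as an illustration (the message and field names below are assumptions, not a fixed protocol), one round of such an exchange might look like:

// plugin -> core: request the next test to execute (hypothetical message names)
{ "msg": "next-test" }

// core -> plugin: a generated abstract execution to run against the implementation
{ "msg": "test", "name": "TestFailure", "execution": {} }

// plugin -> core: report the outcome back to the core
{ "msg": "result", "name": "TestFailure", "status": "passed" }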

Such a structure simplifies MBT in several aspects:

andrey-kuprianov commented 3 years ago

Module requirements

Modules are the minimal processing steps that can be integrated into MBT. Here we list some requirements that MBT modules should satisfy (inspired by resistance to #14):

Single-purpose module methods

A module should have methods that are single-purposed. E.g. in #14 there is a RunMode option which offers a choice of whether to run a test or to run an exploration. I believe that there should instead be two different methods of the tlc module, namely test and explore. This will allow referring to them individually, and integrating them unambiguously into chains of modules.
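
Following the module manifest format sketched above, the tlc module could then advertise the two methods separately (a sketch, not an agreed manifest):

{
  // hypothetical manifest sketch for a tlc module with single-purpose methods
  "name": "tlc",
  "description": "This module provides functions for interacting with the TLC model checker",
  "version": "0.1.0",
  "methods": [
    { "name": "test", .... },
    { "name": "explore", .... }
  ]
}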

Method determinism

See the Reproducibility requirement above. I am not sure whether TLC with multiple workers is deterministic.

What we want to achieve is that module methods are like mathematical functions -- given the same version and the same inputs, they will produce the same results. This will allow us not to store tests, but to generate them on the fly, storing only the TLA+ models and the minimal config describing which chain of modules to run to produce the tests.

Module calling conventions

#14 replaces modelator-cli with modelator, which is in essence only an interface to run a model checker. I believe that instead TLC should be a module, integrated into modelator-core, and callable via a generic module interface, dispatched from modelator-cli. It is still vague, but what we need to devise is approximately this: