zimmski / tavor

A generic fuzzing and delta-debugging framework
MIT License
245 stars 10 forks source link

Tavor GoDoc Build Status Coverage Status

Tavor (Sindarin for woodpecker) is a framework for easily implementing and using fuzzing and delta-debugging. Its EBNF-like notation allows you to define file formats, protocols, and other structured data without the need to write source code. Tavor relaxes on the definitions of fuzzing and delta-debugging to enable the usage of its algorithms universally for keyword-driven testing, model-based testing, simulating user-behavior and genetic programming. Tavor is also well-suited for researching new methods without reimplementing basic algorithms.

A quick example

We want to test a service which processes an XML structure. The structure can contain groups and items. A group contains other groups or items. An Item consists of an attribute name with an alphanumeric value. The item's value contains a number. This structure sounds simple but allows an enormous variety of possible outcomes. It is therefore hard to test since a tester has to think about every important possibility if the generation of the test data is done manually. Doing this manually is cumbersome and error-prone. Tavor can be used to automate the generation.

The described structure can be defined using the following Tavor format:

START = Entity |

Entity = Group | Item

Group = "<group>" +(Entity) "</group>"

Item = "<item name=\"" +([\w]) "\">" +([0-9]) "</item>"

This file called basic.tavor can be downloaded from here.

Tavor can then be used to fuzz the format by issuing the following command:

tavor --format-file basic.tavor fuzz

This command prints on every call a random generation of the given structure. Here are some example generations:

<item name="k">2</item>
<group><item name="kO">37</item></group>
<group><item name="V">37</item><group><item name="S">88</item></group></group>

Generating data like this is just one example of the capabilities of Tavor. Please have a look at the complete example with a broader overview over the basic features or keep reading to find out more about the background and capabilities of Tavor.

Additionally you can find official Tavor format files and fuzzer applications at https://github.com/zimmski/fuzzer. Announcements1 of the project's progress also include explanations, examples and roadmap information.

Table of content

What is fuzzing?

Fuzz testing or fuzzing is a software testing technique, often automated or semi-automated, that involves providing invalid, unexpected, or random data to the inputs of a computer program. The program is then monitored for exceptions such as crashes, or failing built-in code assertions or for finding potential memory leaks. Fuzzing is commonly used to test for security problems in software or computer systems.
-- https://en.wikipedia.org/wiki/Fuzz_testing

Although this is the common definition of fuzzing, it is nowadays often just one view on capabilities of fuzzing tools. In general, fuzzing is just the generation of data and it does not matter if it is invalid or valid and what the type of data (e.g. files, protocol data) is. The use case of the data itself is also often broadly defined as it can be used to test algorithms, programs or hardware but it can be practically used everywhere where data is needed.

Fuzzing algorithms can be categorized into the following two main categories:

Mutation-based fuzzing

Mutation-based fuzzing takes existing data and simply changes it. This often leads to invalid data, as most techniques are not obeying rules nor constraints of the underlying data.

Some common techniques for mutation-based fuzzing are:

Generation-based fuzzing

Generation-based algorithms have one big advantage over mutation-based algorithms in that they have to understand and obey the underlying rules and constraints of the data itself. This property can be used to generate valid as well as invalid data. Another property is that generation-based algorithms can generate data from scratch which eliminates the need for gathering data and keeping it up to date.

There are no common techniques for generation-based fuzzing but most algorithms choose a graph as underlying representation of the data model. The graph is then traversed and each node outputs a part of the data. Traversal algorithms and the complexity and abilities of the data model such as constraints between nodes or adding nodes during the traversal distinguish generation-based fuzzers and contributes in general to their mightiness.

What is delta-debugging?

The Delta Debugging algorithm isolates failure causes automatically - by systematically narrowing down failure-inducing circumstances until a minimal set remains.
-- https://en.wikipedia.org/wiki/Delta_Debugging

Data which fails the execution of a program can be reduced by using delta-debugging. The reduction of the data is handled by software heuristics (semi-)automatically. The obvious advantage of this method, besides being done (semi-)automatically, is that it is not necessary to handle unrelated parts of the data while debugging the problem. This makes it possible to focus on the important parts which actually lead to the failure.

Note: Literature and the computer science community do not agree on one common name for delta-debugging.

Besides delta-debugging the following names are fairly common for the same technique:

  • Since data is reduced it is also called reducing.
  • Since data is shrinking it is also called shrinking.

Delta-debugging consists of three areas:

Although delta-debugging is described as method to isolate failure causes, it can be also used to isolate anything given isolating constraints. For instance a solution may be reduced to its optimum e.g. smallest formula.

What does Tavor provide and how does it work?

Tavor combines both fuzzing and delta-debugging by allowing all implemented methods to operate on one internal model-based structure represented by a graph. This structure can be defined and generated by programming or by using a format file. Out of the box Tavor comes with its own format which covers all functionality of the framework.

Tavor's generic fuzzing implementation is not fixed to one technique. Instead different fuzzing techniques and heuristics can be implemented and executed independently as Tavor fuzzing strategies. The same principle is used for delta-debugging where so called Tavor reduce strategies can be implemented and used. Both types of strategies operate on the same internal structure independent of the format. This structure is basically a graph of nodes which are called tokens throughout the Tavor framework. The structure itself is not fixed to a static definition but can be changed by so called fuzzing filters to perform additional tasks like boundary-value analysis of ranges.

Even though Tavor provides a lot of functionality out of the box, it is not considered as complete. A list of missing but planed features can be found in the missing features section. For feature requests please have a look at the feature request section.

What are tokens?

Tavor's tokens differ from lexical analysis tokens in the following way. They represent not just a group of characters but different kinds of data with additional properties and abilities. Tokens can be constant integers and strings of all kind but also dynamic data like integer ranges, sequences and character classes. Furthermore, tokens can encapsulate other tokens to not only group them together but to create building blocks that can be reused to, for example, repeat a group of tokens. Tokens can have states, conditions and logic. They can create new tokens dynamically and can depend on other tokens to generate data. Tavor's tokens are basically the foundation of the framework and every algorithm for fuzzing, parsing and delta-debugging is using them.

If you want to know more about Tavor's tokens you can read through Tavor's format definition or you can read about them in depth in the developing and extending sections.

What are fuzzing strategies?

Each fuzzing strategy represents one fuzzing technique. This can be a heuristic for walking through the internal structure, how tokens of the structure are fuzzed or even both. Tavor currently distinguishes between two techniques of token fuzzing. One is to deterministically choose one possible permutation of the token the other is choosing randomly out of all permutations of a token.

An example for a fuzzing strategy is the random fuzzing strategy which is Tavor's default. This fuzzing strategy traverses through the whole internal structure and randomly permutates each token.

Please have a look at the documentation for an overview of all officially available fuzzing strategies of Tavor.

What are fuzzing filters?

Fuzzing filters mutate the internal structure and can be applied after the structure is ready for fuzzing thus after creating it e.g. after parsing and unrolling. This can be associated to mutation-based fuzzing where not the generating structure but the data itself is mutated.

An example use-case for fuzzing filters is the boundary-value analysis software testing technique. Imagine a function which should be tested having one integer parameter. The parameter's valid values range from 1 to 100. This would lead to 100 possible values which have to be tested just for this one integer and thus to at least 100 permutations of the internal structure. Boundary-value analysis reduces these permutations to e.g. 1, 50 and 100 so just three instead of 100 cases. This is exactly what the PositiveBoundaryValueAnalys is fuzzing filter does. This fuzzing filter traverses the whole internal structure and replaces every range token with at most five boundary values.

Please have a look at the documentation for an overview of all officially available fuzzing filters of Tavor.

What are reduce strategies?

Reduce strategies are strongly comparable to fuzzing strategies. Each reduce strategy represents one reduce/delta-debugging technique. This can be a heuristic for walking through the internal structure, how tokens of the structure are reduced or even both. The reduction method is depending on the token type. For example a constant integer cannot be reduced any further but a repetition of optional strings can be minimized or even left out.

Please have a look at the documentation for an overview of all officially available reduce strategies of Tavor.

Why are loops unrolled?

Although the internal structure allows loops in its graph, Tavor currently unrolls loops for easier algorithm implementations and usage. A future version will supplement this by allowing loops.

This graph for example loops between the states Idle and Action:

Looping

Unrolling this example results in the following graph given a maximum of two repetitions:

Unrolled

The Tavor format

The Tavor format documentation has its own page which can be found here.

A complete example for fuzzing, executing and delta-debugging

The complete example has its own page which can be found here.

Where are the precompiled binaries?

You can find all precompiled binaries on the release page. Currently only 32 and 64 bit Linux binaries are provided. Other architectures are currently not supported, but might work. Please have a look at the feature request section if you need them to work or if you want more binaries for different architectures.

How do I build Tavor?

If you do not want to use the precompiled binaries but instead want to compile Tavor from scratch, just follow the these steps:

Note: All steps must execute without any errors.

  1. Install and configure Go

    At least version 1.4 must be used. Your distribution will most definitely have some packages or you can be brave and just install it yourself. Have a look at the official documentation. Good luck!

  2. Go-get Tavor

    go get github.com/zimmski/tavor/
  3. Make Tavor

    cd $GOPATH/src/github.com/zimmski/tavor
    make

You now have a binary tavor in your $GOPATH/bin folder (or if set $GOBIN folder) which can be used without any further actions.

How do I use Tavor?

Tavor can be used in three different ways:

The Tavor binary

The Tavor binary provides fuzzing and delta-debugging functionality for Tavor format files as well as some other commands. Sane default arguments should provide a pleasant experience.

Since the binary acts on Tavor format files, the --format-file argument has to be used for every non-informational action. E.g. the following command fuzzes the given format file with the default fuzzing strategy:

tavor --format-file file.tavor fuzz

In contrast listing all available fuzzing strategies does not require the --format-file argument:

tavor fuzz --list-strategies

To learn more about available arguments and commands, you can invoke the help by executing the binary without any arguments or with the --help argument.

Here is a complete overview of all arguments, commands and their options:

Usage:
  tavor [options] <command> [command options]

General options:
  --debug             Debug log output
  --help              Show this help message
  --verbose           Verbose log output
  --version           Print the version of this program

Global options:
  --seed=             Seed for all the randomness
  --max-repeat=       How many times loops and repetitions should be repeated (2)

Format file options:
  --check             Just check the syntax of the format file and exit
  --format-file=      Input tavor format file
  --print             Prints the AST of the parsed format file
  --print-internal    Prints the internal AST of the parsed format file

Available commands:
  fuzz      Fuzz the given format file
  graph     Generate a DOT file out of the internal AST
  reduce    Reduce the given input file
  validate  Validate the given input file

[fuzz command options]
      --exec=                                    Execute this binary with possible arguments to test a generation
      --exec-exact-exit-code=                    Same exit code has to be present (-1)
      --exec-exact-stderr=                       Same stderr output has to be present
      --exec-exact-stdout=                       Same stdout output has to be present
      --exec-match-stderr=                       Searches through stderr via the given regex. A match has to be present
      --exec-match-stdout=                       Searches through stdout via the given regex. A match has to be present
      --exec-do-not-remove-tmp-files             If set, tmp files are not removed
      --exec-do-not-remove-tmp-files-on-error    If set, tmp files are not removed on error
      --exec-argument-type=                      How the generation is given to the binary (stdin)
      --list-exec-argument-types                 List all available exec argument types
      --script=                                  Execute this binary which gets fed with the generation and should return feedback
      --exit-on-error                            Exit if an execution fails
      --filter=                                  Fuzzing filter to apply
      --list-filters                             List all available fuzzing filters
      --strategy=                                The fuzzing strategy (random)
      --list-strategies                          List all available fuzzing strategies
      --result-folder=                           Save every fuzzing result with the MD5 checksum as filename in this folder
      --result-extension=                        If result-folder is used this will be the extension of every filename
      --result-separator=                        Separates result outputs of each fuzzing step ("\n")

[graph command options]
      --filter=         Fuzzing filter to apply
      --list-filters    List all available fuzzing filters

[reduce command options]
      --exec=                           Execute this binary with possible arguments to test a generation
      --exec-exact-exit-code            Same exit code has to be present
      --exec-exact-stderr               Same stderr output has to be present
      --exec-exact-stdout               Same stdout output has to be present
      --exec-match-stderr=              Searches through stderr via the given regex. A match has to be present
      --exec-match-stdout=              Searches through stdout via the given regex. A match has to be present
      --exec-do-not-remove-tmp-files    If set, tmp files are not removed
      --exec-argument-type=             How the generation is given to the binary (stdin)
      --list-exec-argument-types        List all available exec argument types
      --script=                         Execute this binary which gets fed with the generation and should return feedback
      --input-file=                     Input file which gets parsed, validated and delta-debugged via the format file
      --strategy=                       The reducing strategy (Linear)
      --list-strategies                 List all available reducing strategies
      --result-separator=               Separates result outputs of each reducing step ("\n")

[validate command options]
      --input-file=   Input file which gets parsed and validated via the format file

General options

The Tavor binary provides different kinds of general options. These are informative or may be applied to other commands. Besides the --format-file general format option the following are noteworthy:

Please have a look at the help for more options and descriptions:

tavor --help

Command: fuzz

The fuzz command generates data using the given format file and prints it directly to STDOUT.

tavor --format-file file.tavor fuzz

By default the random fuzzing strategy is used which can be altered using the --strategy fuzz command option.

tavor --format-file file.tavor fuzz --strategy AllPermutations

Fuzzing filters can be applied before the fuzzing generation by using the --filter fuzz command option. Filters are applied in the same order as they are defined, meaning from left to right.

The following command will apply the PositiveBoundaryValueAnalysis fuzzing filter and then the NegativeBoundaryValueAnalysis:

tavor --format-file file.tavor fuzz --filter PositiveBoundaryValueAnalysis --filter NegativeBoundaryValueAnalysis

Alternatively to printing to STDOUT an executable (or script) can be fed with the generated data. You can find examples for executables and scripts here.

There are two types of arguments to execute commands:

--result-* is an additional fuzz command option kind which can be used to influence the fuzzing generation itself. For example the --result-separator fuzz command option changes the separator of the generations if they are printed to STDOUT. The following command will use @@@@ instead of the default \n separator to feed the fuzzing generations to the running process:

tavor --format-file file.tavor fuzz --script validate --result-separator "@@@@"

Please have a look at the fuzz command help for more options and descriptions:

tavor --help fuzz

Command: graph

The graph command prints out a graph of the internal structure. This is needed since textual formats like the Tavor format can be often difficult to mentally visualize. Currently only the DOT format is supported therefore third-party tools like Graphviz have to be used to convert the DOT data to other formats like JPEG, PNG or SVG.

The following command prints the DOT graph of a format file to STDOUT:

tavor --format-file file.tavor graph

To save the graph to a file the output of the command can be simply redirected:

tavor --format-file file.tavor graph > file.dot

The output may be piped directly into the dot command of Graphviz:

tavor --format-file file.tavor graph | dot -Tsvg -o outfile.svg

Please have a look at the graph command help for more options and descriptions:

tavor --help graph

To define the graph notation, the following image will be explained:

Graph notation

Command: reduce

The reduce command applies delta-debugging to a given input according to the given format file. The reduction generates reduced generations of the original input which have to be tested either by the user or a program. Every generation has to correspond to the given format file which implies that the original input has to be valid too. This is validated using the same mechanisms as used by the validate command.

By default the reduction generation is printed to STDOUT and feedback is given through STDIN.

tavor --format-file file.tavor reduce --input-file file.input

By default the Linear reduce strategy is used which can be altered using the --strategy reduce command option.

tavor --format-file file.tavor reduce --input-file file.input --strategy random

Alternatively to printing to STDOUT an executable (or script) can be fed with the generated data. You can find examples for executables and scripts here.

There are two types of arguments to execute commands:

--result-* is an additional reduce command option kind which can be used to influence the reduce generation itself. For example the --result-separator reduce command option changes the separator of the generations if they are printed to STDOUT. The following command will use @@@@ instead of the default \n separator to feed the reduce generations to the running process:

tavor --format-file file.tavor reduce --input-file file.input --script validate --result-separator "@@@@"

Please have a look at the reduce command help for more options and descriptions:

tavor --help reduce

Command: validate

The validate command validates a given input file according to the given format file. This can be helpful since this is for instance needed for the reduce command which does apply delta-debugging only on valid inputs or in the general case it can be used to validate an input which was not generated through the given format file.

tavor --format-file file.tavor validate --input-file file.input

Please have a look at the validate command help for more options and descriptions:

tavor --help validate

Bash Completion

If you like Bash Completion for Tavor make sure that you have Bash Completion installed and then copy the bash completion Tavor script into your Bash Completion folder.

mkdir -p $HOME/.bash_completion
wget -P $HOME/.bash_completion https://raw.githubusercontent.com/zimmski/tavor/master/cmd/tavor/tavor-bash_completion.sh
. ~/.bashrc

Bash Completion for Tavor should now be working. If not, one reason could be that your distribution does not include user defined Bash Completion scripts in .bashrc so just add it to your .bashrc and include the .bashrc in your current bash:

echo ". ~/.bash_completion/tavor-bash_completion.sh" >> ~/.bashrc
. ~/.bashrc

How do I develop applications with the Tavor framework?

The Tavor format does currently not allow to add source code as attributes to tokens. This prevents the implementation of many use-cases. One example is dynamically adding tokens according to a data source e.g. a database or the outcome of static code analysis. To work around this limitations, the Tavor framework can be used directly using Go code. This makes it also possible to create binaries which are independent of the Tavor binary.

All main components of the Tavor framework as well as lots of helper functions are exported by the respective packages which are broadly described in the following subsections as well as in the source code documentation. It is also advisable to read the source code of the packages, official fuzzers and delta-debuggers at https://github.com/zimmski/fuzzer, the Tavor binary and of course the complete example of the Tavor documentation.

Token structures GoDoc

All operations of the Tavor framework are applied to token structures which can be either made using the Tavor format or manually by instantiating tokens. Officially implemented tokens can be found in their respective packages which are grouped by their usage type.

All tokens have a New* function which can be used to instantiate tokens with sane default values. For example a constant string can be created with the following code.

tok := primitives.NewConstantString("text")

A sequence of tokens can be for example created with the following code.

tok := lists.NewConcatenation(
    primitives.NewConstantString("This is example number "),
    primitives.NewConstantInt(2),
    primitives.NewConstantString(" which is still not the last one.")
)

It is also possible to create token structures using the Tavor format. The parser package exports the function ParseTavor which reads and parses a Tavor formatted input. It returns the root of the token representation on success. The previous example can be rewritten using the following code.

tok, err := parser.ParseTavor(strings.NewReader(`
    START = "This is example number " 2 " which is still not the last one."
`))

Every token implements at least the Token interface which specifies basic methods for generating, replicating, permutating and parsing. Other token interfaces add specific functionality to a token. The List interface for example states that a token can have internal and external child tokens and specifies methods to access them.

More information regarding tokens can be found in the extending section.

Fuzzing filters GoDoc

All officially implemented fuzzing filters can be found in the github.com/zimmski/tavor/fuzz/filter package. Each filter has a New* function which can be used to instantiate a new instance of the filter with sane default values. Additionally all filters have to implement the Filter interface which specifies a generic function that applies the filter onto one given token. However, the common case is to use the ApplyFilters function which applies one or more filters onto a whole token structure. It also handles loops in the structure and the replacement of tokens.

var filters = []filter.Filter{
    filter.NewPositiveBoundaryValueAnalysis,
}

tok, err := filter.ApplyFilters(filters, tok)
if err != nil {
    panic(err)
}

More information regarding fuzzing filters can be found in the extending section.

Fuzzing strategies GoDoc

All officially implemented fuzzing strategies can be found in the github.com/zimmski/tavor/fuzz/strategy package. Each strategy has a New* function which can be used to instantiate a new instance of the strategy with sane default values. Additionally all strategies have to implement the Strategy interface which specifies a Fuzz method that applies the strategy onto a token structure. The following example uses the AllPermutations strategy to permutate over the given token.

import (
  "fmt"

  "github.com/zimmski/tavor/fuzz/strategy"
  "github.com/zimmski/tavor/token"
  "github.com/zimmski/tavor/token/lists"
  "github.com/zimmski/tavor/token/primitives"
)

func main() {
  var tok token.Token = lists.NewOnce(
    primitives.NewConstantInt(1),
    primitives.NewConstantInt(2),
    primitives.NewConstantInt(3),
  )

  continueFuzzing, err := strategy.NewAllPermutations(tok, nil)
  if err != nil {
    panic(err)
  }

  for i := range continueFuzzing {
    fmt.Println(tok.String())

    continueFuzzing <- i
  }
}

This program has the following output.

123
132
213
231
312
321

More information regarding fuzzing strategies can be found in the extending section.

Reduce Strategies GoDoc

All officially implemented reduce strategies can be found in the github.com/zimmski/tavor/reduce/strategy package. Each strategy has a New* function which can be used to instantiate a new instance of the strategy with sane default values. Additionally all strategies have to implement the Strategy interface which specifies a Reduce method that applies the strategy onto a token structure. The following example uses the Linear strategy to reduce a given token.

import (
    "fmt"

    "github.com/zimmski/tavor/reduce/strategy"
    "github.com/zimmski/tavor/token/lists"
    "github.com/zimmski/tavor/token/primitives"
)

func main() {
  tok := lists.NewRepeat(primitives.NewConstantString("a"), 0, 100)
  tok.Permutation(19)

  fmt.Println(tok.String())

  continueFuzzing, feedbackReducing, err := strategy.NewLinear(tok)
  if err != nil {
    panic(err)
  }

  for i := range continueFuzzing {
    out := tok.String()

    fmt.Println(out)

    if len(out) == 5 {
      feedbackReducing <- strategy.Good
    } else {
      feedbackReducing <- strategy.Bad
    }

    continueFuzzing <- i
  }
}
}

More information regarding reduce strategies can be found in the extending section.

How do I extend the Tavor framework?

If the Tavor format and the implemented functionality of the framework is not enough to implement your applications needs, you can easily extend and change the Tavor framework. The following sections will provide starting points, hints and conventions to help you write your own Tavor extensions like fuzzing and reduce strategies or even your own tokens.

Since implementing new extensions and doing changes is trickier than using the existing framework, it is advisable to read the code documentation, which can be found in a nice representation on https://godoc.org/github.com/zimmski/tavor/, and of course the actual code.

Code for the Tavor framework has to be deterministic. This means that no functionality is allowed to have its own source or seed of randomness. Methods of interfaces that define a random generator have to be implemented deterministically so that the same random seed will always result in the same result. This also applies to manually written tests and code that is concurrent.

If you are aiming to get your extensions and changes offically incorporated into the Tavor framework, please first create an issue in the issue tracker and discuss your implementation goals and plans with an outline. Note that every feature and change has to be tested with handwritten tests, so please include a test plan in your outline too.

If extending Tavor yourself is not for you, but you still need new features, you can take a look at the feature request section.

Fuzzing filters GoDoc

The fuzzing filter code and all officially implemented fuzzing filters can be found in the github.com/zimmski/tavor/fuzz/filter package and its sub-packages.

A fuzzing filter has to implement the Filter interface which is exported by the github.com/zimmski/tavor/fuzz/filter package. The interface defines a function signature that applies the filter onto a token which is passed to the function. Therefore, the function's concern is only one token at a time. If an error is encountered during the filter execution, the error return argument is not nil. On success a replacement for the token is returned. If this replacement is not nil, it will replace the original token.

Applying a filter can be done manually or using the ApplyFilters function exported by the github.com/zimmski/tavor/fuzz/filter package. ApplyFilters applies more than one filter, correctly traverses the graph, handles errors of filters and does not apply filters onto filter generated tokens. The last property is needed to avoid filter loops e.g. when two filter generate new tokens which trigger the generation of the other filter.

The Register function of the github.com/zimmski/tavor/fuzz/filter package allows to register filters based on an identifier which can be then used within the framework. The function New of the github.com/zimmski/tavor/fuzz/filter package allows to generate a new instance of the registered filter given the identifier. For example, this is needed for the Tavor binary, which applies filters defined by CLI arguments.

Examples

The following fuzzing filter searches the token graph for constant string tokens which hold the string "old" and replaced them with a constant string token holding the string "new".

import (
    "github.com/zimmski/tavor/token"
    "github.com/zimmski/tavor/token/primitives"
)

func NewSampleFilter(tok token.Token) (token.Token, error) {
    c, ok := tok.(*primitives.ConstantString)
    if !ok || c.String() != "old" {
        return nil, nil
    }

    return primitives.NewConstantString("new"), nil
}

One way to apply this filter is by using the following code. Which will change the generation from "old string" to "new string" after the filter is applied.

import (
    "fmt"

    "github.com/zimmski/tavor/fuzz/filter"
    "github.com/zimmski/tavor/token"
    "github.com/zimmski/tavor/token/lists"
    "github.com/zimmski/tavor/token/primitives"
)

func main() {
    var doc token.Token = lists.NewConcatenation(
        primitives.NewConstantString("old"),
        primitives.NewConstantString(" "),
        primitives.NewConstantString("string"),
    )

    var filters = []filter.Filter{
        NewSampleFilter,
    }

    doc, err := filter.ApplyFilters(filters, doc)
    if err != nil {
        panic(err)
    }

    fmt.Println(doc.String())
}

The filter can be registered as a framework-wide usable filter using the following code. Note that this should be usually done in an init function inside the package of a filter.

import (
    "github.com/zimmski/tavor/fuzz/filter"
)

func init() {
    filter.Register("SampleFilter", NewSampleFilter)
}

Fuzzing strategies GoDoc

The fuzzing strategy code and all officially implemented fuzzing strategies can be found in the github.com/zimmski/tavor/fuzz/strategy package and its sub-packages.

Each fuzzing strategy instance has to be associated on construction with exactly one token. This allows an instance to hold a dedicated state of the given token graph, which makes optimizations for multiple fuzzing operations possible.

A fuzzing strategy has to implement the Strategy interface which is exported by the github.com/zimmski/tavor/fuzz/strategy package. The interface defines a function which starts the first iteration of the fuzzing strategy in a new goroutine and returns a channel which controls the fuzzing process. If an error is encountered during the initialization, the error return argument is not nil. On success a value is returned by the channel which marks the completion of the iteration. A value has to be put back in, to initiate the calculation of the next fuzzing iteration. This passing of values is needed to avoid data races within the token graph. The channel must be closed when there are no more iterations or the strategy caller wants to end the fuzzing process. Note that this can also occur right after receiving the channel. Hence when there are no iterations at all. Since the function is running in its own goroutine, it can be implemented statefully without using savepoints.

The Register function of the github.com/zimmski/tavor/fuzz/strategy package allows to register strategies based on an identifier which can be then used within the framework. The function New of the github.com/zimmski/tavor/fuzz/strategy package allows to generate a new instance of the registered strategy given the identifier. For example, this is needed for the Tavor binary, which can execute a specific strategy defined by a CLI argument.

Examples

The following fuzzing strategy searches the token graph for constant integer tokens which have a value within 1 and 10 and increments their content by replacing the original value. This strategy falls therefore in the category of mutation-based fuzzing, since it does change the original data. It is also stateless since there is no need to keep track of current events between iterations. The graph is simply searched and changed once per iteration. An additional property is that the defined operation allows the strategy to end, which is strictly not necessary.

Note: The search of tokens could be cached, which would give the fuzzing strategy instance a kind of state. It would strictly speaking still not be a stateful fuzzing strategy, since there is no connection between iterations. Meaning no iteration depends on a previous iteration. Caching tokens would also imply that changes in the graph, which could be made outside the fuzzing go routing, must be handled or a contract has to be made between the caller and callee of the fuzzing strategy.

import (
    "github.com/zimmski/tavor/rand"
    "github.com/zimmski/tavor/token"
    "github.com/zimmski/tavor/token/primitives"
)

func NewSampleStrategy(root token.Token, r rand.Rand) (chan struct{}, error) {
    continueFuzzing := make(chan struct{})

    go func() {
        for {
            found := false

            err := token.Walk(root, func(tok token.Token) error {
                intTok, ok := tok.(*primitives.ConstantInt)
                if !ok {
                    return nil
                }

                v := intTok.Value()
                if v >= 1 && v <= 10 {
                    found = true

                    v++

                    intTok.SetValue(v)
                }

                return nil
            })
            if err != nil {
                panic(err)
            }

            if !found {
                break
            }

            continueFuzzing <- struct{}{}

            if _, ok := <-continueFuzzing; !ok {
                // the fuzzing strategy caller can close the channel too

                return
            }
        }

        close(continueFuzzing)
    }()

    return continueFuzzing, nil
}

One way to execute this strategy is by using the following code. Note that the implemented strategy does not need a random generator. The argument for the function can be therefore just nil.

import (
    "fmt"

    "github.com/zimmski/tavor/token"
    "github.com/zimmski/tavor/token/lists"
    "github.com/zimmski/tavor/token/primitives"
)

func main() {
    var doc token.Token = lists.NewConcatenation(
        primitives.NewConstantInt(7),
        primitives.NewConstantString(" "),
        primitives.NewConstantInt(9),
    )

    continueFuzzing, err := NewSampleStrategy(doc, nil)
    if err != nil {
        panic(err)
    }

    for i := range continueFuzzing {
        fmt.Println(doc.String())

        continueFuzzing <- i
    }
}

This program results in the following output.

8 10
9 11
10 11
11 11

The strategy can be registered as a framework-wide usable strategy using the following code. Note that this should be usually done in an init function inside the package of a strategy.

import (
    "github.com/zimmski/tavor/fuzz/strategy"
    "github.com/zimmski/tavor/token"
)

func init() {
    strategy.Register("SampleStrategy", NewSampleStrategy)
}

Reduce strategies GoDoc

The reduce strategy code and all officially implemented reduce strategies can be found in the github.com/zimmski/tavor/reduce/strategy package and its sub-packages.

Each reduce strategy instance has to be associated on construction with exactly one token. This allows an instance to hold a dedicated state of the given token graph, which makes optimizations for multiple reduce operations possible.

A reduce strategy has to implement the Strategy interface which is exported by the github.com/zimmski/tavor/reduce/strategy package. The interface defines a function which starts the first step of the reduce strategy in a new goroutine and returns two channels to control the reduce process. If an error is encountered during the initialization, the error return argument is not nil. On success a value is returned by the control channel which marks the completion of the iteration. A feedback has to be given through the feedback channel as well as a value to the control channel to initiate the calculation of the next reduce step. This passing of values is needed to avoid data races within the token graph. The channels must be closed when there are no more steps or the strategy caller wants to end the reduce process. Note that this can also occur right after receiving the channels. Hence when there are no steps at all. Since the function is running in its own goroutine, it can be implemented statefully without using savepoints.

Currently only two different feedback answers can be given. They are defined by the ReduceFeedbackType type which is exported by the github.com/zimmski/tavor/reduce/strategy package. One feedback answer is Good which communicates to the reduce strategy that the current step produced a successful result. This can mean for example that the result has the right syntax or is better than the last good result. The meaning of the feedback and the response of the strategy to the feedback are purely dependent on the application. Responses could be for example to proceed with a given optimization path or to simply end the whole reducing process, since it is often enough to find one solution. The second feedback answer is Bad which communicates exactly the opposite of Good to the strategy. This answer is often more complicated to handle since it means that in some scenarios a revert of the current step to the last good step has to occur before the reduce process can continue.

Note: All reduce strategies should currently implement algorithms that produce valid generations according to the internal token graph. Hence a constant integer should not for example be replaced by a constant string. This is a convention which is not enforced but highly recommended to avoid problems until it is safely supported by a future version of Tavor.

The Register function of the github.com/zimmski/tavor/reduce/strategy package allows to register strategies based on an identifier which can be then used within the framework. The function New of the github.com/zimmski/tavor/reduce/strategy package allows to generate a new instance of the registered strategy given the identifier. For example, this is needed for the Tavor binary, which can execute a specific strategy defined by a CLI argument.

Examples

The following reduce strategy searches the token graph for repeat tokens holding one constant string token as its internal token and reduces the repetition by one token for every reduce step. This could be done more efficiently but it is very simple and should demonstrate that reduce strategies can be implemented very easily to start with. It is a stateful strategy since every step depends on the previous one. Additionally it is guaranteed to end, since only a finite amount of tokens is targeted without generating new ones. TODO this example

import (
    "errors"

    "github.com/zimmski/tavor/reduce/strategy"
    "github.com/zimmski/tavor/token"
    "github.com/zimmski/tavor/token/lists"
    "github.com/zimmski/tavor/token/primitives"
)

func NewSampleStrategy(root token.Token) (chan struct{}, chan<- strategy.ReduceFeedbackType, error) {
    continueReducing := make(chan struct{})
    feedbackReducing := make(chan strategy.ReduceFeedbackType)

    go func() {
        done := errors.New("done")

        err := token.Walk(root, func(tok token.Token) error {
            repeat, ok := tok.(*lists.Repeat)
            if !ok || repeat.InternalLen() != 1 {
                return nil
            }

            char, err := repeat.InternalGet(0)
            if err != nil {
                return err
            }
            if _, ok := char.(*primitives.ConstantString); !ok {
                return nil
            }

            for i := repeat.Reduces() - 1; i >= 0; {
                found := false
                l := len(repeat.String())
                for {
                    if err := repeat.Reduce(i); err != nil {
                        return err
                    }

                    if l-1 == len(repeat.String()) {
                        found = true

                        break
                    }

                    if i == 0 {
                        break
                    }

                    i--
                }

                if !found {
                    break
                }

                continueReducing <- struct{}{}

                feedback, ok := <-feedbackReducing
                if !ok {
                    return nil
                }

                if _, ok := <-continueReducing; !ok {
                    return nil
                }

                if feedback == strategy.Good {
                    return done
                }
            }

            return nil
        })
        if err != nil && err != done {
            panic(err)
        }

        close(continueReducing)
        close(feedbackReducing)
    }()

    return continueReducing, feedbackReducing, nil
}

One way to execute this strategy is by using the following code.

import (
    "fmt"

    "github.com/zimmski/tavor/reduce/strategy"
    "github.com/zimmski/tavor/token"
    "github.com/zimmski/tavor/token/lists"
    "github.com/zimmski/tavor/token/primitives"
)

func main() {
    aRepeat := lists.NewRepeat(primitives.NewConstantString("a"), 0, 100)
    aRepeat.Permutation(6)
    bRepeat := lists.NewRepeat(primitives.NewConstantString("b"), 1, 100)
    bRepeat.Permutation(4)
    cRepeat := lists.NewRepeat(primitives.NewConstantString("c"), 7, 100)
    cRepeat.Permutation(8)
    dRepeat := lists.NewRepeat(primitives.NewConstantString("d"), 1, 100)
    dRepeat.Permutation(1)

    var doc token.Token = lists.NewConcatenation(
        aRepeat,
        bRepeat,
        cRepeat,
        dRepeat,
    )

    fmt.Println(doc.String())

    continueFuzzing, feedbackReducing, err := NewSampleStrategy(doc)
    if err != nil {
        panic(err)
    }

    for i := range continueFuzzing {
        out := doc.String()

        fmt.Println(out)

        if len(out) <= 10 {
            feedbackReducing <- strategy.Good
        } else {
            feedbackReducing <- strategy.Bad
        }

        continueFuzzing <- i
    }
}

This program results in the following output.

aaaaaabbbbbcccccccccccccccdd
aaaaabbbbbcccccccccccccccdd
aaaabbbbbcccccccccccccccdd
aaabbbbbcccccccccccccccdd
aabbbbbcccccccccccccccdd
abbbbbcccccccccccccccdd
bbbbbcccccccccccccccdd
bbbbcccccccccccccccdd
bbbcccccccccccccccdd
bbcccccccccccccccdd
bcccccccccccccccdd
bccccccccccccccdd
bcccccccccccccdd
bccccccccccccdd
bcccccccccccdd
bccccccccccdd
bcccccccccdd
bccccccccdd
bcccccccdd

The strategy can be registered as a framework-wide usable strategy using the following code. Note that this should be usually done in an init function inside the package of a strategy.

import (
    "github.com/zimmski/tavor/reduce/strategy"
    "github.com/zimmski/tavor/token"
)

func init() {
    strategy.Register("SampleStrategy", NewSampleStrategy)
}

Tokens GoDoc

Tokens are the building blocks of the Tavor framework and format. All tokens have to implement the Token interface which defines basic methods to generate, replicate, permutate and parse a token instance. Additional functionality is defined by token interfaces like the List interface and the Forward interface. All token interfaces are used throughout the Tavor framework. It is therefore advised to read the documentation of the different interfaces.

Every token implementation should differentiate between the internal and external representation. A token with a choice like a range integer can hold many values but only one at any given time. It is therefore necessary to save the range internally but forward the current value externally. The differentiation is especially important for token types who generate new tokens out of their internal ones. The original internal tokens should not be connected to the external ones since it is otherwise possible to change the internal representation without any contract.

Token interface GoDoc

The Token interface is the base interface of all tokens. Its methods can be grouped into the following categories.

The Token documentation describes all needed methods and their needed functionality.

Examples

The following token defines a smiley which has eyes, a mouth and can have a nose. The token is able to permutate different generations of smileys and even parse them. The code will be described in groups of method categories. The example should in general show how easy it is to create new token types. It must be noted that the example could be easily implemented with the available token types or with the following Tavor format.

START = [:;] ?("-") [)(D]

Since the Smiley token has to hold three different information, it is necessary to create a struct. Instead of directly using the runes for the eyes and mouth in the struct, only indexes are used. This is not necessary but is a good convention to separate the data from its source.

import (
    "github.com/zimmski/tavor/rand"
    "github.com/zimmski/tavor/token"
)

var (
    eyes   = []rune{':', ';'}
    mouths = []rune{')', '(', 'D'}
)

type Smiley struct {
    eyes  int
    nose  bool
    mouth int
}

For easier construction the function NewSmiley is defined which sets default values for all members of the Smiley struct. Note that even tough 0 and false are the initialization values of the used types and could be therefore left out, it is more important to explicitly declare what the default values of a token should look like.

func NewSmiley() *Smiley {
    return &Smiley{
        eyes:  0,
        nose:  false,
        mouth: 0,
    }
}

The struct must implement the Token interface which is grouped into method categories. The Generation category deals with the output of the current value. This is used to output generations of tokens. Currently only the String method is necessary for the Token interface.

func (s *Smiley) String() string {
    nose := ""

    if s.nose {
        nose = "-"
    }

    return string(eyes[s.eyes]) + nose + string(mouths[s.mouth])
}

The Replication category deals with replicating the token. This is done by the Clone method. Note that the new token must be independent. This means that token internal slices, maps and other allocated structures must be copied as well.

func (s *Smiley) Clone() token.Token {
    return &Smiley{
        eyes:  s.eyes,
        nose:  s.nose,
        mouth: s.mouth,
    }
}

The Permutation category generates distinct permutations of a token. The method Permutations defines how many permutations a single token holds. The Smiley token has a constant number of permutations since the amount of eyes and mouths is constant. Other token like range integers depend on their initial values. The method PermutationsAll calculates the permutations of the token itself and all its children. Since the Smiley token has no children it is the same as Permutations. It is important to note that calculating the amount of permutations is not a straightforward task. The list tokens Concatenation and One for example can have the same amount of children but have very different permutation calculations. The Permutation method completes the category. It sets a distinct permutation of the token. It is a good convention to put the execution of the permutation in its own method permutation since the resulting state can be cached. The Permutation of the Token interface then handels the validation and meta-handling of the permutation number.

func (s *Smiley) Permutations() uint {
    return uint(len(eyes) * 2 * len(mouths))
}

func (s *Smiley) PermutationsAll() uint {
    return s.Permutations()
}

func (s *Smiley) Permutation(i uint) error {
    permutations := s.Permutations()

    if i >= permutations {
        return &token.PermutationError{
            Type: token.PermutationErrorIndexOutOfBound,
        }
    }

    s.permutation(i)

    return nil
}

func (s *Smiley) permutation(i uint) {
    p := uint(0)

    // This could be done more efficiently but to keep the example simple it traverses over every permutation to find the right one.
OUT:
    for eyes := range eyes {
        for _, nose := range []bool{false, true} {
            for mouth := range mouths {
                if i == p {
                    s.eyes = eyes
                    s.nose = nose
                    s.mouth = mouth

                    break OUT
                }

                p++
            }
        }
    }
}

Finally the Parsing category which deals with parsing the token from an input. This looks very verbose but it is necessary to handle every syntax error to generate adequate parsing errors. Note that these messages could be further improved by not only stating that something was expected but actually giving examples on what was expected.

func (s *Smiley) Parse(pars *token.InternalParser, cur int) (int, []error) {
    // Validate if there is room for at least a smiley without a nose
    if cur+2 > pars.DataLen {
        return cur, []error{&token.ParserError{
            Message: "No room for a smiley",
            Type:    token.ParseErrorUnexpectedEOF,
        }}
    }

    found := false
    for i, eyes := range eyes {
        if rune(pars.Data[cur]) == eyes {
            s.eyes = i

            found = true

            break
        }
    }
    if !found {
        return cur, []error{&token.ParserError{
            Message: "Expected smiley eyes",
            Type:    token.ParseErrorUnexpectedData,
        }}
    }

    cur++

    if pars.Data[cur] == '-' {
        s.nose = true

        cur++
    } else {
        s.nose = false
    }

    // Validate if there is room for the smiley mouth
    if cur > pars.DataLen {
        return cur, []error{&token.ParserError{
            Message: "No room for the smiley mouth",
            Type:    token.ParseErrorUnexpectedEOF,
        }}
    }

    found = false
    for i, mouth := range mouths {
        if rune(pars.Data[cur]) == mouth {
            s.mouth = i

            found = true

            break
        }
    }
    if !found {
        return cur, []error{&token.ParserError{
            Message: "Expected smiley mouth",
            Type:    token.ParseErrorUnexpectedData,
        }}
    }

    cur++

    return cur, nil
}

The Smiley token can then be used to create structures like with any other token of the framework. However. to give an easier example, it will be used alone. The next program will create a new Smiley token, permutate over all permutations and parses a string. Each step is printed to STDOUT.

import (
    "fmt"

    "github.com/zimmski/tavor/token"
)

func main() {
    s := NewSmiley()
    fmt.Println("New:", s.String())

    fmt.Print("Permutations:")
    for i := uint(0); i < s.Permutations(); i++ {
        s.Permutation(i)
        fmt.Print(" ", s.String())
    }
    fmt.Println()

    p := &token.InternalParser{}
    p.Data = ":-D"
    p.DataLen = len(p.Data)

    _, errs := s.Parse(p, 0)
    if errs != nil {
        panic(errs)
    }

    fmt.Println("Parsed:", s.String())
}

This program results in the following output.

New: :)
Permutations: :) :( :D :-) :-( :-D ;) ;( ;D ;-) ;-( ;-D
Parsed: :-D

Token attributes

Every token type and interface can have its own token attributes. Currently it is not possible to define these attributes externally. Instead they must be implemented directly in the format parsers. For example the method selectTokenAttribute of the Tavor format parser has to be extended. Since token attributes embed a new token, which is for example executed by fuzzing and reduce operations, a suitable token type, which can be connected to the original token, must be used. The Reset token attribute of the Sequence typed token uses for example a token of the SequenceResetItem type.

Note: The mentioned implementation inconveniences will be addressed in future versions of Tavor.

Typed tokens

Typed tokens provide additional types for formats. It is possible to define new typed tokens by calling the token.RegisterTyped function. It is only necessary to implement the Token interface, since typed tokens behave like regular tokens. Arguments for the typed tokens are used as initialization values for the instanced token. It is therefore not possible to lookup argument values after the typed token definition is processed.

An example of typed token creation function can be found in the sequence token. Currently, the arguments of a typed token are limited to integers. To support new argument types, it is necessary to extend the ArgumentsTypedParser interface and its implementation. To add token attributes to typed tokens, please have a look at the token attributes section.

How stable is Tavor?

There are some bugs and a lot of functionality is still missing as well as many features. However, basic functionality and features are stable enough and are successfully used in production by many projects.

Individual package code coverage is currently low but since most tests do cover a lot of Tavor's components this is not a big issue. However, 100% coverage using manually written tests is a required feature of the 1.0 release as well as fully fuzzing the Tavor format and the Tavor binary. This means that Tavor will be equipped to test every feature of the binary and the Tavor format itself.

Since Tavor is still a moving target, backwards-incompatible changes will happen but are documented for every release. This is necessary to make working with Tavor as easy as possible while still providing loads of functionality.

Missing features

There are also a lot of smaller features and enhancements waiting in the issue tracker.

Can I make feature requests and report bugs and problems?

Sure, just submit an issue via the project tracker and I will see what I can do. Please note that I do not guarantee to implement anything soon and bugs and problems are more important to me than new features. If you need something implemented or fixed right away you can contact me via mail mz@nethead.at to do contract work for you.