Summary:
We need a way to generate benchmark test data to ensure consistency and accuracy of pyani output.
Description:
The initial plan is to take a single genome sequence as input (which may itself be randomly generated) and an accompanying network representing that sequence's evolution. Each edge describes a process applied to a genome, and can be any one of several optional processes (with appropriate parameterisation):
random substitution
inversion
gain or loss of sequence from outside the network
horizontal gene transfer (HGT) within the network
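A minimal sketch of the four edge processes, in Python, assuming genomes are plain uppercase `ACGT` strings. All function names and parameters here (`substitute`, `invert`, `gain`, `lose`, `transfer`) are hypothetical. Inversion is modelled as reverse-complementing a segment, and gain of sequence from outside the network as insertion of random bases; both are modelling assumptions, not a settled design.

```python
import random

# Maps each base to its Watson-Crick complement (used for inversions).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitute(seq: str, rate: float, rng: random.Random) -> str:
    """Hypothetical: replace each base, with probability `rate`, by a different base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def invert(seq: str, start: int, end: int) -> str:
    """Hypothetical: reverse-complement the segment seq[start:end] in place."""
    segment = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


def gain(seq: str, pos: int, length: int, rng: random.Random) -> str:
    """Hypothetical: insert `length` random bases (external sequence) at `pos`."""
    foreign = "".join(rng.choice("ACGT") for _ in range(length))
    return seq[:pos] + foreign + seq[pos:]


def lose(seq: str, start: int, end: int) -> str:
    """Hypothetical: delete the segment seq[start:end]."""
    return seq[:start] + seq[end:]


def transfer(recipient: str, donor: str, donor_start: int, donor_end: int, pos: int) -> str:
    """Hypothetical HGT: copy donor[donor_start:donor_end] into recipient at `pos`."""
    return recipient[:pos] + donor[donor_start:donor_end] + recipient[pos:]
```

Passing an explicit `random.Random` instance (for example `random.Random(42)`) keeps each simulated dataset reproducible, which matters if the same benchmark genomes need to be regenerated later.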
Starting from the input genome, these processes are applied along the edges of the network, in the order the graph specifies.
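One way to apply the processes in graph order is to visit nodes in topological order, so that each genome is built only after all of its parents exist. The sketch below assumes a hypothetical network format that maps each child node to its list of parent nodes and a process callable; giving a node two parents (recipient first, donor second) lets the same mechanism model HGT within the network. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+); the `evolve` function and `Network` type are illustrative only.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Hypothetical network format: child -> (parent nodes, process).
# The process receives the parents' sequences in the listed order, so a
# node with parents [recipient, donor] can represent an HGT event.
Network = Dict[str, Tuple[List[str], Callable[..., str]]]


def evolve(root_name: str, root_seq: str, network: Network) -> Dict[str, str]:
    """Apply each edge's process in topological order, starting from the root genome."""
    # graphlib expects a mapping of node -> predecessors.
    graph = {child: parents for child, (parents, _) in network.items()}
    graph.setdefault(root_name, [])
    genomes = {root_name: root_seq}
    for node in TopologicalSorter(graph).static_order():
        if node == root_name:
            continue
        parents, process = network[node]
        # All parents have already been generated, because of the ordering.
        genomes[node] = process(*(genomes[p] for p in parents))
    return genomes
```

For example, a node `C` with parents `["A", "B"]` and process `lambda recipient, donor: recipient + donor[:100]` would receive the first 100 bases of `B` by transfer. `TopologicalSorter` also raises `graphlib.CycleError` if the network accidentally contains a cycle, which is a useful sanity check on user-supplied networks.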
This will generate a set of input genomes for testing pyani in which the evolutionary history of every "leaf node" sequence is known, so that pyani's output can be interpreted accordingly. The data can then be used to benchmark ANI, k-mer and other genome analyses.
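Because every leaf genome is produced by a known series of processes, its history can be recorded alongside the sequence. The sketch below assumes a hypothetical mapping from each child node to its list of parent nodes. It finds the leaf nodes, collects each leaf's ancestors (including HGT donors), and writes each leaf as a FASTA file into a directory, since pyani takes a directory of genome FASTA files as input. The function names, line width and `.fna` extension are assumptions.

```python
from pathlib import Path
from typing import Dict, List, Set, Union


def leaves(parents: Dict[str, List[str]]) -> List[str]:
    """Leaf nodes: nodes that never appear as a parent of another node."""
    all_parents = {p for ps in parents.values() for p in ps}
    return sorted(n for n in parents if n not in all_parents)


def ancestry(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All ancestors of `node`, including HGT donors (its known history)."""
    seen: Set[str] = set()
    stack = list(parents.get(node, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def format_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record, wrapped at `width` characters."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


def write_leaf_fasta(
    genomes: Dict[str, str],
    parents: Dict[str, List[str]],
    outdir: Union[str, Path],
) -> List[Path]:
    """Write one hypothetical `<leaf>.fna` file per leaf genome into `outdir`."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for leaf in leaves(parents):
        path = outdir / f"{leaf}.fna"
        path.write_text(format_fasta(leaf, genomes[leaf]))
        paths.append(path)
    return paths
```

Keeping `ancestry()` separate from the FASTA output means the expected relationships between leaves (shared ancestors, HGT donors) can be stored next to the benchmark data and compared against pyani's results.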
pyani Version:
Planned for v0.3+