Add `pyani evolve` command.

widdowquinn commented 4 years ago

Summary:

We need a way to generate benchmark test data to ensure consistency and accuracy of pyani output.

Description:

The initial plan is to take a single genome sequence as input (this may be random...) and an accompanying network representing that sequence's evolution. Each edge describes a process applied to a genome, and can be any of several optional processes (with appropriate parameterisation):

random substitution
inversion
gain/loss of sequence from outside the network
HGT within the network

Starting from the input genome, these processes are applied along each edge as specified by the graph.
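
To make the idea concrete, here is a minimal sketch of how the edge processes could be applied along a network. Everything in it is an assumption for illustration: the function names (`substitute`, `invert`, `insert`, `delete`, `evolve`), the edge format (a topologically ordered list of `(parent, child, operations)` tuples), and the parameter names are hypothetical and are not part of pyani.

```python
import random

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitute(seq, rate, rng):
    """Replace each base with a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def invert(seq, start, end):
    """Replace seq[start:end] with its reverse complement."""
    return seq[:start] + seq[start:end].translate(COMPLEMENT)[::-1] + seq[end:]


def insert(seq, pos, fragment):
    """Gain of sequence: insert `fragment` at position `pos`."""
    return seq[:pos] + fragment + seq[pos:]


def delete(seq, start, end):
    """Loss of sequence: remove seq[start:end]."""
    return seq[:start] + seq[end:]


def evolve(root_seq, edges, root="root", seed=0):
    """Apply the operations on each edge and return {node_name: sequence}.

    `edges` is a list of (parent, child, ops) tuples in topological order
    (hypothetical format); `ops` is a list of (operation_name, params) pairs.
    """
    rng = random.Random(seed)  # fixed seed keeps the benchmark reproducible
    genomes = {root: root_seq}
    for parent, child, ops in edges:
        seq = genomes[parent]
        for name, params in ops:
            if name == "substitute":
                seq = substitute(seq, params["rate"], rng)
            elif name == "invert":
                seq = invert(seq, params["start"], params["end"])
            elif name == "gain":
                seq = insert(seq, params["pos"], params["fragment"])
            elif name == "loss":
                seq = delete(seq, params["start"], params["end"])
            elif name == "hgt":
                # HGT within the network: copy a fragment from a donor node
                donor = genomes[params["donor"]]
                fragment = donor[params["start"]:params["end"]]
                seq = insert(seq, params["pos"], fragment)
            else:
                raise ValueError(f"unknown operation: {name}")
        genomes[child] = seq
    return genomes


example_edges = [
    ("root", "A", [("invert", {"start": 0, "end": 4})]),
    ("root", "B", [("loss", {"start": 0, "end": 2})]),
    ("A", "C", [("hgt", {"donor": "B", "start": 0, "end": 2, "pos": 0})]),
]
example_genomes = evolve("AACCGGTTAC", example_edges)
```

Seeding the random number generator is a deliberate choice in this sketch: the same input genome and network should always produce the same set of test genomes.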

This will generate a set of input genomes for testing pyani where we know the evolutionary history of every "leaf node" sequence, and can interpret output accordingly. The data can then be used to benchmark ANI, k-mer and other genome analyses.
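
Because the history is known, the expected answer for a pair of genomes can be computed directly rather than estimated. A minimal sketch, assuming the two sequences differ only by substitutions (so they have equal length and correspond position by position); `true_identity` is a hypothetical helper, not a pyani function:

```python
def true_identity(seq_a, seq_b):
    """Fraction of positions at which two equal-length sequences agree.

    Only meaningful when the sequences differ by substitutions alone;
    inversions, gain/loss and HGT would need a record of the applied events.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)
```

A value computed this way could serve as the reference against which a reported identity for the same pair is compared.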

pyani Version:

Planned for v0.3+

baileythegreen commented 3 years ago

Potentially related tools:

https://github.com/soumyakundu/SaGePhy

https://github.com/xavierdidelot/ClonalFrameML

https://pubmed.ncbi.nlm.nih.gov/27713837/

https://academic.oup.com/bioinformatics/article/34/13/2308/4883490

baileythegreen commented 2 years ago

Work on this has started on the `evolve` branch.
