Overview of the MERLIN (Mitochondrial EvolutionaRy Lineage INference) algorithm
paper: https://academic.oup.com/bioinformatics/article/40/Supplement_1/i218/7700844
The input to MERLIN is a pair of CSV files: one containing the total read counts and one containing the variant read counts. Both matrices should have mutations as rows and cells as columns.
It is important that the format matches the example input files total_matrix.csv and variant_matrix.csv given in data/example, which can be generated by the following command.
mkdir data/example/
python src/simulation.py -n 50 -m 5 -g 5 -c 50 -o data/example/
usage: simulation.py -m n_mutation -n n_cells -g n_clones -c coverage [-t threshold] -o O
optional arguments:
-m n_mutation number of mutations
-n n_cells number of cells
-g n_clones number of clones
-c, --coverage expected sequencing coverage for simulated data
-t, --threshold minimum variant allele frequency (default 0.05)
-o, --out output directory
The simulation writes the following files to the output directory:
variant_matrix.txt / total_matrix.txt: input to MERLIN
tree.txt: ground-truth clone tree
cell_tree.txt: ground-truth cell lineage tree
cell_to_clone_mapping.txt
mutation_to_clone_mapping.txt
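Before running MERLIN, it can help to confirm that the total and variant matrices agree with each other. The following is a minimal sketch, not part of MERLIN itself. It assumes each CSV has a header row of cell names and a first column of mutation names; check this against the example files, since the exact layout is an assumption here. The helper names read_count_matrix and check_inputs are hypothetical.

```python
import csv


def read_count_matrix(path):
    """Read a count matrix CSV.

    Assumed layout (verify against data/example): the first row holds cell
    names, the first column holds mutation names, the rest are counts.
    """
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    cells = rows[0][1:]
    mutations = [row[0] for row in rows[1:]]
    counts = [[int(float(value)) for value in row[1:]] for row in rows[1:]]
    return mutations, cells, counts


def check_inputs(total_path, variant_path):
    """Return a list of problems; an empty list means the pair looks consistent."""
    t_mut, t_cells, total = read_count_matrix(total_path)
    v_mut, v_cells, variant = read_count_matrix(variant_path)
    problems = []
    if t_mut != v_mut:
        problems.append("mutation labels (rows) differ")
    if t_cells != v_cells:
        problems.append("cell labels (columns) differ")
    for i, (t_row, v_row) in enumerate(zip(total, variant)):
        if len(t_row) != len(v_row):
            problems.append(f"row {i} has different lengths in the two files")
            continue
        for j, (t, v) in enumerate(zip(t_row, v_row)):
            # A variant read count can never exceed the total read count.
            if v < 0 or v > t:
                problems.append(f"invalid variant count at row {i}, column {j}")
    return problems
```

For example, check_inputs("data/example/total_matrix.csv", "data/example/variant_matrix.csv") should return an empty list if the example files follow the assumed layout.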
usage: merlin.py [-h] [-t T] [-v V] -o O
optional arguments:
-h, --help show this help message and exit
-t, --total csv file with total read count matrix
-v, --variant csv file with variant read count matrix
-o, --out output prefix
An example of usage is as follows.
$ python src/merlin.py -t data/example/total_matrix.csv -v data/example/variant_matrix.csv -o data/example/
MERLIN produces the following output files:
{output_prefix}_clone_tree_edge_list.txt
{output_prefix}_Umatrix.csv
{output_prefix}_Amatrix.csv
{output_prefix}_ancestry_edge_list.txt
Example output for the example input above can be found in data/example.
We recommend using the pipeline described in MQuad to select informative mitochondrial variants. Note that MERLIN has a reasonable run time (< 3 hours) for $m\leq 30$ mutations. If more variants than this remain after selection, users may need to perform additional clustering or filtering on the mitochondrial SNPs.
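As an illustration of the kind of additional filtering mentioned above, the sketch below ranks mutations by how many cells carry them and keeps at most 30. This is a simple heuristic written for illustration, not the MQuad procedure and not part of MERLIN. The function name select_mutations and its parameters are hypothetical; the default min_vaf of 0.05 simply mirrors the simulation's default threshold, and max_mutations of 30 mirrors the run-time guidance.

```python
def select_mutations(mutations, total, variant, min_vaf=0.05, max_mutations=30):
    """Keep the mutations supported by the most cells, up to max_mutations.

    mutations: list of mutation names (one per row)
    total, variant: lists of rows of read counts, mutations x cells
    """
    scored = []
    for name, t_row, v_row in zip(mutations, total, variant):
        # A cell supports a mutation if it has coverage and its variant
        # allele frequency (variant / total) reaches min_vaf.
        support = sum(
            1 for t, v in zip(t_row, v_row) if t > 0 and v / t >= min_vaf
        )
        if support > 0:
            scored.append((support, name))
    # Sort by descending support; ties keep their original row order.
    scored.sort(key=lambda item: -item[0])
    return [name for _, name in scored[:max_mutations]]
```

The rows for the selected mutations can then be written to new total and variant CSV files and passed to merlin.py with -t and -v.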