raphael-group / ConDoR

MIT License
8 stars 1 forks source link

ConDoR (constrained Dollo Reconstruction)

Overview of ConDoR Overview of the ConDoR algorithm. ConDoR takes as input: (a) A clustering of cells based on copy-number profiles and (b) variant and total read counts from scDNA-seq data. ConDoR employs the Constrained k-Dollo model to construct the (c) constrained k-Dollo phylogeny and the (d) mutation matrix.

More details about the problem formulation and the algorithm can be found here: https://www.biorxiv.org/content/10.1101/2023.01.05.522408v1.abstract

Contents

  1. Pre-requisites
  2. Usage instcructions

Pre-requisites

Usage instructions

I/O formats

The input for ConDoR are CSV files containing the total read counts, the variant read counts and the clustering (with or without the mutation matrix). It is important that the format matches the examples given in data/sample that were generated using the following commands.

mkdir data/sample
python src/simulation_reads.py -n 25 -m 25 -p 5 -k 1 -s 0 -d 0.1 -a 0.001 -b 0.001 -o data/sample/overview

ConDoR

usage: condor.py [-h] [-i I] [-r R] [-v V] [-s S] [-a A] [-b B] [--ado ADO] [-k K] -o O [-t T]

optional arguments:
  -h, --help  show this help message and exit
  -i I        csv file with mutation matrix and cluster id
  -r R        csv file with total read count matrix
  -v V        csv file with variant read count matrix
  -s S        file containing list of SNPs
  -a A        false positive error rate [0.001]
  -b B        false negative error rate [0.001]
  --ado ADO   precision parameter for ADO
  -k K        maximum number of losses for an SNV
  -o O        output prefix
  -t T        time limit in seconds [1800]

An example of usage is as follows.

$ python src/condor.py -i  data/sample/overview_character_matrix.csv -a 0.0018 -b 0.001 -k 1 -r data/sample/overview_read_count.csv -v data/sample/overview_variant_count.csv -o data/sample/overview