kfuku52 / csubst

Molecular convergence detection
BSD 3-Clause "New" or "Revised" License

Overview

CSUBST (/si:sʌbst/) is a tool for analyzing Combinatorial SUBSTitutions of codon sequences in phylogenetic trees. A combinatorial substitution is a set of recurrent substitutions that occur at the same protein site in multiple independent branches. If these substitutions result in the same amino acid, they are considered convergent amino acid substitutions. The main features of CSUBST include:
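To make the two definitions above concrete, here is a minimal Python sketch that labels a single protein site from a table of per-branch substitutions. It is only an illustration of the terminology, not CSUBST's implementation or API: the function name `classify_site`, the input layout, and the returned labels are all hypothetical, and the sketch assumes the listed branches are already known to be independent.

```python
from collections import Counter


def classify_site(substitutions):
    """Label one protein site (hypothetical helper, not part of CSUBST).

    `substitutions` maps a branch name to an (ancestral, derived) amino-acid
    pair. The branches are assumed to be independent of each other.
    """
    # Keep only branches where the amino acid actually changed.
    changed = {
        branch: (anc, der)
        for branch, (anc, der) in substitutions.items()
        if anc != der
    }
    # A combinatorial substitution needs changes on at least two branches.
    if len(changed) < 2:
        return "not combinatorial"
    # Convergent: at least two branches end in the same derived amino acid.
    derived_counts = Counter(der for _, der in changed.values())
    if max(derived_counts.values()) >= 2:
        return "convergent"
    # Illustrative label for recurrent changes to different amino acids.
    return "combinatorial (different amino acids)"


# Two independent branches both change to threonine (T) at the same site.
print(classify_site({"branch_A": ("S", "T"), "branch_B": ("A", "T")}))
```

The example prints `convergent` because both branches arrive at the same derived amino acid; if `branch_B` instead changed to `G`, the site would still be a combinatorial substitution but would not count as convergent.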

Input files

CSUBST takes as inputs:

Installation and test run

CSUBST runs on Python 3 (tested with >=3.6.0). For a quick installation and test run, try:

# IQ-TREE installation with conda
conda install iqtree

# Installation with pip
pip install numpy cython # NumPy and Cython must be installed before csubst
pip install git+https://github.com/kfuku52/csubst

# Generate a test dataset
csubst dataset --name PGK

# Run csubst analyze
csubst analyze \
--alignment_file alignment.fa \
--rooted_tree_file tree.nwk \
--foreground foreground.txt

Basic usage

CSUBST is composed of several subcommands. csubst -h shows the list of subcommands, and the complete set of options for each subcommand is available from csubst SUBCOMMAND -h (e.g., csubst analyze -h). Many options are available, but those used by a typical user would be as follows. More advanced usage is described in the CSUBST wiki.

Citation

Fukushima K, Pollock DD. 2023. Detecting macroevolutionary genotype-phenotype associations using error-corrected rates of protein convergence. Nature Ecology & Evolution 7: 155–170. DOI: 10.1038/s41559-022-01932-7

Licensing

CSUBST is BSD-licensed (3 clause). See LICENSE for details.