schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

Help us in upgrading syri

Introduction:

SyRI compares alignments between two chromosome-level assemblies and identifies syntenic regions and structural rearrangements.

Example Figure generated using plotsr

Prerequisites:

  1. Python >=3.8 and the following packages: Cython-0.29.23, numpy-1.21.2, scipy-1.6.2, pandas-1.2.4, python-igraph-0.9.1, psutil-5.8.0, pysam-0.16.0.1, and matplotlib-3.3.4
  2. C/C++ compiler: g++
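
For a manual (non-conda) setup, a quick check of the Python prerequisites might look like the sketch below. It is not part of SyRI. The import names are assumptions derived from the package list above (for example, python-igraph is assumed to import as `igraph`), and the helper names `missing_modules` and `python_ok` are hypothetical. The check only tests that each module can be found, not that its version matches.

```python
import sys
from importlib.util import find_spec

# Assumed import names for the packages listed above
# (python-igraph is assumed to be importable as "igraph").
REQUIRED_MODULES = [
    "Cython", "numpy", "scipy", "pandas",
    "igraph", "psutil", "pysam", "matplotlib",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


def python_ok(min_version=(3, 8)):
    """Return True if the running interpreter is at least `min_version`."""
    return sys.version_info >= min_version


if __name__ == "__main__":
    if not python_ok():
        print("Python >=3.8 is required")
    for name in missing_modules():
        print(f"Missing module: {name}")
```

Installing through conda, as described under Installation, resolves these dependencies automatically, so this check is mainly useful when building from source.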

Recent major updates:

(20-06-2022)

(10-05-2022)

Installation:

The easiest way to install SyRI is through Anaconda:

# Create a new environment and install syri with all dependencies
conda create -n syri_env -c bioconda syri
# Activate the environment
conda activate syri_env

Running:

After installation, SyRI will be on your PATH and can be run directly from the command line. Test the installation using:

syri -h

Detailed information is available at: https://schneebergerlab.github.io/syri

Citation:

Please cite:

Goel, M., Sun, H., Jiao, W. et al. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol 20, 277 (2019) doi:10.1186/s13059-019-1911-0

Current Limitations:

  1. Homologous chromosomes in the two genomes must represent the same strand. If they represent different strands, the alignments between them will be inverted. Because SyRI only considers directed alignments when identifying syntenic regions, it will not find syntenic regions in this case and may crash. The current workaround is to check the alignments manually: if the majority of alignments between a pair of homologous chromosomes are inverted, reverse-complement that chromosome in the query genome and then realign the corrected query genome against the reference genome. We are working on a method that generates dot plots to automatically identify and reverse-complement such inverted chromosomes.
  2. Large translocations and duplications (consisting of multiple alignments) can result in high memory usage and long CPU runtimes.
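
The manual strand check in limitation 1 can be sketched in Python as follows. This is an illustration, not SyRI code. It assumes each alignment between one pair of homologous chromosomes is given as a `(ref_start, ref_end, qry_start, qry_end)` tuple, and that an inverted alignment has `qry_start > qry_end`. The function names `inverted_fraction` and `reverse_complement` are hypothetical, and weighting alignments by their reference length is one possible reading of "majority".

```python
def inverted_fraction(alignments):
    """Fraction of aligned reference bases that lie in inverted alignments.

    `alignments` is an iterable of (ref_start, ref_end, qry_start, qry_end)
    tuples; an alignment is treated as inverted when qry_start > qry_end
    (an assumed coordinate convention).
    """
    total = 0
    inverted = 0
    for ref_start, ref_end, qry_start, qry_end in alignments:
        length = abs(ref_end - ref_start) + 1
        total += length
        if qry_start > qry_end:
            inverted += length
    return inverted / total if total else 0.0


# Complement table for unambiguous bases and N; other IUPAC codes pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence string."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical alignments for one chromosome pair: two inverted, one directed.
example = [(1, 100, 1000, 901), (101, 150, 900, 851), (200, 220, 10, 30)]
if inverted_fraction(example) > 0.5:
    # In practice the full query chromosome would be reverse-complemented
    # and the corrected query genome realigned against the reference.
    corrected = reverse_complement("AACGTN")
```

If the fraction is above 0.5 for a chromosome pair, the query chromosome is a candidate for reverse-complementing before the genomes are realigned and SyRI is rerun.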

Older Updates:

(10-03-2022)

conda install -c bioconda syri

(17-01-2022)

git clone --single-branch --branch V1.4.1 https://github.com/schneebergerlab/syri.git
conda install -c bioconda plotsr

(12-10-2021)

(13-06-2021)

(14-05-2020)