ness-lab / readcomb

Fast detection of recombinant sequences from BAM files
GNU General Public License v3.0
4 stars 1 forks source link

readcomb - fast detection of recombinant reads in BAMs

PyPI version

readcomb is a collection of command line and Python tools for fast detection of recombination events in pooled high-throughput sequencing data. readcomb searches for changes in parental haplotype phase across individual reads and classifies recombination events based on various properties of the observed recombinant haplotypes.

readcomb was designed for use with the model alga Chlamydomonas reinhardtii and currently only supports haploids. Although the means of specifically detecting gene conversion are more specific to C. reinhardtii, everything else in readcomb is generalizable to the detection of recombination events in any haploid species.

Installation

pip install readcomb

Dependencies

Usage:

bamprep

Command line preprocessing script for BAM files. bamprep will prepare an index file, filter out unusuable reads, and output a BAM sorted by read name. readcomb requires BAMs sorted by read name for fast parsing and filtering.

readcomb-bamprep --bam [bam_filepath] --out [outdir]

Optional parameters:

vcfprep

Command line preprocessing script for VCF files

readcomb-vcfprep --vcf [vcf_filepath] --out [output_filepath]

Optional arguments

filter

Command line multiprocessing script for identification of bam sequences with phase changes

readcomb-filter --bam [bam_filepath] --vcf [vcf_filepath]

Optional arguments:

classification

Python module for detailed classification of sequences containing phase changes

>>> import readcomb.classification as rc
>>> from cyvcf2 import VCF

>>> bam_filepath = 'data/example_sequences.bam'
>>> vcf_filepath = 'data/example_variants.vcf.gz'
>>> pairs = rc.pairs_creation(bam_filepath, vcf_filepath)     # generate list of Pair objects
>>> cyvcf_object = VCF(vcf_filepath)                          # cyvcf2 file object

>>> print(pairs[0])
Record name: chromosome_1-199370 
Read1: chromosome_1:499417-499667 
Read2: chromosome_1:499766-500016 
VCF: data/example_variants.vcf.gz

>>> pairs[0].classify(cyvcf_object)                           # run classification algorithm
>>> print(pairs[0])
Record name: chromosome_1-199370 
Read1: chromosome_1:499417-499667 
Read2: chromosome_1:499766-500016 
VCF: data/example_variants.vcf.gz
Unmatched Variant(s): False 
Condensed: [['CC2936', 499417, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 500016]] 
Call: gene_conversion 
Condensed Masked: [['CC2936', 499487, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 499946]] 
Call Masked: gene_conversion 

License

GNU General Public License v3 (GPLv3+)

Development

Currently in alpha

Source code

Development repo