readcomb - fast detection of recombinant reads in BAMs
readcomb is a collection of command line and Python tools for fast detection of recombination events in pooled high-throughput sequencing data. readcomb searches for changes in parental haplotype phase across individual reads and classifies recombination events based on various properties of the observed recombinant haplotypes.
readcomb was designed for use with the model alga Chlamydomonas reinhardtii and currently supports only haploids. Although the method for specifically detecting gene conversion is tailored to C. reinhardtii, everything else in readcomb generalizes to the detection of recombination events in any haploid species.
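To make "a change in parental haplotype phase" concrete, here is a minimal sketch that is not readcomb's implementation. It assumes each variant site covered by a read has already been assigned to one of two parents (or to neither); the function name and input format are hypothetical, and the parent names are borrowed from the example output further down. A phase change between two consecutive informative sites means the breakpoint lies somewhere between them.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical input: one (position, parent) tuple per variant site on a read,
# sorted by position. parent is None when the read matches neither parent.
SiteCall = Tuple[int, Optional[str]]


def phase_change_intervals(calls: Sequence[SiteCall]) -> List[Tuple[int, int]]:
    """Return (left_pos, right_pos) intervals that each contain one phase change."""
    # Sites matching neither parent carry no phase information, so skip them.
    informative = [(pos, parent) for pos, parent in calls if parent is not None]
    intervals = []
    # Compare each informative site with the next one along the read.
    for (left_pos, left_parent), (right_pos, right_parent) in zip(informative, informative[1:]):
        if left_parent != right_parent:
            intervals.append((left_pos, right_pos))
    return intervals


calls = [(100, "CC2936"), (150, "CC2936"), (180, None), (200, "CC2935"), (260, "CC2936")]
print(phase_change_intervals(calls))  # [(150, 200), (200, 260)]
```

Two phase changes that return to the original parent, as in this toy read, is the kind of pattern the classification step later labels, so the interval list is only a starting point for classification.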
pip install readcomb
Command line preprocessing script for BAM files. bamprep prepares an index file, filters out unusable reads, and outputs a BAM sorted by read name. readcomb requires BAMs sorted by read name for fast parsing and filtering.
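Why name sorting helps: when records are sorted by read name, the mates of a pair sit next to each other, so pairs can be collected in one streaming pass without an in-memory lookup table. The sketch below illustrates that general idea with itertools.groupby; the function name is hypothetical and this is not readcomb's parser.

```python
from itertools import groupby
from typing import Callable, Iterable, Iterator, List, TypeVar

Record = TypeVar("Record")


def iter_read_groups(
    records: Iterable[Record], name_of: Callable[[Record], str]
) -> Iterator[List[Record]]:
    """Yield lists of records that share a read name (e.g. the two mates of a pair).

    groupby only merges *consecutive* records with equal keys, so this is correct
    only for name-sorted input; it holds a single group in memory at a time.
    A group can exceed two records if secondary/supplementary alignments remain.
    """
    for _name, group in groupby(records, key=name_of):
        yield list(group)


# Toy (name, position) records standing in for alignments.
records = [("r1", 100), ("r1", 350), ("r2", 500), ("r2", 720)]
groups = list(iter_read_groups(records, lambda rec: rec[0]))
print(groups)  # [[('r1', 100), ('r1', 350)], [('r2', 500), ('r2', 720)]]

# Without name sorting, the r1 mates are split into separate groups.
unsorted = [("r1", 100), ("r2", 500), ("r1", 350)]
print(len(list(iter_read_groups(unsorted, lambda rec: rec[0]))))  # 3
```

With pysam, the same function could be applied to an opened pysam.AlignmentFile using `lambda read: read.query_name` as the key, provided the BAM has already been sorted by read name.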
readcomb-bamprep --bam [bam_filepath] --out [outdir]
Optional arguments:
--samtools - Path to samtools binary
--threads [int] - Number of threads samtools should use (default 1)
--index_csi - Create CSI index instead of BAI
--no_progress - Disable index creation; this speeds up bamprep but means no progress bars when filtering
Command line preprocessing script for VCF files.
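As a rough mental model of the vcfprep options listed below, each one can be pictured as a per-site keep/drop test. This is a hypothetical sketch, not vcfprep's code: it assumes every filter drops the whole site, that genotypes are allele-index tuples, and that "at both sites" means every genotype call in the record (here, the two parents) must reach the GQ threshold.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Site:
    """Hypothetical, simplified VCF record."""
    ref: str
    alts: Sequence[str]
    genotypes: Sequence[Tuple[int, ...]]  # one allele-index tuple per sample
    gqs: Sequence[int]  # genotype quality per sample


def is_snp(site: Site) -> bool:
    # Single-base REF and every ALT single-base.
    return len(site.ref) == 1 and all(len(alt) == 1 for alt in site.alts)


def is_indel(site: Site) -> bool:
    # Any ALT whose length differs from REF implies an insertion or deletion.
    return any(len(alt) != len(site.ref) for alt in site.alts)


def keep_site(site: Site, snps_only=False, indels_only=False, no_hets=False, min_gq=30) -> bool:
    if snps_only and not is_snp(site):
        return False
    if indels_only and not is_indel(site):
        return False
    # A heterozygous call carries two different allele indices, e.g. (0, 1).
    if no_hets and any(len(set(gt)) > 1 for gt in site.genotypes):
        return False
    return all(gq >= min_gq for gq in site.gqs)


snp = Site("A", ["G"], [(0,), (1,)], [99, 45])
indel = Site("A", ["AT"], [(0,), (1,)], [99, 99])
het = Site("C", ["T"], [(0, 1), (1,)], [99, 99])
low_gq = Site("A", ["G"], [(0,), (1,)], [99, 12])
print(keep_site(snp, snps_only=True), keep_site(indel, snps_only=True))  # True False
```

Haploid calls such as (1,) never count as heterozygous here, which matches readcomb's current haploid-only scope.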
readcomb-vcfprep --vcf [vcf_filepath] --out [output_filepath]
Optional arguments:
--snps_only - Keep only SNPs
--indels_only - Keep only indels
--no_hets - Remove heterozygote calls
--min_GQ [int] - Minimum genotype quality at both sites (default 30)
Command line multiprocessing script for identifying BAM sequences with phase changes.
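When readcomb-filter is one step in a larger Python pipeline, a small wrapper can assemble its arguments. This sketch uses only the options documented for this command; the function names and example file paths are hypothetical. Passing it a name-sorted BAM follows from the requirement stated for bamprep above, while using a vcfprep-processed VCF is an assumption about the intended workflow.

```python
import shlex
import subprocess
from typing import List, Optional


def build_filter_cmd(
    bam: str,
    vcf: str,
    processes: int = 4,
    mode: str = "phase_change",
    log: Optional[str] = None,
    out: Optional[str] = None,
) -> List[str]:
    """Assemble a readcomb-filter argument list from the documented options."""
    if mode not in ("phase_change", "no_match"):
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = ["readcomb-filter", "--bam", bam, "--vcf", vcf,
           "--processes", str(processes), "--mode", mode]
    # Only pass optional flags when set, so readcomb-filter's own defaults apply.
    if log is not None:
        cmd += ["--log", log]
    if out is not None:
        cmd += ["--out", out]
    return cmd


def run_filter(bam: str, vcf: str, **options) -> None:
    """Run readcomb-filter; raises subprocess.CalledProcessError on failure."""
    subprocess.run(build_filter_cmd(bam, vcf, **options), check=True)


cmd = build_filter_cmd("sample_sorted.bam", "parents.vcf.gz", processes=8)
print(shlex.join(cmd))
# readcomb-filter --bam sample_sorted.bam --vcf parents.vcf.gz --processes 8 --mode phase_change
```

Building an argument list instead of a shell string avoids quoting problems with unusual file paths.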
readcomb-filter --bam [bam_filepath] --vcf [vcf_filepath]
Optional arguments:
-p, --processes [processes] - Number of processes available for filtering (default 4)
-m, --mode [phase_change|no_match] - Filtering mode (default phase_change)
-l, --log [log_filepath] - Filename for log metric output
-o, --out [output_filepath] - File to write filtered output to (default recomb_diagnosis)
Python module for detailed classification of sequences containing phase changes.
>>> import readcomb.classification as rc
>>> from cyvcf2 import VCF
>>> bam_filepath = 'data/example_sequences.bam'
>>> vcf_filepath = 'data/example_variants.vcf.gz'
>>> pairs = rc.pairs_creation(bam_filepath, vcf_filepath) # generate list of Pair objects
>>> cyvcf_object = VCF(vcf_filepath) # cyvcf2 file object
>>> print(pairs[0])
Record name: chromosome_1-199370
Read1: chromosome_1:499417-499667
Read2: chromosome_1:499766-500016
VCF: data/example_variants.vcf.gz
>>> pairs[0].classify(cyvcf_object) # run classification algorithm
>>> print(pairs[0])
Record name: chromosome_1-199370
Read1: chromosome_1:499417-499667
Read2: chromosome_1:499766-500016
VCF: data/example_variants.vcf.gz
Unmatched Variant(s): False
Condensed: [['CC2936', 499417, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 500016]]
Call: gene_conversion
Condensed Masked: [['CC2936', 499487, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 499946]]
Call Masked: gene_conversion
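The Condensed list printed above can be read as [parent, start, end] blocks. How these fields are exposed as attributes on a Pair object is not shown here, so this sketch starts from a plain list copied from the output. The labels and the rule (one switch looks crossover-like; two switches returning to the first parent look conversion-like) are a simplified, hypothetical heuristic for reading the output, not readcomb's classification algorithm. Since block boundaries fall between informative sites, the tract span is only approximate.

```python
from typing import List, Optional, Sequence, Tuple

Block = Tuple[str, int, int]  # (parent, start, end), as in the "Condensed" output


def merge_runs(blocks: Sequence[Sequence]) -> List[Block]:
    """Merge adjacent blocks from the same parent (no-op on already condensed input)."""
    merged: List[Block] = []
    for parent, start, end in blocks:
        if merged and merged[-1][0] == parent:
            merged[-1] = (parent, merged[-1][1], end)
        else:
            merged.append((parent, start, end))
    return merged


def toy_call(blocks: Sequence[Sequence]) -> Tuple[str, Optional[int]]:
    """Toy label plus, for conversion-like patterns, the span of the inner block."""
    runs = merge_runs(blocks)
    if len(runs) <= 1:
        return "no_phase_change", None
    switches = len(runs) - 1
    if switches == 1:
        return "crossover_like", None
    if switches == 2 and runs[0][0] == runs[2][0]:
        _parent, start, end = runs[1]
        return "gene_conversion_like", end - start
    return "complex", None


condensed = [['CC2936', 499417, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 500016]]
print(toy_call(condensed))  # ('gene_conversion_like', 110)
```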
License: GNU General Public License v3 (GPLv3+)
Status: currently in alpha