Cerberus

Cerberus is a set of tools designed to characterize and enhance transcriptome annotations. Currently Cerberus can do the following:

Represent transcript start sites (TSSs) and transcript end sites (TESs) as BED regions rather than as single-base-pair ends
Integrate intron chains (ICs) from multiple transcriptome annotations (GTFs) to create a transcriptome that is the union of all of them
Integrate TSSs and TESs from multiple GTFs, as well as from outside BED sources, to create end annotations that are the union of all of them
Number intron chains, TSSs, and TESs found by their priority in a reference GTF
Use the enhanced intron chain and 5'/3' end sets to annotate an existing GTF transcriptome with transcript triplets, and update the GTF and corresponding abundance matrices to reflect the new transcript names and identities
Compute gene triplets for different sets of isoforms for each gene based on the TSSs, ICs, and TESs used among the isoforms of the gene
Generate plots (see examples below) to visualize gene triplets on the gene structure simplex
Compute centroids of gene triplet coordinate distributions
Compute pairwise gene structure simplex distances between pairs of gene triplets
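
The gene triplet, centroid, and distance items above can be illustrated with a small, self-contained sketch. This is not Cerberus's API: the function names, the toy isoform table, the choice to normalize the raw counts so they sum to 1, and the use of Euclidean distance are all assumptions made for illustration. Cerberus's own coordinate and distance definitions may differ, so please check the documentation for those.

```python
from collections import defaultdict
from math import dist

# Hypothetical toy input: one row per isoform as (gene, tss_id, ic_id, tes_id).
isoforms = [
    ("geneA", 1, 1, 1),
    ("geneA", 1, 2, 1),
    ("geneA", 2, 3, 1),
    ("geneB", 1, 1, 1),
    ("geneB", 1, 1, 2),
]

def gene_triplets(rows):
    """Count the unique TSSs, ICs, and TESs used by the isoforms of each gene."""
    seen = defaultdict(lambda: (set(), set(), set()))
    for gene, tss, ic, tes in rows:
        tss_set, ic_set, tes_set = seen[gene]
        tss_set.add(tss)
        ic_set.add(ic)
        tes_set.add(tes)
    return {gene: tuple(len(s) for s in sets) for gene, sets in seen.items()}

def simplex_coords(triplet):
    """Assumed normalization: scale the three counts so they sum to 1."""
    total = sum(triplet)
    return tuple(n / total for n in triplet)

def centroid(points):
    """Component-wise mean of a list of simplex coordinates."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(3))

triplets = gene_triplets(isoforms)
coords = {gene: simplex_coords(t) for gene, t in triplets.items()}
center = centroid(list(coords.values()))

# Euclidean distance between two genes on the simplex (an assumed metric).
distance = dist(coords["geneA"], coords["geneB"])

print(triplets)
print(coords)
print(center, distance)
```

In this toy data, geneA uses two TSSs, three ICs, and one TES, so its triplet is (2, 3, 1), while geneB's is (1, 1, 2). Normalizing puts both genes on the same triangle regardless of how many isoforms each has, which is what makes centroids and pairwise distances comparable across genes.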

Please visit the Cerberus website for documentation.

Note: Cerberus is under active development. Please feel free to open an issue or email me ( freese {at} uci.edu ) if you're interested in using it!
