Open dktanwar opened 3 years ago
- `seqpac` uses `Biostrings` or `cutadapt` for trimming.
- `shortRNA` uses `fastp` for trimming. `fastp` is 3 times faster than `cutadapt`: https://doi.org/10.1093/bioinformatics/bty560
- `seqpac` uses `Rbowtie` for alignment. From the `seqpac` paper: "A likely reason for Bowtie’s popularity in sRNA community is because it is reliable with short sequence alignments. For instance, we initially tried to integrate the Rsubreads package (61) in seqpac’s workflow, which applies a highly efficient ‘seed-and-vote’ mapping algorithm. However, for certain read lengths we consistently experienced failure to correctly vote for the best alignment, possibly as a consequence that too few seeds were covering the read. We will off-course explore more efficient alternatives to Bowtie in the future."
- `shortRNA` uses `Rsubread` for alignment. From the `Rsubread` paper (https://doi.org/10.1093/nar/gkz114): "QuasR is however an interface to C programs from 2010 or earlier, specifically to Bowtie version 1.1.1 (18), SpliceMap 3.3.5.2 (19) and SeqAn 1.1 (20). These older tools do not reflect the considerable improvements in algorithms achieved during the last 8 years." This raises a flag for us to re-investigate the choice of aligner. However, we can offer both aligners as options: `Rbowtie` and `Rsubread`.
- `seqpac` doesn't seem to perform any QC.
- `shortRNA` does QC and provides a number of interactive plots: https://dktanwar.github.io/PhD/PR/20210622/20210622_PR_Deepak_Tanwar (slides 25 to 36)
- `seqpac` uses general annotations from databases.
- `shortRNA` uses annotations from databases and further:
- `seqpac` assigns multi-mapping reads on the basis of a hierarchy: rRNA > tRNA > miRNA > snoRNA > snRNA > lnc/lincRNA > piRNA. This means that multi-mapping reads are assigned first to rRNA and last to piRNAs, with the allowed mismatches. The authors say: "For better transparency and reproducibility of sRNA experiments, we recommend that analysis is performed on a class-by-class basis as far as possible."
- `shortRNA` has certain rules for read assignment, which are also customisable: https://dktanwar.github.io/PhD/PR/20210622/20210622_PR_Deepak_Tanwar (slides 42 to 48)

In principle, `shortRNA` has an advantage here for read assignment.
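
The hierarchy-based assignment described above can be illustrated with a minimal sketch. This is not `seqpac`'s actual code: the function name `assign_by_hierarchy` and the example reads are hypothetical, and only the priority order is taken from the comparison above. It assumes each multi-mapping read arrives with the list of RNA classes it matched.

```python
# Priority order quoted in the comparison above (highest priority first).
HIERARCHY = ["rRNA", "tRNA", "miRNA", "snoRNA", "snRNA", "lnc/lincRNA", "piRNA"]
RANK = {rna_class: i for i, rna_class in enumerate(HIERARCHY)}


def assign_by_hierarchy(candidate_classes):
    """Return the highest-priority class among a read's candidate classes.

    Hypothetical helper: classes not in HIERARCHY are ignored, and None is
    returned when no known class is present.
    """
    known = [c for c in candidate_classes if c in RANK]
    if not known:
        return None
    return min(known, key=RANK.__getitem__)


# Hypothetical multi-mapping reads and the classes they matched.
reads = {
    "read_1": ["piRNA", "tRNA"],
    "read_2": ["miRNA", "piRNA", "snoRNA"],
    "read_3": ["piRNA"],
    "read_4": [],
}

assignments = {name: assign_by_hierarchy(classes) for name, classes in reads.items()}
print(assignments)
```

Under this scheme a read that matches both a tRNA and a piRNA is always counted as tRNA, which is why a fixed hierarchy leaves less room for customisation than rule-based assignment.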
- `seqpac` depends on external tools such as `trimmomatic` when the user wants to use it for trimming.
- `seqpac` mostly uses frameworks such as `lists` and `tibble`. These are easy to use.
- `shortRNA` utilizes the `DFrame`, `FactorList` and `TreeSummarizedExperiment` frameworks, which are fast and have several advantages. These have a steep learning curve.

`seqpac` seems to be one of the best tools out now for sRNA-seq data analysis, but from the above comparisons we could say that `shortRNA` is still better in terms of speed, features offered, and the interactive and informative plots one can make with it.
Please also check: https://dktanwar.github.io/PhD/PR/20210622/20210622_PR_Deepak_Tanwar (slides 57 to 59)
There are clearly important similarities: they use sequence-based counting like we do, and offer some end-user functionalities we aimed at (e.g. the coverage plots). Besides that, the package doesn't go much beyond what was already out there, and there's nothing anywhere near the most critical features of our approach, i.e. the tree-based assignment and hypothesis testing. It feels to me like a good example to use to ensure we've got everything that's offered elsewhere.
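
For readers unfamiliar with the term, sequence-based counting can be sketched roughly as follows: identical read sequences are collapsed and counted per sample before any annotation is applied. This is a generic illustration, not code from either package; `count_sequences`, the sample names and the reads are all made up for the example.

```python
from collections import Counter


def count_sequences(reads):
    """Collapse identical read sequences and count their occurrences."""
    return Counter(reads)


# Hypothetical trimmed reads from two samples.
samples = {
    "sample_A": [
        "TGAGGTAGTAGGTTGTATAGTT",
        "TGAGGTAGTAGGTTGTATAGTT",
        "GCATTGGTGGTTCAGTGGTAGA",
    ],
    "sample_B": [
        "TGAGGTAGTAGGTTGTATAGTT",
        "GCATTGGTGGTTCAGTGGTAGA",
        "GCATTGGTGGTTCAGTGGTAGA",
    ],
}

counts = {name: count_sequences(reads) for name, reads in samples.items()}

# Build a sequence-by-sample count table (rows: unique sequences).
all_sequences = sorted(set().union(*counts.values()))
table = {seq: [counts[name][seq] for name in samples] for seq in all_sequences}

for seq, row in table.items():
    print(seq, row)
```

Because the rows are sequences rather than annotated features, assignment to RNA classes can be done (and redone) afterwards without recounting.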
A number of tools here: https://tools4mirs.org/
Comparison for tRNAs: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04691-1
Seqpac
MirMaster 2.0