dktanwar commented 3 years ago

Seqpac

Tool
Paper

MirMaster 2.0

Tool
Paper

dktanwar commented 3 years ago

Seqpac vs shortRNA

Comparison in terms of speed

Trimming

seqpac uses Biostrings or cutadapt for trimming
shortRNA uses fastp for trimming

According to the fastp paper, fastp is about 3 times faster than cutadapt: https://doi.org/10.1093/bioinformatics/bty560

Alignment

seqpac uses Rbowtie for alignment

From the seqPac paper: "A likely reason for Bowtie’s popularity in sRNA community is because it is reliable with short sequence alignments. For instance, we initially tried to integrate the Rsubreads package (61) in seqpac’s workflow, which applies a highly efficient ‘seed-and-vote’ mapping algorithm. However, for certain read lengths we consistently experienced failure to correctly vote for the best alignment, possibly as a consequence that too few seeds were covering the read. We will off-course explore more efficient alternatives to Bowtie in the future."
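The seqPac authors attribute the failures to too few seeds covering short reads. The toy seed-and-vote aligner below illustrates that effect; it is not Rsubread's implementation, and the seed length (16 nt), seed spacing (every 4 nt) and the random reference are assumptions chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_vote(read, index, k, step):
    """Take a k-mer seed every `step` bases; each exact hit votes for a read start.

    Returns (best_start, votes_for_best_start, number_of_seeds);
    best_start is None when no seed matches the reference exactly.
    """
    offsets = range(0, len(read) - k + 1, step)
    votes = Counter()
    for offset in offsets:
        for hit in index.get(read[offset:offset + k], []):
            votes[hit - offset] += 1  # start position implied by this seed hit
    if not votes:
        return None, 0, len(offsets)
    best_start, n_votes = votes.most_common(1)[0]
    return best_start, n_votes, len(offsets)


def mutate(read, pos):
    """Return a copy of the read with a different base at pos (one mismatch)."""
    new_base = "A" if read[pos] != "A" else "C"
    return read[:pos] + new_base + read[pos + 1:]


# Toy data: a random 500-nt "reference" and two error-free reads taken from it.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(500))
index = build_kmer_index(reference, k=16)
long_read = reference[100:132]   # 32 nt: seeds at offsets 0, 4, 8, 12, 16
short_read = reference[300:318]  # 18 nt: a single seed at offset 0

for name, read in [("long", long_read), ("short", short_read)]:
    print(name, seed_and_vote(read, index, 16, 4), seed_and_vote(mutate(read, 5), index, 16, 4))
```

With these toy parameters the 32-nt read gets five seeds, and after one mismatch at position 5 three seeds still vote for its true start. The 18-nt read gets a single seed, so the same mismatch leaves nothing to vote with.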

shortRNA uses Rsubread for alignment

From the Rsubread paper (https://doi.org/10.1093/nar/gkz114): "QuasR is however an interface to C programs from 2010 or earlier, specifically to Bowtie version 1.1.1 (18), SpliceMap 3.3.5.2 (19) and SeqAn 1.1 (20). These older tools do not reflect the considerable improvements in algorithms achieved during the last 8 years."

This suggests we should re-investigate our choice of aligner. Alternatively, we could offer both aligners as options: Rbowtie and Rsubread.

Comparison in terms of:

QC

seqPac doesn't seem to perform any QC
shortRNA performs QC and provides a number of interactive plots: https://dktanwar.github.io/PhD/PR/20210622/20210622_PR_Deepak_Tanwar (slides 25 to 36)

Annotation

seqPac uses general annotations from databases
shortRNA uses annotations from databases and further:
- accounts for post-transcriptional modifications: https://dktanwar.github.io/PhD/PR/20210622/20210622_PR_Deepak_Tanwar (slides 13 and 14)
- harmonizes feature names so that the naming format is identical across databases: https://dktanwar.github.io/PhD/PR/20210622/20210622_PR_Deepak_Tanwar (slides 12 and 16)
- accounts for miRNA clusters: https://dktanwar.github.io/PhD/PR/20210622/20210622_PR_Deepak_Tanwar (slides 17 and 18)
- accounts for the hierarchical organization of features: https://dktanwar.github.io/PhD/PR/20210622/20210622_PR_Deepak_Tanwar (slides 15, 18, 19, and 20)
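As a concrete example of a post-transcriptional modification: mature tRNAs carry a 3' CCA tail that is added after transcription (it is not encoded in most eukaryotic tRNA genes), and some tRNA genes contain introns that are spliced out. A read ending in the CCA tail therefore does not match the genomic tRNA sequence. The sketch below builds a mature-like tRNA sequence under those assumptions; the function name and intron coordinate convention are hypothetical, and this is not necessarily how shortRNA handles it (see slides 13 and 14).

```python
from typing import Optional, Tuple


def mature_trna(genomic: str, intron: Optional[Tuple[int, int]] = None) -> str:
    """Build a mature-like tRNA sequence from a genomic tRNA sequence.

    intron: hypothetical 0-based, end-exclusive (start, end) coordinates of an
    intron to splice out, if the gene has one.
    """
    seq = genomic.upper()
    if intron is not None:
        start, end = intron
        seq = seq[:start] + seq[end:]
    # The 3' CCA tail is added post-transcriptionally; only append it if the
    # gene does not already encode it (a simple heuristic).
    if not seq.endswith("CCA"):
        seq += "CCA"
    return seq


genomic = "GGGAAACCC"  # toy sequence, not a real tRNA
mature = mature_trna(genomic)
read = "AACCCCCA"      # toy read that ends in the CCA tail
print(read in genomic, read in mature)  # False True
```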

Reads assignment of multi-mapping reads

seqPac assigns multi-mapping reads on the basis of a class hierarchy: rRNA > tRNA > miRNA > snoRNA > snRNA > lnc/lincRNA > piRNA. That is, a read that maps (within the allowed number of mismatches) to features of several classes is assigned to the highest-ranked class, so rRNA takes precedence and piRNA comes last. The authors write: "For better transparency and reproducibility of sRNA experiments, we recommend that analysis is performed on a class-by-class basis as far as possible."
shortRNA has its own rules for read assignment, which are customisable: https://dktanwar.github.io/PhD/PR/20210622/20210622_PR_Deepak_Tanwar (slides 42 to 48)
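The class-priority rule described for seqPac can be sketched as follows, assuming each read's alignments (within the allowed mismatches) have already been reduced to the set of RNA classes it hits. The function names and read IDs are hypothetical; this is not seqPac's code.

```python
from typing import Dict, Iterable, Optional

# Class hierarchy described for seqPac, highest priority first.
CLASS_PRIORITY = ["rRNA", "tRNA", "miRNA", "snoRNA", "snRNA", "lnc/lincRNA", "piRNA"]
RANK = {cls: rank for rank, cls in enumerate(CLASS_PRIORITY)}


def assign_by_priority(hit_classes: Iterable[str]) -> Optional[str]:
    """Return the highest-priority class among a read's hits, or None if none is ranked."""
    ranked = [cls for cls in set(hit_classes) if cls in RANK]
    return min(ranked, key=lambda cls: RANK[cls]) if ranked else None


def count_by_class(read_hits: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Count each read once, under the class chosen by assign_by_priority."""
    counts = {cls: 0 for cls in CLASS_PRIORITY}
    for hits in read_hits.values():
        cls = assign_by_priority(hits)
        if cls is not None:
            counts[cls] += 1
    return counts


# Hypothetical reads and the classes their alignments hit.
reads = {
    "read_1": ["tRNA", "piRNA"],  # counted as tRNA
    "read_2": ["miRNA"],          # counted as miRNA
    "read_3": ["miRNA", "rRNA"],  # counted as rRNA
}
print(count_by_class(reads))
```

With this rule, a read that matches both a tRNA and a piRNA is counted only as tRNA; its piRNA match is not reported.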

In principle, shortRNA has the advantage here.
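By contrast, a tree-based rule can keep an ambiguous read at the most specific level that all its hits share, instead of forcing it into one class. shortRNA's actual rules are on slides 42 to 48 and are not reproduced here; the sketch below is only a hypothetical illustration that assigns a read to the lowest common ancestor (LCA) of its features, using illustrative feature names.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical feature tree: child -> parent (the root's parent is None).
PARENT: Dict[str, Optional[str]] = {
    "all": None,
    "tRNA": "all",
    "tRNA-Gly": "tRNA",
    "tRNA-Gly-GCC-1": "tRNA-Gly",
    "tRNA-Gly-GCC-2": "tRNA-Gly",
    "miRNA": "all",
    "mir-21": "miRNA",
}


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """List the node and its ancestors, from the node up to the root."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(features: Iterable[str], parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Deepest node that is an ancestor of (or equal to) every feature a read maps to."""
    paths = [path_to_root(f, parent) for f in features]
    if not paths:
        return None
    shared = set(paths[0]).intersection(*paths[1:])
    # paths[0] runs leaf -> root, so the first shared node is the deepest one
    for node in paths[0]:
        if node in shared:
            return node
    return None


print(lowest_common_ancestor(["tRNA-Gly-GCC-1", "tRNA-Gly-GCC-2"], PARENT))  # tRNA-Gly
print(lowest_common_ancestor(["tRNA-Gly-GCC-1", "mir-21"], PARENT))  # all
```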

Framework

Both tools are developed as R packages. seqPac depends on external tools such as trimmomatic when the user chooses an external trimmer.
seqPac mostly uses simple data structures such as lists and tibbles, which are easy to use.
shortRNA uses the DFrame, FactorList and TreeSummarizedExperiment classes, which are efficient and have several advantages (for example, a TreeSummarizedExperiment can store the feature hierarchy as a tree alongside the count matrix), but they have a steeper learning curve.

Summary

seqPac seems to be one of the best tools currently available for sRNA-seq data analysis, but based on the comparisons above, we could say that shortRNA is still better in terms of speed, the features offered, and the interactive, informative plots it can produce.

Please also check: https://dktanwar.github.io/PhD/PR/20210622/20210622_PR_Deepak_Tanwar (slides 57 to 59)

plger commented 3 years ago

There are clearly important similarities: they use sequence-based counting like we do, and offer some end-user functionalities we aimed at (e.g. the coverage plots). Apart from that, the package doesn't go much beyond what was already out there, and there's nothing anywhere near the most critical features of our approach, i.e. the tree-based assignment and hypothesis testing... it feels to me like a good example to use to ensure we've got everything that's offered elsewhere.

dktanwar commented 2 years ago

A number of tools are listed here: https://tools4mirs.org/

dktanwar commented 1 year ago

Comparison for tRNAs: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04691-1

shortRNAhub / shortRNA

Comparison with existing tools #39
