mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Remove 'R' Dependency (and therefore DESeq) #1

Closed molikd closed 1 year ago

molikd commented 9 years ago

This is a feature request to remove the DESeq R dependency from TEToolkit. In practice, this means reimplementing the DESeq functions newCountDataSet (not essential), estimateSizeFactors, estimateDispersions, and nbinomTest.

There are four places where R is called from TEToolkit:

molikd commented 7 years ago

Looked at:

https://bioconductor.org/packages/release/bioc/manuals/DESeq/man/DESeq.pdf

From Page 21: newCountDataSet: This function creates a CountDataSet object from a matrix or data frame of count data. (On Bioconductor-mirror.)

From Page 9: estimateSizeFactors: Estimate the size factors for a CountDataSet. (On Bioconductor-mirror.)
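
As a rough idea of what a replacement for estimateSizeFactors could look like, here is a minimal pure-Python sketch of the median-of-ratios approach. The function name `estimate_size_factors` and the list-of-rows input layout are assumptions made for illustration; they are not part of TEToolkit or DESeq, and the result should be checked against R before relying on it.

```python
import math
from statistics import median


def estimate_size_factors(counts):
    """Median-of-ratios size factors (sketch, not the DESeq source).

    counts: list of rows, one per gene/TE, each row a list of raw counts
    with one entry per sample. This layout is an assumption.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # A zero count makes the geometric mean zero, so skip the row.
        if any(k <= 0 for k in row):
            continue
        log_geo_mean = sum(math.log(k) for k in row) / n_samples
        for j, k in enumerate(row):
            log_ratios[j].append(math.log(k) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no row has non-zero counts in every sample")
    # One factor per sample: exp of the median log-ratio.
    return [math.exp(median(r)) for r in log_ratios]
```

For example, `estimate_size_factors([[10, 20], [100, 200], [5, 10]])` gives two factors whose ratio is 2, reflecting that the second sample was sequenced twice as deeply.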

From Page 6: estimateDispersions: This function obtains dispersion estimates for a count data set. For each condition (or collectively for all conditions, see ’method’ argument below) it first computes for each gene an empirical dispersion value (a.k.a. a raw SCV value), then fits by regression a dispersion-mean relationship and finally chooses for each gene a dispersion parameter that will be used in subsequent tests from the empirical and the fitted value according to the ’sharingMode’ argument. (On Bioconductor-mirror.)
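
The first of those three steps, the empirical (raw SCV) value per gene, can be sketched with a method-of-moments estimate: the variance of the normalized counts minus the expected shot-noise term, divided by the squared mean. This is only my reading of that step. The regression fit and the 'sharingMode' selection are not covered, and the name `empirical_dispersions` is hypothetical.

```python
from statistics import mean, variance


def empirical_dispersions(counts, size_factors):
    """Raw per-gene dispersion estimates (sketch of the first step only).

    counts: list of rows (one per gene), one raw count per sample.
    size_factors: one positive factor per sample; needs >= 2 samples.
    """
    # Average inverse size factor scales the Poisson (shot noise) term.
    xim = mean(1.0 / s for s in size_factors)
    dispersions = []
    for row in counts:
        normalized = [k / s for k, s in zip(row, size_factors)]
        base_mean = mean(normalized)
        if base_mean == 0:
            dispersions.append(float("nan"))  # undefined for all-zero rows
            continue
        base_var = variance(normalized)  # sample variance (n - 1)
        # Values can be negative when variance is below the Poisson level.
        dispersions.append((base_var - xim * base_mean) / base_mean ** 2)
    return dispersions
```

Negative values are left as they are here; a full replacement would still need the dispersion-mean fit to turn these raw values into usable parameters.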

From Page 18: nbinomTest: This function tests for differences between the base means of two conditions (i.e., for differential expression in the case of RNA-Seq). (On Bioconductor-mirror.)
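
A replacement for nbinomTest would need some form of exact negative binomial test. The sketch below conditions on the total count and sums the probabilities of all splits that are no more likely than the observed one. It assumes the caller already has the expected count and dispersion for each condition under the null hypothesis; how those are derived from size factors and fitted dispersions is left out, and the function names are hypothetical rather than taken from DESeq.

```python
import math


def nb_log_pmf(k, mu, disp):
    """Log NB probability of count k with mean mu > 0 and dispersion disp > 0."""
    r = 1.0 / disp
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def nbinom_exact_pvalue(k_a, k_b, mu_a, mu_b, disp_a, disp_b):
    """Two-sided exact p-value conditioned on k_a + k_b (sketch).

    k_a, k_b: summed counts per condition.
    mu_a, mu_b, disp_a, disp_b: null means and dispersions of those sums,
    supplied by the caller (an assumption of this sketch).
    """
    k_sum = k_a + k_b
    log_obs = nb_log_pmf(k_a, mu_a, disp_a) + nb_log_pmf(k_b, mu_b, disp_b)
    # Joint log-probability of every split (a, k_sum - a) of the total.
    logs = [nb_log_pmf(a, mu_a, disp_a) + nb_log_pmf(k_sum - a, mu_b, disp_b)
            for a in range(k_sum + 1)]
    shift = max(logs)  # subtract the max to avoid underflow in exp()
    total = sum(math.exp(v - shift) for v in logs)
    # Splits at most as probable as the observed one (small tolerance).
    extreme = sum(math.exp(v - shift) for v in logs if v <= log_obs + 1e-9)
    return min(1.0, extreme / total)
```

The loop is linear in the total count, so very highly expressed genes would be slow in pure Python; that would need attention in a real port.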

molikd commented 7 years ago

For estimateDispersions we use the blind, per-condition, and pooled methods, but not pooled-CR. The R code generates the TEtranscripts_out_gene_TE_analysis.txt and TEtranscripts_out_sigdiff_gene_TE.txt files for TEtranscripts. It looks like TEPeaks is similar. The R code is basically running some stats/differential analysis/normalization and then writing the results to a file. So we might be best served by two sets of functions: replacements for estimateSizeFactors, estimateDispersions, and nbinomTest; and replacements for writing the data into the graphs/tables it creates.
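
For the output side, a minimal sketch might adjust the p-values with Benjamini-Hochberg and write a tab-separated table, optionally keeping only significant rows. The column names (`id`, `log2FoldChange`, `pval`, `padj`) are modeled on DESeq-style result tables and are assumptions here, as are the function names and the `padj_cutoff` parameter; the real files may have more columns.

```python
import csv


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvals)
    # Walk from the largest p-value down, keeping a running minimum.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = n - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def write_results(rows, handle, padj_cutoff=None):
    """Write rows (dicts with id, log2FoldChange, pval) as a TSV table.

    If padj_cutoff is given, only rows with padj below it are written.
    """
    padj = benjamini_hochberg([row["pval"] for row in rows])
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "log2FoldChange", "pval", "padj"])
    for row, q in zip(rows, padj):
        if padj_cutoff is not None and not q < padj_cutoff:
            continue
        writer.writerow([row["id"], row["log2FoldChange"], row["pval"], q])
```

Calling `write_results` once without a cutoff and once with one would give something like the two files above: the full analysis table and the significant subset.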

molikd commented 7 years ago

I'm going to write this on a fork: https://github.com/status-five/tetoolkit

emattei commented 3 years ago

I totally support this. I am just interested in getting raw counts out of the tool, and it would be amazing not to have to deal with the R and DESeq dependencies.

olivertam commented 3 years ago

Hi.

If you only want the raw counts, you can use TEcount in the TEtranscripts package. This performs only the quantification, and thus does not require R or DESeq2.

Thanks.

emattei commented 3 years ago

Oh I see, I completely missed the existence of the TEcount tool. Sorry about that. TEcount is exactly what I was looking for!

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days