Closed grabearummc closed 4 years ago
Thanks for contacting us. I'd say you are good to go with the master branch, and you'll find documentation and working examples when using ?import
after installation. See also the documentation of the import
function here: https://bioconductor.org/packages/devel/bioc/manuals/EnrichmentBrowser/man/EnrichmentBrowser.pdf
I conducted a number of tests that were looking good, but you would be def among the first users, so I'd be interested in how this works for you and whether we need to extend the functionality further.
Alright, thank you that's great! I'll use this issue as a forum to discuss.
Consider adding in functionality for EnrichmentBrowser::idMap
so that it automatically validates/converts ENSEMBL ids from id.version
to id
(e.g. ENSG00000002919.14
to ENSG00000002919
). Try to conserve id.version
by adding another column to rowData. This is really more of an issue with AnnotationDBI, but it couldn't hurt.
gsub("\\..*", "", row.names(ens_table))
let's make this another issue ("ENSEMBL ID version conversion"), and let's use the current issue for focusing on functionality of EnrichmentBrowser::import
-- renaming the issue for clarity
@lgeistlinger , thank-you for the update! (esp. import).
# I used apeglm for the res:
res <- lfcShrink(dds, coef="FM_1_vs_0", type="apeglm")
se <- import(dds, res, from = "DESeq2")
Error in .importFromDESeq2(obj, res) :
all(rnames %in% colnames(res)) is not TRUE
colnames(res)
[1] "baseMean" "log2FoldChange" "lfcSE"
[4] "pvalue" "padj"
# repeated without "apeglm"
res <- results(dds)
se <- import(dds, res, from = "DESeq2")
Error in .importFromDESeq2(obj, res) :
Supported experimental designs include binary group comparisons
with an optional blocking variable for paired samples / sample batches
design(dds)
~ age + batch + condition
I have 4 questions for you:
Is LFC shrinkage needed / recommended with EnrichmentBrowser?
I have to use age (scaled, continuous) and batch (binary) as covariates to analyze for my condition of interest (binary)? How can I use this information with EnrichmentBrowser to avoid any false positives?
To circumvent covariate problem, I was thinking of using ranked gene list with pi scores res$pi <- res$log2FoldChange*(-log(res$pvalue))
, can I do this for downstream topology-based methods in EnrichmentBrowser, if yes, how?
Could you suggest a better way to rank / score genes? I am not sure how should the tie broken in case of non-unique scores.
Thanks for testing + raising these issues @vivek-verma202! I am currently sprinting towards a deadline for a grant proposal on Sunday, but I am looking into this + will get back to you on Monday.
Thanks, good-luck with your grant application!
Alright here is an update for you @lgeistlinger.
I've been using import and I have a few suggestions. For importing limma data I think there needs to be a way for using data from the limma-trend approach. Check page 72 on the limma user guide.
Instead of using voom: https://github.com/lgeistlinger/EnrichmentBrowser/blob/4357b8004cdcd70093ec69b5b1448bd019fad77c/R/import.R#L120-L132
You would use the limma-trend method:
#' # (5) import from trend/limma (RNA-seq count data)
#' # (5a) create the expression data object
#' library(limma)
#' keep <- filterByExpr(counts, rdesign)
#'
> logCPM <- edgeR::cpm(counts[keep,], log=TRUE, prior.count=3)
#' # (5b) obtain differential expression results
> fit <- lmFit(logCPM, design)
> fit <- eBayes(fit, trend=TRUE)
> res <- topTable(fit, coef=ncol(design))
#'
#' # (5c) import???
#' se <- import(el, res)
Hi @lgeistlinger, hope your grant application went well.
I was wondering if you had a chance to look into the "limitations" of current import()
?
As far as my question 1 is concerned, some reading made me realize that LFC shrinkage (apeglm) is not necessary esp. if I have already filtered out poorly expressed genes. Hence, was planning to ditch it (unless you've something to add to it).
Hope to hear from you, soon.
Hi @vivek-verma202,
Thanks for your patience.
The application has been a sprint involving daily 12 h shifts for the last 7 days straight - but I am pretty happy with the product. Thanks for asking.
Concerning your questions:
lfcShrinkage:
According to the DESeq2 vignette only relevant for visualization and ranking of genes when having not filtered on not sufficiently expressed genes first.
In the EnrichmentBrowser, I typically use edgeR::filterByExpr (but you can easily apply customized thresholds as well) for that as the very first step of processing of the count matrix, ie before normalization, DE analysis, and GS analysis.
Thus: if you've filtered before - yes, shrinkage is not needed.
That said, I think the error that you encounter in your first post is mainly a
result of your result object (res
) not having the same number of rows (genes) as your dds
. I suspect rows/genes been removed by lfcShrink
?
covariates:
design ~ batch + condition
should work right out of the box (can you confirm?).
Supported experimental designs currently only include binary group comparisons with an optional blocking variable for paired samples / sample batches.
Adding additional covariates is currently not supported and I would take this as a feature request and will invest some time to implement that.
Note, however that only regression GS methods such as camera
and roast
support extended experimental designs and are able to make use of this information.
Something that you could do right away:
design ~ age + condition
if you would form here a number of discrete age groups.
(inspecting resulting DE measures against design ~ condition
only would btw
give you an indication whether there are indeed age-specific effects).
If you are interested in those, or in general methods that only work on the DE
results, such as eg also ora
, then you can actually ignore my remarks on
experimental design above and
i) make use of the sig.stat
argument that can be supplied to EnrichmentBrowser::sbea
and EnrichmentBrowser::nbea
.
• beta: Log2 fold change significance level. Defaults to 1 (2-fold).
• sig.stat: decides which statistic is used for determining significant DE genes. Options are:
• 'p' (Default): genes with adjusted p-value below
alpha.
• 'fc': genes with abs(log2(fold change)) above beta
• '&': p & fc (logical AND)
• '|': p | fc (logical OR)
• 'xxp': top xx % of genes sorted by adjusted p-value
• 'xxfc' top xx % of genes sorted by absolute log2 fold
change.
Example: if you wanted to conduct a SPIA analysis with genes rendered differentially expressed (DE genes) that have an absolute log2 fold change > 1 and an adjusted p-value < 0.1, you would call:
spia.res <- nbea("spia", se, gs, grn, alpha = 0.1, beta = 1, sig.stat = '&')
ii) If not otherwise specified (eg via the sig.stat
argument), the default point
of reference for the EnrichmentBrowser functions when it comes to gene DE measures is the ADJ.PVAL
column in the rowData
slot of your SummarizedExperiment (ie. your dds
after using import
).
se <- import(dds, res)
You can always override / abuse this column by eg setting it
rowData(se)[my.DE.genes.of.interest, "ADJ.PVAL"] <- 0.04
rowData(se)[notDE.notInteresting.genes, "ADJ.PVAL"] <- 0.06
as per default, sbea and nbea methods that work only on a list of DE genes
will take genes with ADJ.PVAL < alpha
(default alpha=0.05
).
Note the: xxp
or xxfc
options of the sig.stat
argument above are
an alternative to that, allowing to select a certain percentage of genes
that you find interesting based on log2 fold change or p-value (and can
also be accordingly tweaked as above).
When it comes to ranking, the raw p-value typically has better granularity than the adjusted p-value. Accordingly:
res <- DESeq2::results(dds)
sorting.df <- res[,c("pvalue","log2FoldChange")]
sorting.df[,2] <- -abs(sorting.df[,2])
ind <- do.call(order, as.data.frame(sorting.df)))
would give you a very fine-grained order by nominal p-value, and for genes with equal nominal p-value it would sort by absolute fold change in decreasing order. You could eg use that ordering after importing via:
se <- se[ind,]
(if that still doesn't result in a satisfying ranking, you could incorporate additional measures such as the testing statistic by modifying the above line to:
sorting.df <- res[,c("pvalue","log2FoldChange", "stat")]
Hope that helps and don't hesitate to inquire further if some things are unclear.
Thank you so much, @lgeistlinger ! Here are the updates:
design = ~ batch + condition
worked!
(Results were not very different with or without age, thanks to you, age is no more covariate of interest in my analysis.)
I could not proceed without normalization:
res <- sbea(method = "gsea", se = se, gs = go.bp.gs, browse = T)
Error in .reorderAssays(se, assay) :
Expression dataset (se) does not contain an assay named "norm"
Normalization worked but excluded ~91 % of genes:
se <- normalize(se, norm.method = "vst")
Excluding 8118 genes not satisfying min.cpm threshold
The results were from DESeq2 and not from raw read counts. Considering the vignette, I am not sure if normalization is needed. Also, DESeq2 output already has normalization factors:
How can I make use of DESeq2's normalization factors to normalize the count slot and create "norm" slot in an SE?
(I think, it's an important consideration for the import()
function.)
I came with a work around, not sure if it's correct, please let me know opinion:
se@assays@data@listData[["norm"]] <- se@assays@data@listData[["counts"]]/se@assays@data@listData[["normalizationFactors"]]
Looks that counts were normalized:
hist(log(se@assays@data@listData[["counts"]]))
hist(log(se@assays@data@listData[["counts"]]))
Hi @vivek-verma202 -
se <- normalize(se, norm.method = "vst")
is the right thing todo here. I'll add an argument to normalize
to switch off filtering.
There are some helpful accessor functions such as assay
, colData
, and rowData
for a SummarizedExperiment
.
See also the graphical overview of a SummarizedExperiment
and how to access its main parts here.
A fast approximation of what EnrichmentBrowser::normalize()
with method = "vst"
does:
assay(se, "norm") <- edgeR::cpm(assay(se, "counts"))
But I agree, that's something that import
should support. I'll make a dedicted issue.
Closing this and will continue working on the specific issues derived from the discussion here (#28 and #29). Feel free to reopen + thanks a lot for testing and feedback.
Hi guys. I'm starting to work with
EnrichmentBrowser
, and I'm running into some issues. I'm looking at porting in DGE data from different sources (DESeq2
,limma
, andedgeR
). My biggest problem is getting these objects into the proper format (SummarizedExperiment
).After exploring this repo some more, I noticed your changelog has a bullet for new import functions for deseq/limma/edgeR. This would be super helpful, so I don't have to write my own functions. So, how stable are they right now? Is the master branch "ok" to use in its current form?