smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/

Enrichr filtering & background #193

Open DelongZHOU opened 5 months ago

DelongZHOU commented 5 months ago

Hi Sam, It seems that the Enrichr table is only filtered based on the enrichment score. Would it be possible to filter by the adjusted p-value first? For example, only terms with adj. p < 0.05 (or a custom threshold) would be included. Also, would it be possible to provide a background for the enrichment? An appropriate background is very important for this type of analysis. One possible background could be all the genes detected in the cell type.

I found this on the EnrichR git, but they must have modified the API since then, because the code provided there is no longer functional. https://github.com/MaayanLab/enrichr_issues/issues/11 (Edit: missing bit in the link)

Thanks!

(BTW I appreciate how quickly you updated the package for my previous requests. Keep on keeping on!)

smorabit commented 5 months ago

Hi

> It seems that the Enrichr table is only filtered based on the enrichment score. Would it be possible to filter by the adjusted p-value first? For example, only terms with adj. p < 0.05 (or a custom threshold) would be included.

Can you please clarify what you are referring to? The function GetEnrichrTable returns the unfiltered results.

> Also, would it be possible to provide a background for the enrichment? An appropriate background is very important for this type of analysis. One possible background could be all the genes detected in the cell type.

This is a good point. Unfortunately, I looked into it and it seems that the enrichR R package does not have this as an option (see wjawaid/enrichR#68). I could potentially use a different package instead of enrichR; I will look into it as an option.

DelongZHOU commented 5 months ago

Hi,

Sorry, I meant EnrichrBarPlot and EnrichrDotPlot, since both pick the top term(s) for plotting.
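
One way to get the requested behavior in the meantime is to filter the table by adjusted p-value before picking the top terms. The following is a minimal base R sketch, not hdWGCNA's own code. It assumes the table has columns named `module`, `Term`, `Adjusted.P.value`, and `Combined.Score` (enrichR-style names, to be checked against the actual GetEnrichrTable output); the toy data frame and the helper `filter_enrichr` are hypothetical.

```r
# Toy stand-in for an Enrichr result table (column names are assumptions).
enrich_df <- data.frame(
  module = c("M1", "M1", "M1", "M2", "M2"),
  Term = c("term A", "term B", "term C", "term D", "term E"),
  Adjusted.P.value = c(0.001, 0.20, 0.03, 0.04, 0.60),
  Combined.Score = c(50, 120, 30, 10, 80),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep significant terms first, then take the
# top n_terms per module ranked by combined score.
filter_enrichr <- function(df, padj_cutoff = 0.05, n_terms = 2) {
  sig <- df[df$Adjusted.P.value < padj_cutoff, , drop = FALSE]
  if (nrow(sig) == 0) {
    return(sig)
  }
  sig <- sig[order(sig$module, -sig$Combined.Score), , drop = FALSE]
  top <- do.call(rbind, lapply(split(sig, sig$module), head, n = n_terms))
  rownames(top) <- NULL
  top
}

top_terms <- filter_enrichr(enrich_df)
print(top_terms)
```

In the toy data, "term B" has the highest score in M1 but is dropped because its adjusted p-value is above the cutoff, which is the difference from ranking by score alone.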

For the enrichment with a background, we do indeed seem to run into the same issue. I was trying to figure out how to automate that process, but after a few attempts I gave up.
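
To illustrate why the background matters, here is a small base R sketch of an over-representation test with an explicit background, using the hypergeometric distribution. It is not the Enrichr method or API; the helper `ora_pvalue` and the gene names are made up for illustration.

```r
# Hypothetical helper: one-sided over-representation p-value for a single
# gene set, restricted to a user-supplied background (universe).
ora_pvalue <- function(module_genes, term_genes, background) {
  background <- unique(background)
  # Only genes present in the background can be counted.
  module_genes <- intersect(unique(module_genes), background)
  term_genes <- intersect(unique(term_genes), background)
  overlap <- length(intersect(module_genes, term_genes))
  # P(X >= overlap) when drawing length(module_genes) genes from the background
  phyper(
    overlap - 1,
    length(term_genes),
    length(background) - length(term_genes),
    length(module_genes),
    lower.tail = FALSE
  )
}

term_genes <- paste0("g", 1:10)
module_genes <- paste0("g", c(1:5, 50:54))

# Same module and term, two different backgrounds.
p_celltype <- ora_pvalue(module_genes, term_genes, paste0("g", 1:100))
p_genome <- ora_pvalue(module_genes, term_genes, paste0("g", 1:1000))
print(c(cell_type_background = p_celltype, large_background = p_genome))
```

With the larger background the same overlap looks more significant, so using all annotated genes instead of the genes detected in the cell type can inflate the enrichment.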

DelongZHOU commented 2 months ago

Hi, I found some packages from this tutorial that provide Gene Ontology, Reactome, KEGG, and other enrichment analyses. Each enrichment analysis has its own package, and the syntax is pretty consistent and clear.

DelongZHOU commented 1 month ago

Just realized that I forgot the link: https://yulab-smu.top/biomedical-knowledge-mining-book/index.html

My code for using the packages for Reactome and Gene Ontology is as follows:

library(tidyverse)
library(DOSE)
library(clusterProfiler)
library(ReactomePA)
# strex provides str_before_last() for string manipulation
library(strex)

# GO ontologies to test
flavors <- c('BP', 'CC', 'MF')

# Load module genes. Here I manually converted the gene names to Entrez IDs
# and added them as a new column, "Entrez".
modules <- read.csv('')
module_names <- unique(modules$module)

# Load background genes, manually extracted from Seurat and converted to Entrez IDs as well.
# Each column is the list of genes expressed in one cell type.
# The column name is the cell type, with underscores and dashes replaced by dots.
bgs <- read.csv()

for (module_name in module_names) {
  print(module_name)
  target <- as.character(modules[modules$module == module_name, ]$Entrez)

  # Extract the cell type from the module name.
  # In my case the modules are named "<cell type> N<number>".
  # I use str_before_last() from strex to split at the last " N", since some of my cell type names contain an N.
  # Then replace spaces and dashes in the cell type with dots;
  # the background file already contains these modifications.
  cell_type <- gsub('[ -]', '.', str_before_last(module_name, " N"))
  print(cell_type)
  bg <- as.character(bgs[[cell_type]])

  # Reactome
  print('Reactome')
  reactome <- enrichPathway(
    target,
    organism = "mouse",
    pvalueCutoff = 0.05,
    pAdjustMethod = "BH",
    qvalueCutoff = 0.05,
    universe = bg,
    minGSSize = 10,
    maxGSSize = 500,
    readable = FALSE
  )
  write.csv(reactome, paste0())

  # GO, one run per ontology
  for (flavor in flavors) {
    print(flavor)
    ego <- enrichGO(
      target,
      universe = bg,
      OrgDb = 'org.Mm.eg.db',
      keyType = "ENTREZID",
      ont = flavor,
      pvalueCutoff = 0.05,
      pAdjustMethod = "BH",
      qvalueCutoff = 0.05
    )
    write.csv(ego, paste0())
  }
}
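
The module-name parsing described in the comments can also be sketched in base R without strex. This is only an illustration: the helper `cell_type_from_module` and the example module names are hypothetical, and unlike `str_before_last()` it returns the input unchanged when no " N" is found.

```r
# Hypothetical helper: take everything before the LAST " N" in the module
# name, then replace spaces and dashes with dots.
cell_type_from_module <- function(module_name) {
  # The greedy "(.*)" pushes the " N" match to its last occurrence.
  prefix <- sub("^(.*) N.*$", "\\1", module_name)
  gsub("[ -]", ".", prefix)
}

example_modules <- c("Ex-Neuron N3", "NK cell N12", "Neuron N type N4")
cell_types <- cell_type_from_module(example_modules)
print(cell_types)
```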

A few notes: Underscores and dashes are not allowed in the column names, so I had to replace them with dots. The species-appropriate database should be downloaded before executing the code. For the Gene Ontology package, official gene symbols can be used instead of Entrez IDs (with keyType="SYMBOL"), but this option is not supported for Reactome.
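
On the column-name note: by default `read.csv()` runs the header through `make.names()`, which turns characters such as dashes and spaces into dots. If keeping the original cell type names is preferable, `check.names = FALSE` should leave them untouched. A small sketch with made-up data (the column names and IDs are hypothetical):

```r
# Inline CSV standing in for the background file.
csv_text <- "Cell-type A,Cell-type B\n101,201\n102,202"

# Default behavior: header names are made syntactically valid.
mangled <- read.csv(text = csv_text)
# check.names = FALSE keeps the header exactly as written.
original <- read.csv(text = csv_text, check.names = FALSE)

print(names(mangled))
print(names(original))

# With the original names kept, a column is looked up by its literal name.
bg <- as.character(original[["Cell-type A"]])
print(bg)
```

With this option the cell type would be looked up via `[[ ]]` using the unmodified name, so the `gsub()` step in the loop would no longer be needed.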