wencke / wencke.github.io

GOplot
18 stars 7 forks source link

circle_dat() output NAs for logFC and z-score. #18

Open limin321 opened 1 year ago

limin321 commented 1 year ago

Hey, I tried to create a GOplot with the following codes

DEGs <- read.csv("./DEG.csv")
## Step 1: get entrezgene_id
gene_id <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "entrezgene_id"), mart = useDataset("mmusculus_gene_ensembl", useMart("ensembl")))

# merge 2 dataframe based by giving the columns of the 2 dataframe for merging.
# Sort log2FC from higher to lower for GSEA
DEs <- merge(DEGs, gene_id, by.x = "SYMBOL", by.y = "external_gene_name")
DEs <- DEs[order(DEs$avg_log2FC, decreasing = TRUE),]
DEsub <- DEs
kk <- enrichGO(gene = DEsub$entrezgene_id,
               OrgDb = "org.Mm.eg.db",
               pvalueCutoff = 0.05,
               qvalueCutoff = 0.05,
               ont = "all",
               readable = T)
terms <- kk@result %>% dplyr::select("ONTOLOGY","ID","Description","p.adjust", "geneID","Count")
colnames(terms) <- c("Category", "ID", "Term", "adj_pval","Genes", "Count")
genes <- DEsub[c("SYMBOL","avg_log2FC")]
colnames(genes) <- c("ID", "logFC")
circ <- circle_dat(terms = terms, genes = genes)

The above script runs with no Error, but the output is NOT correct, with both logFC and z-score are all NAs.

Here is the output screenshot:

Screen Shot 2023-06-11 at 5 34 26 PM

Here is the input terms:

Screen Shot 2023-06-11 at 5 35 08 PM

Here is the second input genes:

Screen Shot 2023-06-11 at 5 34 48 PM

The the output only gave me NAs?? Any suggestions will be greatly appreciated.

Best, LC

Nhien198 commented 1 year ago

Thank you very much for your email!I am not online now, I will reply you as soon as possible.Best wish, 

paolabc commented 1 year ago

Hi There!

I also have the same issue. Did you solve it?

Thank you in advance for your answer! :)

Best, Paola

paolabc commented 1 year ago

Hi!

I just figured out the issue. You just need to replace the gene separator from slash ( / ) to comma ( , ) on your enrichment analysis. Since the enrichment based on DAVID database returns the genes separated by comma, the separator on the output from other databases do not work.

Please, try: terms$geneID<- gsub("/",",",terms$geneID)

Best, Paola

daniellembecker commented 1 year ago

I am also having this issue with NAs for the logFC and zscores, I think I figured out that my GFF3 file had extra genes that were not significant so they were filtered out so I used na.omit for the logFC not listed, but the zscores still seem to be not calculating correctly. Please help!

Screen Shot 2023-07-06 at 3 30 27 PM
daniellembecker commented 1 year ago

Here is my table before I remove the NA's for the logFC

Screen Shot 2023-07-06 at 3 37 55 PM
paolabc commented 1 year ago

Hi

I do not know much about your table output, but It seems that zscore for just one term is being calculate. It is suprised me that you get the same score for different GO terms. Can you provide your R code?

daniellembecker commented 1 year ago

Thank you @paolabc for being willing to help, I know I think something weird is going on. Here is my code and my data files.

daniellembecker commented 1 year ago

DEG exploratory plots with GOplot

following vignette: https://cran.r-project.org/web/packages/GOplot/vignettes/GOplot_vignette.html

library(GOplot)
library(dplyr)

dat <- read.csv("RAnalysis/Data/RNA-seq/P.verrucosa/Host/DEG_GO_parent_terms_complete.csv")

dat$adj_pval <- p.adjust(dat$over_represented_pvalue, method="BH")

# Get a glimpse of the data format of the results of the functional analysis... 
head(dat)

#need to keep only rows for Category = ontology, ID = goid Term=term, Genes =geneID and adj_pvalue, keep columns 3, 8,9,10, 12
dat1 <- dat[c(9,3,8,10,12)]

#rename columns ontology=Category, category = ID, term = Term, genes = Genes
dat2 <- dat1 %>% rename("Category" = "ontology",
                                        "ID" = "category",
                                        "Term" = "term", "Genes" = "genes")

# ...and of the data frame of selected genes
head(DEG.res)

#need to keep only rows for gene_id, log2fold, lfcse, stat, pvalue, padj, basemean
DEG.res1 <- DEG.res[c(7,2,3,4,5,6,1)]

#rename columns 
DEG.res2 <- DEG.res1 %>% rename("ID" = "gene_id",
                                        "logFC" = "log2FoldChange",
                                        "AveExpr" = "lfcSE", "t" = "stat", "P.Value" = "pvalue", "adj.P.Val" = "padj", "B" = "baseMean")

# Generate the plotting object, set up with circle_dat(terms, genes) terms dataframe has columns for category, ID, padj, etc. genes dataframe has columns for ID and logfc
circ <- circle_dat(dat2, DEG.res2)

#remove any genes without logFC from datfram
circ <-circ[!is.na(circ$logFC),]

make exporatory plots

# Generate a simple barplot
GOBar(subset(circ, category == 'BP'))

# Facet the barplot according to the categories of the terms 
GOBar(circ, display = 'multiple')

# Facet the barplot, add a title and change the colour scale for the z-score
GOBar(circ, display = 'multiple', title = 'Z-score coloured barplot', zsc.col = c('yellow', 'black', 'cyan'))

# Generate the bubble plot with a label threshold of 3
#got error: arguments imply differing number of rows: 1,0, found answer online to set labels= something smaller than defualt 5

GOBubble(circ, labels = 2)
daniellembecker commented 1 year ago

DEG_GO_parent_terms_complete.csv

DEG.res.csv

daniellembecker commented 1 year ago

@paolabc hello just checking in if you were able to see any issues with my data?? thank you!