Closed tamsinlh closed 11 months ago
Hi @tamsinlh , Looks like the ontology terms are not where rrvgo expects to find them in the orgdb object. Would you mind to share the orgdb object, and perhaps the R script you wrote to generate it? Something that I could use to reproduce your issue in my end.
cheers
@ssayols Thank you. Below is the code you've requested. It doesn't appear to allow attaching the orgdb object. I would be happy to send it to you if you would be able to direct me.
Many thanks.
gsnapperFile <- system.file("extdata","GSnap.CorrectEggnog.txt",
package="AnnotationForge")
gsnapper <- read.table(gsnapperFile,sep="\t", head=TRUE)
head(gsnapper)
colnames(gsnapper)
#[1] "query" "seed_ortholog" "evalue" "score"
#[5] "eggNOG_OGs" "max_annot_lvl" "COG_category" "Description"
#[9] "Preferred_name" "GOs" "EC" "KEGG_ko"
#[13] "KEGG_Pathway" "KEGG_Module" "KEGG_Reaction" "KEGG_rclass"
#[17] "BRITE" "KEGG_TC" "CAZy" "BiGG_Reaction"
#[21] "PFAMs"
GSym <- gsnapper[,c(1,9,8)]
GSym <- GSym[GSym[,2]!="-",]
GSym <- GSym[GSym[,3]!="-",]
colnames(GSym) <- c("GID","SYMBOL","GENENAME")
head(GSym)
#GID SYMBOL
#4 g6.t1 TNC
#8 g12.t1 RTL1
#17 g36.t1 HERC1
#27 g63.t1 MYPOP
#GENENAME
#4 Tenascin C
#8 multicellular organism development
#17 HECT and RLD domain containing E3 ubiquitin protein ligase family member 1
#27 proximal promoter DNA-binding transcription repressor activity, RNA polymerase II-specific
gsnapperGeneFile <- system.file("extdata","annoData.genes.txt",
package="AnnotationForge")
gsnapperGenes <- read.table(gsnapperGeneFile,sep="\t", head=TRUE)
head(gsnapperGenes)
gsChr <- gsnapperGenes[,c(1,2)]
gsChr <- gsChr[gsChr[,2]!="-",]
colnames(gsChr) <- c("GID","CHROMOSOME")
head(gsChr)
#GID CHROMOSOME
#1 g1.t1 1
#2 g10.t1 1
#3 g100.t1 1
#4 g1000.t1 1
gsnapperGOFile <- system.file("extdata","GSnap.GOsAsList4.tsv",
package="AnnotationForge")
gsGO <- read.table(gsnapperGOFile,sep="\t", header=TRUE)
gsGO <- gsGO[gsGO[,2]!="-",]
gsGO <- gsGO[gsGO[,3]!="",]
colnames(gsGO) <- c("GID","GO", "EVIDENCE")
head(gsGO)
#GID GO EVIDENCE
#4 g6.t1 GO:0000003 IEA
#5 g6.t1 GO:0001101 IEA
#6 g6.t1 GO:0001655 IEA
#7 g6.t1 GO:0001763 IEA
makeOrgPackage(gene_info=GSym, chromosome=gsChr, go=gsGO,
version="0.2",
maintainer="Some One <so@someplace.org>",
author="Some One <so@someplace.org>",
outputDir = ".",
tax_id="40503",
genus="Lutjanus",
species="griseusR",
goTable="go")
#Warning messages:
#1: ggrepel: 16 unlabeled data points (too many overlaps). Consider increasing max.overlaps
#2: ggrepel: 10 unlabeled data points (too many overlaps). Consider increasing max.overlaps
#3: ggrepel: 10 unlabeled data points (too many overlaps). Consider increasing max.overlaps
Sys.setenv(R_INSTALL_STAGED = FALSE)
install.packages("./org.LgriseusR.eg.db", repos=NULL, type = "source")
library(org.LgriseusR.eg.db)
my_new_fancy_orgdb_object <- 'org.LgriseusR.eg.db'
my_new_fancy_orgdb_object
go_analysis <- read.delim(system.file("extdata/CHSA2019_SigDiffMeth2.txt", package="rrvgo"))
head(go_analysis)
# ID Avg_Meth_Diff geneID Description
#1 - -21.32544 FAM124A Family with sequence similarity 124A
#2 GO:0001766 -17.09158 PLLP Plasmolipin
#3 GO:0003674 -17.09158 PLLP Plasmolipin
#4 GO:0005198 -17.09158 PLLP Plasmolipin
#5 GO:0005575 -17.09158 PLLP Plasmolipin
#6 GO:0005623 -17.09158 PLLP Plasmolipin
simMatrix2 <- calculateSimMatrix(go_analysis,
orgdb="org.LgriseusR.eg.db",
semdata = GOSemSim::godata(my_new_fancy_orgdb_object, ont="BP",keytype = 'GID'),
method="Rel",
keytype = "GID")
#preparing gene to GO mapping data...
#preparing IC data...
#Warning message:
# In calculateSimMatrix(go_analysis, orgdb = "org.LgriseusR.eg.db", :
# No terms were found in orgdb for BP
# Check that the organism and ontology match the ones provided by orgdb
The input to calculateSimMatrix()
should be a vector of GO terms.
You're currently calling it with a data.frame that contains multiple columns. Instead, use the column with the GO ID only:
simMatrix2 <- calculateSimMatrix(go_analysis$ID,
orgdb="org.LgriseusR.eg.db",
semdata = GOSemSim::godata(my_new_fancy_orgdb_object, ont="BP",keytype = 'GID'),
method="Rel",
keytype = "GID")
My guess is that this should work. Please let me know otherwise.
@ssayols Yes, thank you, that was an easy fix. Much appreciated.
Additionally, would you have any guidance on getting the treemapPlot to label all, or the majority, of boxes and adjusting max.overlaps for the scatterplot?
Usually it's a matter of the canvas size, whether there's enough room for more labels.
Theoretically, the plotting functions maximize the number of labels displayed based on the space available.
You could try increasing the size of the canvas with pdf(file="x.pdf", height=10, width=10)
(instead of the default 7x7 inches).
Thank you. That doesn't appear to work for me - it makes the pdf I created unusable. When R plots the treemap, it is very tiny in the window and the .pdf reflects the same size and labeling. I am wondering if there are too many GO terms grouped together - I am needing to set the similarity to 0.9 to be able to see the labels I do have. The example for the package has far fewer boxes than I am seeing so I suspect that is part of the issue for me.
that's right, treemaps maybe difficult to navigate with too many terms. Alternatively you could try other visualizations, like sunburst plots (currently the package doesn't support them), or simply a scatterplot (volcano visualizing -log(p) vs. FC) of the parent terms only.
Hello.
I am studying an organism that requires me to make an orgdb object. I followed the tutorial to do so with AnnotationForge using both my organism and the one it provides as an example, but when running calculateSimMatrix I recieve the following error:
No terms were found in orgdb for BP Check that the organism and ontology match the ones provided by orgdb (I am running rrvgo version 1.12.22.)
Any help would be appreciated. I would very much like to use this program.
Thank you in advance.