jokergoo / rGREAT

GREAT Analysis - Functional Enrichment on Genomic Regions
https://jokergoo.github.io/rGREAT

Cannot use plotRegionGeneAssociationGraphs for specific term ID when using background #14

Closed mooratov closed 4 years ago

mooratov commented 5 years ago

Hello, I am trying to retrieve the genomic region-gene association table for one of my enriched terms.

This works fine if I don't specify a background (i.e. leave it as wholeGenome) and follow the method described in the tutorial, e.g.

res = plotRegionGeneAssociationGraphs(job, ontology = "GO_Molecular_Function", termID = "GO:0004984")

However, if I run a job with the same foreground regions but with a custom background file, the command does not work. I get the following error and traceback:

Error in check_asso_file(f_term): Empty data, probably your 'termID' is invalid.

Traceback:

  1. plotRegionGeneAssociationGraphs(job, ontology = "GO Biological Process", . termID = "GO:0003209")
  2. plotRegionGeneAssociationGraphs(job, ontology = "GO Biological Process", . termID = "GO:0003209")
  3. .local(job, ...)
  4. check_asso_file(f_term)
  5. stop("Empty data, probably your 'termID' is invalid.\n")

Note that the enrichment tables look fine with this configuration, and the command works if I don't specify a GO term ID.

Thanks

jokergoo commented 5 years ago

Hi, can you send me both query regions and background regions to z.gu at dkfz.de? I tried with random regions and it works fine.

ceesu commented 4 years ago

Hi, I'm also seeing the error above for some terms, even when I do not supply a background. For example, for my data frame pos (read from a BED file):

head(pos)
     V1        V2        V3        V4
1  chr1  77466613  77469153  peak5472
2  chr1 134914885 134918095  peak9293
3  chr1 195376099 195377973 peak15766
4 chr10  56095697  56102134 peak19965

My code is something like this:

job = submitGreatJob(pos,
                     species = "mm9",
                     request_interval = 5,
                     adv_span = 500)

tb = getEnrichmentTables(job, ontology = "GO Biological Process")

tb <- as.data.frame(tb$`GO Biological Process`)

ordered <- as.data.frame(tb[order(tb$Hyper_Adjp_BH),])

for (id in ordered$ID) {
  res = plotRegionGeneAssociationGraphs(job, ontology = "GO Biological Process",
                                        termID = id)
}

It runs for some terms, then gives this error during the loop:

Error in check_asso_file(f_term) : Empty data, probably your 'termID' is invalid.

The traceback seems fairly similar:

> traceback()
5: stop("Empty data, probably your 'termID' is invalid.\n")
4: check_asso_file(f_term)
3: .local(job, ...)
2: plotRegionGeneAssociationGraphs(job, ontology = "GO Biological Process", 
       termID = id)
1: plotRegionGeneAssociationGraphs(job, ontology = "GO Biological Process", 
       termID = id)

I get the 'Error in check_asso_file' error for example when id="GO:0000715" even though this id is among those returned by the getEnrichmentTables() command. If I add a background then it seems I get the 'Error in check_asso_file' every single time I do the plotRegionGeneAssociationGraphs query.

Do you know why this might be happening? Should I perhaps send the regions file?

jokergoo commented 4 years ago

Hi, sorry for the late reply. I was on vacation.

I just noticed that the GREAT website has released a new version (http://great.stanford.edu/public/html/, version 4) with some changes to the supported functional catalogs. I am not sure whether this causes your error. I have adjusted the package for the new GREAT version. Please redo your analysis and let me know if you still have this problem.

kowaae22 commented 4 years ago

Sorry to tag on a little late, but I'm experiencing this same issue. Here's the code I'm using:

library(rGREAT)

bgdf=read.table("bfbgd.bed", header=T)
sigposdf=read.table("bfsigpos.bed", header=T)

posjob=submitGreatJob(sigposdf, bg=bgdf, species = "hg19")

tbpos = getEnrichmentTables(posjob)

par(mfrow = c(1, 3))
res = plotRegionGeneAssociationGraphs(posjob, ontology = "GO Biological Process", termID = "GO:0007156")

And the error is: Error in check_asso_file(f_term) : Empty data, probably your 'termID' is invalid.

Just like other people who have posted, the enrichment tables look fine and the term ID I used is present in the GO Biological Process enrichment table.

jokergoo commented 4 years ago

It works fine with a randomly generated bed file. Can you send me the bed files you use (to my email address)?

ceesu commented 4 years ago

Thanks for your reply, I updated the package so I'm on rGREAT_1.17.1 but it still seems to happen some of the time. I will send files to you.

jokergoo commented 4 years ago

Hi @mooratov @ceesu @Amylith, I guess @ceesu has sent me some example datasets (I can't match the name in the email to the username here :)).

With the test dataset, plotRegionGeneAssociationGraphs() fails for the GO ID GO:0000076. If you construct the URL for this GO ID in the same way as for the GO IDs that succeed, you get:

http://great.stanford.edu/public-4.0.4/cgi-bin/showTermDetails.php?termId=GO:0000076&ontoName=GOBiologicalProcess&ontoUiName=GO%20Biological%20Process&sessionName=20190906-public-4.0.4-N0wDve&species=mm9&foreName=file4388604d634b.gz&backName=file438878b20251.gz&table=region

@ceesu, people cannot access your original peak files through the link above, so I am making this link public.

You will find that there is no result for this GO ID on the GREAT website either, which means this is a problem with the GREAT web service rather than with the rGREAT package.

If you run plotRegionGeneAssociationGraphs() in a loop, I suggest wrapping the call in try() so that the error is caught and does not break the loop.
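
A minimal sketch of the try() suggestion, assuming the job and ordered objects from the code posted earlier in this thread. The helper name collect_associations and its fun argument are hypothetical and not part of rGREAT. It calls fun once per term ID, keeps the results that succeed, and records the IDs that fail instead of stopping the loop.

```r
# Hypothetical helper (not part of rGREAT): call `fun` for every ID,
# keep the successful results and remember which IDs raised an error.
collect_associations <- function(ids, fun) {
  results <- list()
  failed <- character(0)
  for (id in ids) {
    # try() returns an object of class "try-error" instead of stopping the loop
    res <- try(fun(id), silent = TRUE)
    if (inherits(res, "try-error")) {
      failed <- c(failed, id)
    } else {
      results[[id]] <- res
    }
  }
  list(results = results, failed = failed)
}

# Usage with the objects from this thread (needs a submitted GREAT job):
# out <- collect_associations(ordered$ID, function(id) {
#   plotRegionGeneAssociationGraphs(job, ontology = "GO Biological Process",
#                                   termID = id)
# })
# out$failed  # term IDs for which GREAT returned no association data
```

Collecting the failed IDs makes it possible to check them on the GREAT website afterwards, as was done above for GO:0000076.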

ceesu commented 4 years ago

I see, thanks so much for looking into this! I will try your suggestion.