Closed craigcitro closed 10 years ago
@craigcitro thanks so much for this and your continued help with bigrquery and httr!
It has been merged via 78df32fc71779a126a220ce30f6eba125fa7feb1 into a fork where I am making a bunch of other unrelated updates, and I'll regenerate the md files all in one go.
Thanks @craigcitro and Nicole for all the hard work! So is this ready to be tested now?
Thanks, Paul
Hi @pgrosu,
If you want to give the new version of bigrquery a spin:
install.packages("httr")
devtools::install_github("bigrquery")
Thanks a bunch! Nicole
Hi Nicole,
I would be glad to try it out. I'll let you know how it goes.
Many thanks for all the great work! Paul
Hi Nicole,
So I tested these using @craigcitro's changes found in this pull request, which are not yet merged.
The one change I made is to replace sql with sql.query in the DisplayAndDispatchQuery() function, since sql is already the name of a function in the dplyr package.
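For anyone wondering how much the name clash matters in practice: as far as I understand R's lookup rules, a local variable called sql does not stop a call like sql(...) from finding a function of the same name, so the rename mainly helps readability. Below is a minimal sketch of that behaviour. The sql() defined here is a hypothetical stand-in so that the snippet runs without dplyr installed, and demo_shadowing is an illustrative name, not something from the examples.

```r
# Hypothetical stand-in for a package function named sql(); defined here so
# the sketch runs without dplyr installed.
sql <- function(x) structure(x, class = "sql_demo")

demo_shadowing <- function() {
  sql <- "SELECT 1"  # local character variable with the same name
  list(
    # The bare name now refers to the local string, not the function.
    name_is_function = is.function(sql),
    # In call position R skips non-function bindings, so this still
    # reaches the function defined above.
    call_result = sql("SELECT 2")
  )
}

res <- demo_shadowing()
```

Inside the function the bare name refers to the string, while the call still dispatches to the function, which is why a distinct name such as sql.query is easier to read.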
I was unable to test everything, since I got a couple of Exceeded quota: error messages. I would be happy to test more, but how would you recommend I go about having this quota limit lifted?
I will show the two examples that gave me that error, and then I'll show a complete end-to-end example that worked - which is consolidated from several Readme.Rmd files.
Below is the error message:
Error: Exceeded quota: too many query byte volume for this project
quotaExceeded. Exceeded quota: too many query byte volume for this project
Below are the R commands that returned this error message:
result <- DisplayAndDispatchQuery("./1000genomes/sql/sample-level-data-for-brca1.sql")
result <- DisplayAndDispatchQuery("./1000genomes/sql/shared-variant-counts-by-ethnicity.sql")
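When running several queries in a row, one way to keep a quota error from aborting the whole session is to catch it and move on. This is only a sketch under assumptions: try_query and fake_runner are hypothetical names, the runner is passed in as a function so the snippet works offline, and the match on "Exceeded quota" is based solely on the message text quoted above.

```r
# run_query is any function that executes a query string and returns a data
# frame, e.g. function(q) query_exec(q, project). It is passed in so that
# this sketch can run without BigQuery access.
try_query <- function(run_query, sql_text) {
  tryCatch(
    run_query(sql_text),
    error = function(e) {
      msg <- conditionMessage(e)
      if (grepl("Exceeded quota", msg, fixed = TRUE)) {
        message("Skipping query, quota exceeded: ", msg)
        return(NULL)
      }
      stop(e)  # re-raise anything that is not a quota error
    }
  )
}

# Fake runner that mimics the error text reported above.
fake_runner <- function(sql_text) {
  stop("Exceeded quota: too many query byte volume for this project")
}

skipped <- try_query(fake_runner, "SELECT 1")
```

A NULL result then marks the queries that still need to be rerun once billing is enabled or the quota resets.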
Below is a complete example that worked:
install.packages("httr")
install.packages("devtools")
devtools::install_github("assertthat")
devtools::install_github("bigrquery")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("xtable")
install.packages("testthat")
require(bigrquery)
require(ggplot2)
require(dplyr)
require(xtable)
require(testthat)
project <- "......" # My project ID which I anonymized :)
DisplayAndDispatchQuery <- function(queryUri) {
  sql.query <- readChar(queryUri, nchars=1e6)
  cat(sql.query)
  query_exec(sql.query, project)
}
result <- DisplayAndDispatchQuery("./1000genomes/sql/ratio-of-dbsnp-variants-by-chromosome.sql")
print(xtable(result, digits=6), type="html", include.rownames=F)
qplot(num_variants, num_dbsnp_variants, color=num_variants, data=result) +
  scale_colour_gradient("Number of Variants", labels=function(x)round(x)) +
  ylab("Number of dbSNP Variants") +
  xlab("Number of Variants") +
  ggtitle("dbSNP Variant Count vs. Total Variant Count by Chromosome") +
  geom_text(aes(label=contig), hjust=-1, vjust=0)
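A small side note on the readChar() call in DisplayAndDispatchQuery(): it reads at most 1e6 characters, so a longer file would be cut off silently. A possible alternative, sketched here with a hypothetical helper name (read_sql_file), is to size the read from the file itself. The temporary file is only there to make the snippet self-contained.

```r
# Hypothetical helper: read a whole SQL file into a single string.
read_sql_file <- function(path) {
  stopifnot(file.exists(path))
  # Read exactly as many bytes as the file contains.
  readChar(path, nchars = file.info(path)$size, useBytes = TRUE)
}

# Usage with a temporary file so the sketch runs anywhere.
tmp <- tempfile(fileext = ".sql")
writeLines("SELECT 1", tmp)
sql_text <- read_sql_file(tmp)
cat(sql_text)
```

The returned string could then be passed to query_exec() in the same way as sql.query above.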
Thanks, Paul
Hi Paul,
That's helpful feedback - thank you! I believe that error message occurs when billing has not been enabled for the Google Cloud Platform project and queries have exceeded the amount of data that is free of charge per month. (It's easy to exceed the threshold with the 1,000 Genomes data since it's quite large.) Does that sound correct to you?
Take-aways:
sql throughout.
Thanks again, Nicole
Hi Nicole,
Glad to help out and yes, you are correct :) I didn't want to enable billing unless it was specific to a particular analysis project. Something that might be helpful is to have temporary project IDs that don't have this limit, in order to properly perform all the tests.
Feel free to let me know anything else you'd like me to do regarding this.
Thanks, Paul
Also, thanks for spotting the sql thing -- I'll update that in docs for bigrquery, too.
Sure thing :)
PTAL @deflaux (if you'd rather just make this update yourself, I can drop this PR.)
I recently made a change to the signature of the query_exec function in the bigrquery R package, which meant that all calls in the examples needed updating. I've done that here, with a few notes/caveats:
- default_dataset option, which isn't common in most BQ usage. In addition, it looks like all the examples have explicit project:dataset.table names, which is preferred anyway -- so I've just dropped the extra arguments. If there's something more subtle going on that I didn't catch, let me know.
- billing_project to project everywhere -- this matches both the (updated) bigrquery docs and the BigQuery docs more closely.
- .md files -- I did load up one and run it, but it looks like there were other changes unrelated to what I was up to that I was also picking up. I kept the output, and am happy to include it -- but I suspect it makes more sense to do them all at once.
- git blame noise.
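To illustrate the point about explicit project:dataset.table names, here is a minimal sketch of a query that needs no default dataset. The helper qualified_table and the placeholder project, dataset, and table names are hypothetical. The bracketed form assumes BigQuery's legacy SQL dialect, and the commented call simply mirrors the query_exec(sql.query, project) form used in the test script earlier in this thread.

```r
# Hypothetical helper: build a fully qualified table reference in the
# bracketed project:dataset.table form.
qualified_table <- function(project, dataset, table) {
  sprintf("[%s:%s.%s]", project, dataset, table)
}

# Placeholder names, not real resources.
tbl <- qualified_table("my-project", "my_dataset", "my_table")
query <- paste("SELECT COUNT(*) AS n FROM", tbl)
cat(query, "\n")

# With credentials and a project ID in `project`, the call would look like:
# result <- query_exec(query, project)
```

Because the table reference carries its own project and dataset, the call only needs the query text and the project to bill.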