Closed dorkylever closed 1 year ago
Sorry for the terrible formatting
It's the same issue as https://www.biostars.org/p/9552594/
This seems independent of the query:
gr <- GRanges(seqnames = c("chr1", "chr1", "chr2", "chr2"), ranges = IRanges(start = c(100, 200, 300, 400), end = c(150, 250, 350, 450)), strand = c("+", "-", "+", "-"))
test <- ucscTableQuery(session, track = "cons35way", range = gr, table = "multiz35way") ans <- rtracklayer::getTable(test)
str(ans) 'data.frame': 1000000 obs. of 7 variables: $ bin : num 0 0 1 1 1 1 1 1 1 2 ... $ chrom : chr "chr1" "chr1" "chr1" "chr1" ... $ chromStart: num 6.71e+07 1.34e+08 8.39e+06 1.68e+07 2.52e+07 ... $ chromEnd : num 6.71e+07 1.34e+08 8.39e+06 1.68e+07 2.52e+07 ... $ extFile : num 1 1 1 1 1 1 1 1 1 1 ... $ offset : num 3.57e+09 7.24e+09 1.72e+08 6.19e+08 1.12e+09 ... $ score : num 0.502 0.886 0.51 0.501 0.728 ...
I don't understand
Wait, can the Grange object only have a single value?
i.e., something like: GRanges("chr1", IRanges(start = 38561922, end = 38561922), strand = "-")
So I'm having to do an sapply for multiple queries like such:
sapply(seq_along(for_query), function(i) { Sys.sleep(3) getTable(ucscTableQuery(mySession, track="cons35way",range=for_query[i], table="multiz35way"))})
but this seems inconventient and you get errorHandler(responseError) : Too Many Requests
nevermind there's myRSQL that I should probably be using. I just got baited by the fact that the input was a Granges objects, thinking it should have been for more high-throughput-.
For anybosy else that gets stuck on getting GERP scores in the void - look at this post:
Hi All,
I have a set of mouse SNPs (~974) from GRCm39 that I'm trying to get conservation scores on. Ultimately I would like to get the GERP conservation score files available from the the Ensembl FTP site, import the BigWig track into rtracklayer and retrieve the scores for each SNP. Just for testing, however, I'm using the multiz35way conservation score as it's pre-built.
However, when I do this, the ranges are ignored and it returns just max possible entries for chromosome one.
library(rtracklayer)
session <- browserSession("UCSC")
genome(session) <- "mm39"
clone_for_snp_gr <- snps_filtered_from_FVB
clone_for_snp_gr$chr_name <- unlist(clone_for_snp_gr$chr_name)
clone_for_snp_gr$chrom_start <- unlist(clone_for_snp_gr$chrom_start)
clone_for_snp_gr$chrom_end <- unlist(clone_for_snp_gr$chrom_end)
snp_gr <-makeGRangesFromDataFrame(clone_for_snp_gr, seqnames.field = "chr_name", start.field = "chrom_start", end.field = "chrom_end")
snp_gr <- renameSeqlevels(snp_gr, paste0("chr", seqlevels(snp_gr)))
for_query <- GRangesForUCSCGenome("mm39", chrom = snp_gr@seqnames, ranges = snp_gr@ranges)
for_query
GRanges object with 974 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr9 119800452 *
[2] chr7 44199769 *
[3] chr7 132616737 *
[4] chr4 136275390 *
[5] chr4 136275455 *
... ... ... ...
[970] chr9 120868329 *
[971] chr9 120868349 *
[972] chr2 165916862 *
[973] chr9 71837860 *
[974] chrX 71962690 *
seqinfo: 61 sequences (1 circular) from mm39 genome
test <- ucscTableQuery(session, track="cons35way", range=for_query, table="multiz35way")
con_scores <- getTable(test)
con_scores
bin chrom chromStart chromEnd extFile offset score 1 0 chr1 67108799 67108920 1 3567081335 0.501910 2 0 chr1 134217621 134217740 1 7240893127 0.886047 3 1 chr1 8387705 8389370 1 172116144 0.510487 4 1 chr1 16777183 16777248 1 618514000 0.501355 5 1 chr1 25165823 25165834 1 1118818193 0.727571 6 1 chr1 33554431 33554447 1 1422195359 0.547642 7 1 chr1 41942996 41943218 1 1932068494 0.501428 8 1 chr1 50331618 50332996 1 2328581995 0.500599 9 1 chr1 58719908 58722745 1 2933344225 0.500000 10 2 chr1 75497420 75497476 1 4222662685 0.499302 11 2 chr1 83886033 83886164 1 4835125740 0.502277 12 2 chr1 92274310 92275035 1 5366193570 0.502864 13 2 chr1 100663007 100663507 1 5709210931 0.503867 14 2 chr1 109049041 109054290 1 6051152786 0.500000 15 2 chr1 117440159 117440543 1 6238167617 0.500000 16 2 chr1 125829054 125829155 1 6626874554 0.500635 [ reached 'max' / getOption("max.print") -- omitted 999858 rows ]`Kind Regards, Kyle