lawremi / rtracklayer

R interface to genome annotation files and the UCSC genome browser

error with trackNames and UCSC #4

Closed eandresleon closed 6 years ago

eandresleon commented 6 years ago

More info at:

https://support.bioconductor.org/p/103829/

lawremi commented 6 years ago

Already fixed.

eandresleon commented 6 years ago

Thanks. I've tested in 1.38.2 and it worked. Thanks again

marencc commented 6 years ago

I've tested 1.38.2 and still get the same error.

> refSeq <- makeTxDbFromUCSC(genome="hg19",tablename="refGene")
Error in names(trackIds) <- sub("^ ", "", sapply(nodes, xmlValue)) : 
  'names' attribute [210] must be the same length as the vector [208]

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=es_ES.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=es_ES.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocInstaller_1.28.0 rtracklayer_1.38.2 magrittr_1.5 forcats_0.2.0 stringr_1.2.0 dplyr_0.7.4
[7] purrr_0.2.4 readr_1.1.1 tibble_1.3.4 ggplot2_2.2.1 tidyverse_1.2.1 broom_0.4.3
[13] reshape2_1.4.3 annotate_1.56.1 XML_3.98-1.9 org.Hs.eg.db_3.5.0 GenomicFeatures_1.30.0 AnnotationDbi_1.40.0
[19] Biobase_2.38.0 tidyr_0.7.2 GenomicRanges_1.30.0 GenomeInfoDb_1.14.0 IRanges_2.12.0 S4Vectors_0.16.0
[25] BiocGenerics_0.24.0 refGenome_1.7.3 RSQLite_2.0 doBy_4.5-15 limma_3.34.3 dtplyr_0.0.2
[31] genefilter_1.60.0 RColorBrewer_1.1-2 RSkittleBrewer_1.1 plyr_1.8.4

loaded via a namespace (and not attached):
[1] nlme_3.1-131 bitops_1.0-6 matrixStats_0.52.2 lubridate_1.7.1 bit64_0.9-7
[6] httr_1.3.1 tools_3.4.0 R6_2.2.2 DBI_0.7 lazyeval_0.2.1
[11] colorspace_1.3-2 mnormt_1.5-5 bit_1.1-12 compiler_3.4.0 cli_1.0.0
[16] rvest_0.3.2 xml2_1.1.1 DelayedArray_0.4.1 scales_0.5.0 psych_1.7.8
[21] digest_0.6.12 Rsamtools_1.30.0 foreign_0.8-69 XVector_0.18.0 pkgconfig_2.0.1
[26] rlang_0.1.4 readxl_1.0.0 rstudioapi_0.7 bindr_0.1 jsonlite_1.5
[31] BiocParallel_1.12.0 RCurl_1.95-4.8 GenomeInfoDbData_0.99.1 Matrix_1.2-12 Rcpp_0.12.14
[36] munsell_0.4.3 stringi_1.1.6 MASS_7.3-47 SummarizedExperiment_1.8.0 zlibbioc_1.24.0
[41] grid_3.4.0 blob_1.1.0 crayon_1.3.4 lattice_0.20-35 Biostrings_2.46.0
[46] haven_1.1.0 splines_3.4.0 hms_0.4.0 biomaRt_2.34.0 glue_1.2.0
[51] modelr_0.1.1 cellranger_1.1.0 gtable_0.2.0 assertthat_0.2.0 xtable_1.8-2
[56] survival_2.41-3 GenomicAlignments_1.14.1 memoise_1.1.0 bindrcpp_0.2
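
For readers unfamiliar with the message: the error reported above is ordinary base R behaviour when a names vector is longer than the vector it is assigned to. The sketch below reproduces it without rtracklayer or any network access. The values of `track_ids` and `track_labels` are made up for illustration; in the real case both come from the UCSC page.

```r
# Hypothetical stand-ins for the scraped values (not real UCSC output).
track_ids <- c("knownGene", "refGene")                     # 2 ids
track_labels <- c("UCSC Genes", "RefSeq Genes", "Extra")   # 3 labels

# Assigning more names than there are elements raises the same kind of error.
msg <- tryCatch({
  names(track_ids) <- track_labels
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

So the counts in the message ([210] versus [208]) suggest that the page yielded two more labels than track ids, although the thread does not establish why.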

lawremi commented 6 years ago

I'm unable to reproduce this. It may be a regional thing, where different UCSC mirrors or localizations return slightly different HTML.

You could do:

debug(rtracklayer:::ucscTracks, sig="UCSCSession")

And then evaluate the first line, to assign the tracks object. Then print tracks and put the output in a pastebin or something.

marencc commented 6 years ago

Hi! If I do that, this is the output I get:

function (object, form = list())
{
    tracks <- ucscGet(object, "tracks", form)
    nodes <- getNodeSet(tracks, "//select/option[@selected]/text()")
    trackModes <- sapply(nodes, xmlValue)
    nodes <- getNodeSet(tracks, "//select/@name")
    trackIds <- unlist(nodes)
    nodes <- getNodeSet(tracks, "//select/../a/text()")
    nms <- sapply(nodes, xmlValue)
    names(trackIds) <- sub("^ ", "", nms[nms != "new"])
    new("ucscTracks", ids = trackIds, modes = trackModes)
}
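
The `names()` assignment in the function body above can be tried in isolation. This is a minimal sketch with invented inputs (`track_ids` and `labels` are hypothetical, not real UCSC output). It applies the same filtering and trimming, and adds a length check that is not part of rtracklayer, so that a mismatch would be reported in plainer terms.

```r
# Hypothetical values standing in for what the XPath queries return.
track_ids <- c("knownGene", "refGene")
labels <- c(" UCSC Genes", "new", " RefSeq Genes")

# Same steps as in the function body: drop "new", strip one leading space.
clean_labels <- sub("^ ", "", labels[labels != "new"])

# Illustrative guard (an addition, not rtracklayer code): fail early and clearly.
if (length(clean_labels) != length(track_ids)) {
  stop(sprintf("scraped %d labels for %d track ids",
               length(clean_labels), length(track_ids)))
}

names(track_ids) <- clean_labels
print(track_ids)
```

With these inputs the filter leaves two labels for two ids, so the assignment succeeds. Any additional link text in the page other than "new" would still produce a length mismatch.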

lawremi commented 6 years ago

Did you read the part of the instructions after the debug() call? You'll need to evaluate the first line and print the tracks object.

marencc commented 6 years ago

I don't get any output after the debug() call. I'm sorry, I don't understand what you mean by printing the tracks object. How can I do this to help with the improvement?

marencc commented 6 years ago

debug(rtracklayer:::ucscTracks, sig="UCSCSession")
refSeq <- makeTxDbFromUCSC(genome="hg19",tablename="knownGene")
Tracing function ".local" in package "rtracklayer"
Tracing .local(object, ...) step 2
Called from: eval(expr, p)
Browse[1]> n
debug: tracks <- ucscGet(object, "tracks", form)
Browse[2]> Q

lawremi commented 6 years ago

Instead of typing Q just type tracks.

marencc commented 6 years ago

Here:

debug(rtracklayer:::ucscTracks, sig="UCSCSession")
refSeq <- makeTxDbFromUCSC(genome="hg19",tablename="knownGene")
Tracing function ".local" in package "rtracklayer"
Tracing .local(object, ...) step 2
Called from: eval(expr, p)
Browse[1]> n
debug: tracks <- ucscGet(object, "tracks", form)
Browse[2]> tracks
Error: object 'tracks' not found

lawremi commented 6 years ago

Sorry, you will need to type n twice, not just once, before that.

marencc commented 6 years ago

It's strange, I can't. It's automatically done.

debug(rtracklayer:::ucscTracks, sig="UCSCSession")
refSeq <- makeTxDbFromUCSC(genome="hg19",tablename="knownGene")
Tracing function ".local" in package "rtracklayer"
Tracing .local(object, ...) step 2
Called from: eval(expr, p)
Browse[1]> n
debug: tracks <- ucscGet(object, "tracks", form)

Browse[2]> | Browse[2]> Browse[2]>

marencc commented 6 years ago

If I try to evaluate tracks, it can't find the object. If I call makeTxDbFromUCSC, I get the output that I sent.

marencc commented 6 years ago

I've restarted R and loaded new libraries (TxDb.Hsapiens.UCSC.hg19.knownGene), and now it is working. Many thanks for your help.

nbahti commented 6 years ago

@marencc can you please list the steps you took to get makeTxDbFromUCSC to work?

swvanderlaan commented 6 years ago

Yes! Please @marencc do list the steps you took to get makeTxDbFromUCSC to work.

I did:

source("https://bioconductor.org/biocLite.R")
biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
biocLite("GenomicFeatures")
biocLite("rtracklayer")

# What is available?
supportedUCSCtables(genome = "hg19", url = "http://genome.ucsc.edu/cgi-bin/")

But I still get the same error as you, @marencc. Admittedly, I also get a message stating that RStudio can't install "RMySQL", and thus I end up in dependency hell. I would love to know a method that doesn't end up in that hell... :-)

Thanks!

marencc commented 6 years ago

@swvanderlaan, @nbahti I'm sorry, it only worked for a week or so. Currently, I get this output:

> refSeq2 <- makeTxDbFromUCSC(genome="hg19",tablename="knownGene")
Download the knownGene table ... Error: Bad Request

So it's not working again. Previously, I had just updated TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2. I can't make it work now...

lawremi commented 6 years ago

Possibly just a temporary networking issue? I can't reproduce this. It's also possible that it is mirror-dependent. UCSC sometimes behaves slightly differently depending on the mirror, and thus your region of the world.

@swvanderlaan are you using R release or devel? Failure to install RMySQL probably means that there is no CRAN binary available for your version of R.

maximilianh commented 6 years ago

Our mirrors are identical MySQL servers. As far as I know, we haven't changed the refGene table; I think we have never changed the tables since they were released, to avoid problems like this. Let me know if you run into similar problems again.

lawremi commented 6 years ago

Is anyone still experiencing issues like these?

nbahti commented 6 years ago

Yes, this problem showed up again since last week for me. Two or three months ago, I fixed it, but can't remember now how I did it.

On 7 May 2018 at 12:24, Michael Lawrence notifications@github.com wrote:

Is anyone still experiencing issues like these?

maximilianh commented 6 years ago

If you're parsing our html, maybe whoever wrote the code for that can contact us one day or start a new ticket for a discussion. To import data, there shouldn't be any need for parsing html pages.

lawremi commented 6 years ago

If it showed up last week then it is probably the same as #8, which is fixed. So I think I'll close this issue.

Yes, parsing the HTML is pretty dumb when getting data, and I take all the blame for it. Again, the package began life as a way to programmatically manipulate the browser view. Now it's just used for the once side-feature of getting data.

maximilianh commented 6 years ago

Well, if you ever want this to use the MySQL server or some other way (e.g. we could write a little API on our side), let me know. I'm pretty sure we can find you a nice SQL query to make it more stable and avoid you having to patch this all the time.

We certainly see many http requests come in with the rtracklayer user agent. We slow them down if there are too many from the same IP address, but until now it hasn't caused too much trouble.
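
As a rough illustration of the SQL route mentioned above, the sketch below builds a query against the refGene table and only runs it in an interactive session where the DBI and RMariaDB packages are installed. Several things here are assumptions not stated in this thread: the choice of RMariaDB as the driver, the helper name `build_refgene_query`, the selected columns, and the connection details (UCSC's public MySQL host `genome-mysql.soe.ucsc.edu` with user `genome`), which should be checked against UCSC's current documentation.

```r
# Hypothetical helper: build the SQL as a plain string so it can be inspected
# without opening a connection.
build_refgene_query <- function(limit = 5L) {
  stopifnot(is.numeric(limit), length(limit) == 1L, limit > 0)
  sprintf("SELECT name, chrom, strand, txStart, txEnd FROM refGene LIMIT %d",
          as.integer(limit))
}

query <- build_refgene_query(5L)
print(query)

# Only attempt the network call interactively and when the packages exist.
# Host and user are assumptions; verify them in the UCSC documentation.
if (interactive() &&
    requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RMariaDB", quietly = TRUE)) {
  con <- DBI::dbConnect(RMariaDB::MariaDB(),
                        host = "genome-mysql.soe.ucsc.edu",
                        username = "genome",
                        dbname = "hg19")
  print(DBI::dbGetQuery(con, query))
  DBI::dbDisconnect(con)
}
```

A direct query like this avoids the HTML parsing step entirely, at the cost of a database driver dependency, which is the kind of install problem reported earlier in the thread for RMySQL.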

lalchungnunga commented 6 years ago

Hi, the following error occurs when trying to download the hg19 knownGene table. I am using rtracklayer 1.38.3.

> hg19db <- makeTxDbFromUCSC(genome = "hg19", tablename = "knownGene")
Error in names(trackIds) <- sub("^ ", "", nms[nms != "new"]) :
  'names' attribute [212] must be the same length as the vector [211]

lawremi commented 6 years ago

Please update to latest Bioc release.
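
Before and after updating, it can help to confirm which rtracklayer version R actually loads. The sketch below is only an illustration: `is_at_least` is a hypothetical helper, and the version strings are the ones reported in this thread, not a statement of which release contains the fix. The commented update command assumes the BiocManager package, which postdates the `biocLite()` calls shown earlier.

```r
# Hypothetical helper: TRUE if `installed` is the same as or newer than `minimum`.
is_at_least <- function(installed, minimum) {
  utils::compareVersion(installed, minimum) >= 0
}

# Versions mentioned in this thread, used purely as example inputs.
print(is_at_least("1.38.2", "1.38.3"))

# Report the installed version only if rtracklayer is available.
if (requireNamespace("rtracklayer", quietly = TRUE)) {
  print(utils::packageVersion("rtracklayer"))
}

# One way to update (assumption: BiocManager is installed):
# BiocManager::install("rtracklayer")
```

Restarting R after updating makes sure the new version is the one in use. An already loaded namespace keeps the old code until the session ends.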