lawremi / rtracklayer

R interface to genome annotation files and the UCSC genome browser

Example from section 5.3 in your vignette not working #41

Closed Krithika-Bhuvan closed 3 years ago

Krithika-Bhuvan commented 3 years ago

We are trying to obtain cytoband location and stain information from your package (track names: cytoBand and cytoBandIdeo; track numbers 41 and 42).

We used the example from section 5.3 in your vignette and tried this:

library(rtracklayer)
mySession <- browserSession()
genome(mySession) <- "hg19"
track.names <- trackNames(ucscTableQuery(mySession))
track.namesDF <- as.data.frame(track.names)
View(track.namesDF)
which(track.names=="cytoBandIdeo")

Chromosome Band (Ideogram) 42

tracks <- track.names[c(41, 42)]

# Trial 1 -  runs for a long time
s <- sapply(tracks, function(track) {
  length(tableNames(ucscTableQuery(mySession, track=track)))
})

# Trial 2 - not working
x <- length(tableNames(ucscTableQuery(mySession, tracks)))

Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'object' in selecting a method for function 'tableNames': 'track' must be a single string
In addition: Warning message:
In .local(x, ...) :
  'track' parameter is deprecated now you go by the 'table' instead
                Use ucscTables(genome, track) to retrieve the list of tables for a track

sanchit-saini commented 3 years ago

Hi @Krithika-Bhuvan

For Trial 1:

I ran the code you provided and it worked without problems for me. I also timed it: overall, it took less than a minute to finish.

> tracks <- track.names [c (41,42)]
> system.time(s <- sapply(tracks, function(track) {
length(tableNames(ucscTableQuery(mySession, track=track)))
}))
   user  system elapsed
  3.025   0.422  39.172

Warning messages:
1: In .local(x, ...) :
  'track' parameter is deprecated now you go by the 'table' instead
                Use ucscTables(genome, track) to retrieve the list of tables for a track
2: In .local(x, ...) :
  'track' parameter is deprecated now you go by the 'table' instead
                Use ucscTables(genome, track) to retrieve the list of tables for a track
>

In this case, it took about 40 seconds to complete (39.172 s elapsed).

For Trial 2:

You passed a character vector of length two as track. In ucscTableQuery, track must be a single string, so ucscTableQuery behaved as intended: it stopped execution gracefully with the message 'track' must be a single string.
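
To work with several tracks, each track has to go through its own call. The following is a minimal sketch of that idea, not the package's own approach: it uses ucscTables(genome, track), the call suggested by the deprecation warning, and counts the tables per track. The function name count_tables_per_track and its lookup parameter are hypothetical, introduced only for illustration.

```r
# Sketch: count the tables behind each track, one track per call.
# `count_tables_per_track` and `lookup` are hypothetical names for this example.
# `lookup` defaults to rtracklayer::ucscTables(genome, track), the call
# named in the deprecation warning.
count_tables_per_track <- function(genome, tracks,
                                   lookup = rtracklayer::ucscTables) {
  stopifnot(is.character(genome), length(genome) == 1L, is.character(tracks))
  # vapply() hands exactly one track name to each call, so the
  # "single string" requirement is always met.
  vapply(tracks,
         function(track) length(lookup(genome, track)),
         integer(1))
}

# Usage (needs network access to UCSC):
# count_tables_per_track("hg19", c("cytoBand", "cytoBandIdeo"))
```

The lookup argument only exists so the loop can be tried without a network connection by passing a stand-in function.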

Krithika-Bhuvan commented 3 years ago

Hello, thanks for checking and for the quick response. I did some more testing. The code works fine with the older version of R/BioC, but not with the upcoming new version of R/BioC.

Question: can you please provide a workaround so the code will work with the new version?

Test #1: R version 3.6: works fine

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("rtracklayer")
library("rtracklayer")
mySession <- browserSession()
genome(mySession) <- "hg19"
track.names <- trackNames(ucscTableQuery(mySession))
track.namesDF <- as.data.frame(track.names)
View(track.namesDF)
which(track.names=="cytoBandIdeo")

tracks <- track.names[c(41, 42)]

#loaded_tracks <- trackNames(mySession)
subTargetTrack <- track(mySession, "cytoBandIdeo")
subTargetTrack2 <- track(mySession, "cytoBand")

subTargetTrack1DF <- as.data.frame(subTargetTrack)
subTargetTrack2DF <- as.data.frame(subTargetTrack2)

Test #2: R version 4.1 (this is testing for the upcoming BioC and R release)

subTargetTrack <- track(mySession, "cytoBandIdeo")

Error: track is meaningless now you only go by the table

subTargetTrack2 <- track(mySession, "cytoBand")

Error: track is meaningless now you only go by the table

> version
               _                                                 
platform       x86_64-apple-darwin17.0                           
arch           x86_64                                            
os             darwin17.0                                        
system         x86_64, darwin17.0                                
status         Under development (unstable)                      
major          4                                                 
minor          1.0                                               
year           2021                                              
month          02                                                
day            07                                                
svn rev        79964                                             
language       R                                                 
version.string R Under development (unstable) (2021-02-07 r79964)
nickname       Unsuffered Consequences

sanchit-saini commented 3 years ago

It is not working because it is in the process of being deprecated. For now, you can use this:

subTargetTrack <- track(ucscTableQuery(mySession, "cytoBandIdeo"))

But soon queries will only work with the table name instead of the track name. You can use ucscTables("hg19", "cytoBandIdeo") to get the list of tables for a track, and then use the relevant table name (instead of the track) for all future queries. (This will only work in the upcoming branch.)

Krithika-Bhuvan commented 3 years ago

Thank you. Are these changes committed to the latest Bioconductor build? This code change in your package also affects our package, which is not building at this time. Is there a way for us to test your new code in R 4.1?

I tried the following code in R 4.1: ucscTables("hg19", "cytoBandIdeo"). But it only returns "cytoBandIdeo".

sanchit-saini commented 3 years ago

Are these changes committed to the latest Bioconductor build?

Yes

Is there a way for us to test your new code in R 4.1?

BiocManager::install(version = "devel") # switch to devel 
BiocManager::install("rtracklayer") # install rtracklayer

After these steps, you can play with the latest rtracklayer features.

I tried the following code in R 4.1: ucscTables("hg19", "cytoBandIdeo"). But it only returns "cytoBandIdeo".

It means the cytoBandIdeo track has only one table, and that table is also named cytoBandIdeo. There are cases in which the track name and the table name are identical. Yes, it is a bit confusing.

Krithika-Bhuvan commented 3 years ago

Thank you for the explanation. It's starting to make sense, but I still don't understand how to extract the contents of the track/table. The result of ucscTables("hg19", "cytoBandIdeo") is a character string with the name of the track.

We are looking to extract the information inside the cytoBandIdeo track/table. It contains the following information: Chromosome number, Chr Start Position, Chr End Position, CytobandName, Stain. Please advise on how to extract this. Thank you very much. Please see below an example of the information the track contains - this was obtained through the browser.

[Screenshot: example rows of the cytoBandIdeo track as shown in the browser]

sanchit-saini commented 3 years ago

library(rtracklayer)
query <- ucscTableQuery("hg19", table = "cytoBandIdeo") # create a query against the UCSC Table Browser
cytoband_table <- getTable(query) # retrieve the table (name chosen to avoid masking base::table)
cytoband_table

For more details, refer to https://rdrr.io/bioc/rtracklayer/man/UCSCTableQuery-class.html or https://github.com/lawremi/rtracklayer/blob/master/man/UCSCTableQuery-class.Rd
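
Since both cytoBand and cytoBandIdeo are of interest here, the same two calls can be repeated per table name. This is only a sketch under assumptions: fetch_ucsc_tables and its fetch_one parameter are hypothetical names, and it assumes cytoBand is also a valid table name for hg19, which can be checked with ucscTables("hg19", "cytoBand").

```r
# Sketch: fetch several UCSC tables by table name, returned as data frames.
# `fetch_ucsc_tables` and `fetch_one` are hypothetical names for this example.
# The default `fetch_one` uses the ucscTableQuery() / getTable() calls above.
fetch_ucsc_tables <- function(genome, tables,
                              fetch_one = function(genome, table) {
                                query <- rtracklayer::ucscTableQuery(genome, table = table)
                                rtracklayer::getTable(query)
                              }) {
  stopifnot(is.character(tables), length(tables) > 0L)
  # One query per table name; results are keyed by table name.
  result <- lapply(tables, function(tbl) as.data.frame(fetch_one(genome, tbl)))
  names(result) <- tables
  result
}

# Usage (needs network access to UCSC):
# bands <- fetch_ucsc_tables("hg19", c("cytoBand", "cytoBandIdeo"))
# head(bands$cytoBandIdeo)
```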

Krithika-Bhuvan commented 3 years ago

Excellent. Thank you so much! Closing this issue now.