lawremi / rtracklayer

R interface to genome annotation files and the UCSC genome browser
Other
29 stars 17 forks source link

New Bioc devel changes breaking many packages -- is this intentional? #36

Closed lshep closed 3 years ago

lshep commented 3 years ago

@lawremi cc @hpages

Possibly related to: https://github.com/lawremi/rtracklayer/pull/31

We are seeing many new errors on the build report:

could not find function "RestUri" (affecting at least: DeepBlueR, DMRcate, ELMER, ensembldb, Gviz, MEAL, methyAnalysis, missMethyl) and many along the forms of error in evaluating the argument 'object' in selecting a method for function 'getTable , error in evaluating the argument 'object' in selecting a method for function 'tableNames', along with others of similar syntax (at least affect: (CAGEr, corral, scp, ChIPpeakAnno, coMET, customProDB, IdeoViz, NoRCE, and probably more)

More importantly we think they are affecting GenomicRanges and GenomicFeatures, seeing an ERROR 'table' is a mandatory parameter and must be a single character vector

Can you please investigate the ERRORs?

lshep commented 3 years ago

also ... and feel free to email me the information -- what is your currently active email? -- multiple package descriptions have a mixture of 3 different emails and I'm not sure if you are receiving notifications.

lawremi commented 3 years ago

Thanks for the heads up. I moved restfulr from Depends to Imports (missed that in the PR). That should fix some errors. I will look into the rest over the long weekend.

I wonder if there would be an easy way for us to move the changes to a branch and then automatically run Bioc tests against that branch, as this is turning out to be more disruptive than expected. That would be beyond my level of git-fu.

sanchit-saini commented 3 years ago

@lawremi All left errors are the result of improper call toucscTableQuery method according to updated changes. During the migration process, the significance of the track parameter was dropped and the table parameter was made mandatory for calling the ucscTableQuery method.

I think a lot of the packages are calling the ucscTableQuery method with positional arguments and without the table parameter. Maybe these errors can be fixed by calling the ucscTableQuery method with named arguments, Or by swapping the position of track and table parameters in the method definition.

For example, ChIPpeakAnno in its vignettes calling loadIdeogram which internally calling to ucscTableQuery:

Quitting from lines 136-141 (pipeline.Rmd) Error: processing vignette 'pipeline.Rmd' failed with diagnostics: error in evaluating the argument 'object' in selecting a method for function 'getTable': 'table' is a mandatory parameter and must be a single character vector --- failed re-building 'pipeline.Rmd' Source : https://bioconductor.org/checkResults/3.13/bioc-LATEST/ChIPpeakAnno/riesling1-buildsrc.html

https://github.com/jianhong/ChIPpeakAnno/blob/408e33960c1211e31015221a4a1e1bd602b9a078/vignettes/pipeline.Rmd#L137 https://github.com/jianhong/trackViewer/blob/f13a42a69bc317386a03190599ea0b97580c006e/R/loadIdeogram.R#L23

Fix for this particular case would be

ideo <- getTable(ucscTableQuery(session, 
                           table = "cytoBandIdeo", 
                           range = range))
lawremi commented 3 years ago

Hi @sanchit-saini , thanks for the fast response. So if memory serves, it was impossible to get a default table for a track with the new REST API, because the new REST API does not have a "track" concept. Perhaps we could keep a little bit of the table query interface around that gets the default table for a track. It would issue a deprecation warning, giving all of these packages a chance to update to the simpler table-based world. How hard do you think that would be?

sanchit-saini commented 3 years ago

Hi @lawremi , I think we can

  1. Make a request to table browser with the genome and group to retrieve all tracks for a genome. e.g: Let's say the user specifies Assembly track in the ucscTableQuery, so to retrieve equivalent mapping of track value(gold in this case ) for next request, it issues a request : http://genome-euro.ucsc.edu/cgi-bin/hgTables?db=hg18&hgta_group=allTracks
  2. Another request to table browser with the genome, retrieved track code, and group to retrieve all table values for a given track. http://genome-euro.ucsc.edu/cgi-bin/hgTables?db=hg18&hgta_track=gold&hgta_group=allTracks By default, it's going to pick the first entry from the retrieved tables

After this, the ucscTableQuery should be able to handle the track parameter as it used to be by implicitly providing the table value.What do you think about this approach?

lawremi commented 3 years ago

Sounds great. What is the reason for retrieving all of the tracks? Is that because the user is requesting tracks by name instead of by the internal identifier? Otherwise, I guess it would be sufficient to look for the default table and throw an error if it fails (presumably because the track is not found).

sanchit-saini commented 3 years ago

Yes, for retrieving internal identifier from the track name. Okay, I will create a PR for this.

lshep commented 3 years ago

@lawremi again -- could you please let me know what your currently active email is? We encounter three or four different emails as maintainer emails and we want to make sure we can get a hold of you if necessary. Feel free to email me this information at lori.shepherd at roswellpark.org if you don't feel comfortable posting it here.

lawremi commented 3 years ago

I just pushed some commits to consolidate my emails (they should all work). I only found two variants (ignoring the deprecated exploRase). Thanks for bringing that to my attention.

sanchit-saini commented 3 years ago

Hi @lawremi, I tested all the affected packages and update the patch accordingly. It seems to work as expected for most of the packages. But GenomicFeatures and coMET require special attention.

For GenomicFeatures package:

For coMET package:

lawremi commented 3 years ago

Thanks. I think the GenomicFeatures usage of the internal ucscTableOutputs() is obsolete, since they can now simply ask for the tableNames() and know all of the valid tables, and I agree they should use the formal schema instead of an example output for determining the output types. @hpages might comment.

I'm guessing that gm12878Insitu track stored in the interact format. Perhaps if you requested the data in that format it would work? We'd need to add support for interact to rtracklayer, which seems like a good idea anyway, and it could behave like the existing BEDPE parsing (returning a Pairs object). Since interact is based on BED, it's probably as simple as a wrapper that provides the appropriate extraCols argument and constructs the Pairs object.

lshep commented 3 years ago

@lawremi Is the PR good to go to resolve the other packages that are currently failing? There were several more appearing with the 'table' is a mandatory parameter and must be a single character vector

lawremi commented 3 years ago

Waiting on one minor tweak that I had expected to be done today. Very close.

lshep commented 3 years ago

There also seems to be a large number of packages failing with a message similar to

There doesn't seem to be any cytoband data available for genome

While I've traced this to Gviz it seems potentially underlying with the call to ucscTableQuery as well. Do you expect these clear up with the pull request as well or might you suggest something to the Gviz maintainer if more needs to be corrected on their part.

sanchit-saini commented 3 years ago

Hi @lawremi @lshep yes, it should be completed by today.

There also seems to be a large number of packages failing with a message similar to

There doesn't seem to be any cytoband data available for genome

While I've traced this to Gviz it seems potentially underlying with the call to ucscTableQuery as well. Do you expect these clear up with the pull request as well or might you suggest something to the Gviz maintainer if more needs to be corrected on their part.

@lshep Can you provide the name of some packages that are affected by this?

lshep commented 3 years ago

Lets hold off to see if many clear up. The main offender was Gviz but I think he might have already implemented a change as it looks like there were some commits this afternoon.

sanchit-saini commented 3 years ago

Hi @lawremi, I pushed the updated changes, Can you review them? if anything requires further changes I'm happy to help.

lawremi commented 3 years ago

Thanks, Sanchit. I submitted a review of the PR. Just simple tweaks.

sanchit-saini commented 3 years ago

Thanks, Sanchit. I submitted a review of the PR. Just simple tweaks.

Done, I think now we can merge these changes.

lawremi commented 3 years ago

Indeed, thanks a lot for the fast turnaround @sanchit-saini. Just merged and pushed over to the Bioc git.

hpages commented 3 years ago

Hey @lawremi , @sanchit-saini , do you think one of you could fix GenomicFeatures, or submit a PR for it here: https://github.com/Bioconductor/GenomicFeatures? Thanks

sanchit-saini commented 3 years ago

Hey @lawremi , @sanchit-saini , do you think one of you could fix GenomicFeatures, or submit a PR for it here: https://github.com/Bioconductor/GenomicFeatures? Thanks

Sure, I will create a PR.

sanchit-saini commented 3 years ago

I think this issue can be closed now as all the affected packages are not failing because of rtracklayer.

hpages commented 3 years ago

I don't have the ability to close this. @lshep Lori? Thx