waldronlab / cBioPortalData

Integrate the cancer genomics portal, cBioPortal, using MultiAssayExperiment
https://waldronlab.io/cBioPortalData/

Add Docker Image + scripts #21

Closed inodb closed 4 years ago

inodb commented 4 years ago

Use the Anaconda base image plus the existing cbioportaldata conda dependency to build the image, then re-install cBioPortalData using the code in the repo. This is much faster than compiling all dependencies from source (under 5 minutes in the GitHub Action versus more than 20 minutes on Travis).
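A rough sketch of the build described above, written as a shell helper that generates a Dockerfile. The base image `continuumio/miniconda3`, the conda package names `bioconductor-cbioportaldata` and `r-devtools`, the `/cBioPortalData` path, and the function name `write_dockerfile` are all assumptions for illustration; the actual files in this PR may differ.

```shell
#!/bin/sh
# Sketch only: image, package names, and paths below are assumptions,
# not the contents of the Dockerfile added in this PR.

write_dockerfile() {
    # $1: path where the generated Dockerfile is written
    cat > "$1" <<'EOF'
FROM continuumio/miniconda3
# Install precompiled dependencies from conda instead of building from source
RUN conda install -y -c conda-forge -c bioconda r-devtools bioconductor-cbioportaldata
# Copy the checked-out repository into the image
COPY . /cBioPortalData
WORKDIR /cBioPortalData
# Re-install the package from the copied source, leaving other packages alone
RUN Rscript --vanilla -e 'devtools::install(pkg=".", dependencies=FALSE)'
EOF
}
```

With Docker available, something like `write_dockerfile Dockerfile && docker build -t cbioportaldata-dev .` would then build the image (the tag name is arbitrary).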

lwaldron commented 4 years ago

Awesome, @inodb! A couple comments:

  1. Since, AFAIK, bioconda doesn't directly track Bioconductor releases, there's a potential for dependency problems, but it doesn't seem like a great concern in this case, especially since it's supplementing the normal testing methods.
  2. Perhaps after installing from bioconda you want to update cBioPortalData from GitHub in order to test the most current development version? I.e., BiocManager::install("waldronlab/cBioPortalData", update = FALSE, ask = FALSE) (avoiding any other available updates)
  3. Side note: Bioconductor's package testing system supports weekly "long tests" that have a time limit of 6h instead of 40m (https://bioconductor.org/developers/how-to/long-tests/)

inodb commented 4 years ago

Thanks for the prompt review @lwaldron!

  1. Agreed. Do note that installdev.sh is called at the end, which installs the latest version of the code using devtools. So it will pull missing dependencies if they weren't in the conda release. Dependency resolution might be slightly different than when building from source using R, but as you said: maybe not a big concern since the Docker image is supplemental
  2. I believe this is what installdev.sh does. I assume BiocManager will try to pull the latest code from the master branch on GitHub? Since the code is already checked out in the GitHub action, we can avoid the extra round trip of pulling the repo again. Also note that when this action runs on a pull request it will use the code in that PR instead of the latest master
  3. Thanks for sharing this reference! That's great!

lwaldron commented 4 years ago

> I believe this is what installdev.sh does. I assume BiocManager will try pull the latest code from the master branch on GitHub?

Is this the line you're referring to?

Rscript --vanilla -e 'devtools::install(pkg=".",dependencies=FALSE)'

in which case "." refers to the currently checked-out repo, and yes, you're right. BiocManager would have installed from the Bioconductor repository (devel or release) unless a "repo/pkgname"-style package name is used.
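To make the distinction concrete, here is a small sketch of the three install routes discussed in this thread. The helper names `install_expr` and `run_install` and the mode labels are hypothetical; the R expressions are the ones quoted above, plus the bare-package-name form for comparison. Note that the "repo/pkgname" form relies on the remotes package being installed.

```shell
#!/bin/sh
# Sketch only: helper names and mode labels are made up for illustration.

# install_expr MODE: print the R expression for one install route.
install_expr() {
    case "$1" in
        local)
            # Install the currently checked-out working tree
            echo 'devtools::install(pkg=".", dependencies=FALSE)' ;;
        github)
            # "repo/pkgname" form: BiocManager fetches the code from GitHub
            echo 'BiocManager::install("waldronlab/cBioPortalData", update = FALSE, ask = FALSE)' ;;
        bioc)
            # Bare package name: installed from the Bioconductor repository
            echo 'BiocManager::install("cBioPortalData", update = FALSE, ask = FALSE)' ;;
        *)
            echo "unknown mode: $1" >&2
            return 1 ;;
    esac
}

# run_install MODE: run the chosen expression with Rscript (requires R).
run_install() {
    r_expr=$(install_expr "$1") || return 1
    Rscript --vanilla -e "$r_expr"
}
```

In a GitHub Action that has already checked out the repo, `run_install local` would correspond to what installdev.sh is described as doing, and on a pull request it would install the code from that PR.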

inodb commented 4 years ago

@lwaldron correct. That's the command I've been using to install a package from a local dir while developing (not sure if there's a more canonical way to do this, but that seemed to work)

I see, thanks for the clarification re BiocManager!