stan-dev / posteriordb

Database with posteriors of interest for Bayesian inference
176 stars 36 forks source link

Cache R dependencies in CI #146

Closed eerolinna closed 4 years ago

eerolinna commented 4 years ago

Currently R builds in CI are taking a long time, see for example https://travis-ci.org/MansMeg/posteriordb/builds/645435512 where the R build took 17 minutes. It seems that most of the time is spent in installing libraries for R. I'm including some potentially relevant information below (in an essentially logbook form of what I found when investigating the problem, so probably many things won't be relevant in the end).

Here's documentation on caching in travis https://docs.travis-ci.com/user/caching/ from a quick read it seems like something like this might speed up the R builds a lot

 cache:
    directories:
     - /home/travis/R/Library/

However it might be that the travis R settings are doing something like this already, because the build log contains these lines

Setting up package cache
adding /home/travis/R/Library to cache
creating directory /home/travis/R/Library

so probably that won't be the fix we need. The build log is from https://travis-ci.org/MansMeg/posteriordb/jobs/645435516

The travis documentation states:

Travis CI uploads the cache after the script phase of the build, but before either after_success or after_failure

Perhaps this means that the cache is not uploaded in time

I also found something else in the build log

Setting up build cache
$ export CASHER_DIR=${TRAVIS_HOME}/.casher
Installing caching utilities
attempting to download cache archive
fetching PR.145/cache--linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--R-3.6.2.tgz
fetching PR.145/cache-linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--R-3.6.2.tgz
fetching PR.145/cache--R-3.6.2.tgz
fetching development/cache--linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--R-3.6.2.tgz
fetching development/cache-linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--R-3.6.2.tgz
fetching development/cache--R-3.6.2.tgz
fetching master/cache--linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--R-3.6.2.tgz
fetching master/cache-linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--R-3.6.2.tgz
fetching master/cache--R-3.6.2.tgz
could not download cache

This suggests that perhaps the cache is being uploaded correctly but the download is the part where the failure occurs. Or perhaps it means that no cache exists, which would explain why it cannot be loaded.

The build log also contains

store build cache
changes detected, packing new archive
uploading PR.145/cache--linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--R-3.6.2.tgz
cache uploaded

So it seems that at least for this build the cache was uploaded successfully. Perhaps it means that the reason why the cache was missing is that the development branch has not had successful CI jobs in a long time. I might test that hypothesis soon.

In short, currently the R libraries are not being cached, which slows down the builds by

eerolinna commented 4 years ago

Somewhat unrelated, but while investigating I came across this

If your package depends on development packages that are not on CRAN (such as GitHub) we recommend you use the Remotes: annotation in your package DESCRPITION file. This will allow your package and dependencies to be easily installed by devtools::install_github() as well as on Travis (Examples). It is generally no longer necessary to use r_github_packages, r_packages, r_binary_packages, etc. as this can be handled with Remotes.

from https://blog.rstudio.com/2016/03/09/r-on-travis-ci/

Currently .travis.yml contains

      r_packages: covr
      r_github_packages: jgabry/posterior

Is it however that these packages should not be actual dependencies of posteriordb? As in should not be installed by default when posteriordb is installed. I would guess this is the case for covr (so r_packages should be kept in .travis.yml) but I'm not sure about posterior.

I'm suspecting that the one possible cause for the slow builds is that covr might not be cached. However there are other possibilities too.

eerolinna commented 4 years ago

Looking at the latest build on master https://travis-ci.org/MansMeg/posteriordb/jobs/619031110, it seems that there a cache was found. It was on R 3.6.1, which might explain why the development branch doesn't find any cache as it is looking for R 3.6.2. covr was still installed during the build, which suggests that r_packages are not being included in the cache currently. However the build lasted only 3 minutes instead of 17.

My current guess is that if we can get the development branch to build successfully once (thus creating the cache for R 3.6.2) the build time will drop down to roughly 3 minutes. In addition if we can cache covr the build time should drop even lower.

The current development build should pass, but it seems like the job sometimes timeouts. That happened once already, I restarted the build to see what happens next time. I might also try restarting the build for master, which if successful would provide the cache for development.

eerolinna commented 4 years ago

Ah, development is failing because the R tests are failing. This means that a new cache is not being pushed as that is only done on successful builds.


> library(testthat)
> library(posteriordb)
> 
> test_check("posteriordb")
1. Error: Filter posteriors locally and on github (@test-filter-tibble.R#13) 
Assertion on 'names(x)' failed: Must be a subset of set {name,model_implementations,title,prior,added_by,added_date,references,description,urls,keywords}.
Backtrace:
  1. testthat::expect_message(...)
  6. posteriordb::filter_posteriors(...)
  7. posteriordb:::pdb_filter(path = "posteriors", pdb = pdb, ...)
  8. posteriordb::posterior(name = nms[i], pdb = pdb)
  9. posteriordb:::read_model_info(po)
 10. posteriordb:::assert_model_info(model_info)
 11. checkmate::assert_names(...)
 12. checkmate::makeAssertion(x, res, .var.name, add)
 13. checkmate:::mstop(...)

2. Error: posteriordb:::check_pdb indicates that github PDB is ok (@test-pdb_
Assertion on 'names(x)' failed: Must be a subset of set {name,model_implementations,title,prior,added_by,added_date,references,description,urls,keywords}.
Backtrace:
  1. testthat::expect_message(...)
  6. posteriordb:::check_pdb(pdb = pdb_github_test, posterior_idx = 1)
  7. posteriordb::posterior(name = pns[i], pdb = pdb)
  8. posteriordb:::read_model_info(po)
  9. posteriordb:::assert_model_info(model_info)
 10. checkmate::assert_names(...)
 11. checkmate::makeAssertion(x, res, .var.name, add)
 12. checkmate:::mstop(...)

══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 91 | SKIPPED: 12 | WARNINGS: 0 | FAILED: 2 ]
1. Error: Filter posteriors locally and on github (@test-filter-tibble.R#13) 
2. Error: posteriordb:::check_pdb indicates that github PDB is ok (@test-pdb_github.R#9) 

Error: testthat unit tests failed
Execution halted

I started a build on master, lets see if further development builds can pick up the cache from master (assuming the master build succeeds)

eerolinna commented 4 years ago

Master built the cache for R 3.6.2 successfully and new development builds are using it, reducing build time from ~20 minutes to 2 minutes.

It seems that now only covr and remotes are installed during build time, other packages are found from the cache.

I would say that this issue is now partially resolved. It would be nice to reduce the build time even more but 2-3 minutes is allright too.

MansMeg commented 4 years ago

Yes. I agree. It sometimes will run 14 minutes, but that is when there is a new R release it need to install in the cache.