ropensci / unconf17

Website for 2017 rOpenSci Unconf
http://unconf17.ropensci.org

Tools for scheduled, automatic testing of packages that access open data APIs #25

Open geanders opened 7 years ago

geanders commented 7 years ago

While CRAN regularly tests examples and vignette code in packages posted there, if a package is pulling large datasets from an open data API, these tests may be set to not run. This is nice for not overtaxing CRAN resources, but not great for making sure something hasn't broken in one of these packages once it's up on CRAN. Travis CI and other tools offer a really nice way to run these tests while the package is being actively developed, but I think these tools typically only check the build when a commit is pushed. Travis CI now has Cron Jobs in beta (https://docs.travis-ci.com/user/cron-jobs/). I'd be interested in figuring out how to set up scheduled automatic testing of a package through Cron Jobs (or something similar) and, if possible, creating a function that adds the infrastructure a package needs to start scheduled testing once it is submitted to CRAN (something like use_cron_jobs). Ideally, this would keep some of the burden off CRAN (for a package downloading lots of data) while still letting you find out quickly if a package you maintain is having problems, whether from changes in the API you're working with or from changes in package dependencies.
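One way the scheduled-testing idea could look from inside a package is to gate the API-dependent tests on the kind of build that is running. The sketch below assumes Travis CI, which sets the TRAVIS_EVENT_TYPE environment variable to "cron" for scheduled builds. The helper name is_scheduled_run and the RUN_API_TESTS switch are hypothetical, made up for illustration, and not part of any existing package.

```r
# Hypothetical helper: TRUE only when the current build was started by a
# Travis CI cron job, or when the maintainer opts in by hand.
is_scheduled_run <- function() {
  # TRAVIS_EVENT_TYPE is set by Travis CI; "cron" marks scheduled builds.
  event <- unname(Sys.getenv("TRAVIS_EVENT_TYPE", unset = ""))
  # RUN_API_TESTS is an assumed, package-specific opt-in switch.
  opt_in <- unname(Sys.getenv("RUN_API_TESTS", unset = ""))
  identical(event, "cron") || identical(tolower(opt_in), "true")
}

# In a testthat file, the slow API tests could then be guarded like this:
#
# test_that("API still returns the expected columns", {
#   testthat::skip_if_not(is_scheduled_run(), "not a scheduled build")
#   # call the API and check the result here
# })
```

With a guard like this, push builds and CRAN checks skip the heavy downloads, while a cron build (or a local run with RUN_API_TESTS=true) exercises them.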

sckott commented 7 years ago

@geanders For some ropensci packages we run a custom Travis restarts app on Heroku. The code is at https://github.com/ropensci/travis-restarts and the list of repos is at https://github.com/ropensci/travis-restarts/blob/master/repos.rb (we currently do restarts every day).

But since we're not paying for Travis, we don't get many concurrent builds, so the builds back up. And since we have contributors in many different time zones, someone who wants to check their build while our "restarts" are running may have to wait a while.

important packages to include are indeed ones you're talking about, packages that interact with web APIs, or web data of some sort, that can change and thereby break the package.

you're welcome to suggest packages to add to travis-restarts

noamross commented 7 years ago

FWIW: I use both AppVeyor and Travis cron options for a package that interacts with an API (package open, API private: https://github.com/ecohealthalliance/eidith/). I find that I get a lot of false errors from Travis cron builds erroring out on either API calls or initial setup. Perhaps cron jobs just have lower priority for resources and are more likely to have network errors? These errors seem less likely on AppVeyor, which lets you customize the timing/frequency of cron jobs.

geanders commented 7 years ago

@sckott Thanks for the links-- these look very interesting and exactly in line with what I was thinking. I'm looking forward to playing around with this some. Even weekly (rather than daily) tests of builds would be great.

geanders commented 7 years ago

Hmmm... I've never played around much with running scheduled jobs from my local computer, but could that be an option instead of running through Travis or AppVeyor? I mean for individual package maintainers, not for bigger projects with lots of packages like rOpenSci. (And thanks for the link to the code for those for your package, @noamross.) Could a package maintainer just set up a script to build and check a package on his or her own computer once a week or so?

sckott commented 7 years ago

@geanders cool. that's not to say don't make something new/different - just laying out similar things

fmichonneau commented 7 years ago

I have been thinking of working on a package that does this. I don't want to rely on cron or task scheduler, but instead have something like this in my Rprofile:

run_weekly(devtools::check("my_awesome_pkg"))
run_weekly(upgrade_all_packages())

and the code would be executed automatically. However, I'd also like to have this done in the background so it doesn't slow down the start of my R session. I haven't had a chance to investigate how to do this part yet...
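A minimal sketch of how run_weekly could work, assuming a small stamp file is kept per expression to record the last successful run. The stamp_dir default, the interval_days argument and the now argument are assumptions added for illustration. This version runs in the foreground, so it does not address the background-execution part.

```r
# Hypothetical sketch: evaluate `expr` only if at least `interval_days`
# have passed since the last recorded run of the same expression.
run_weekly <- function(expr, stamp_dir = "~/.run_weekly",
                       interval_days = 7, now = Sys.time()) {
  # Use the deparsed call as a key so each expression gets its own stamp file.
  key <- paste(deparse(substitute(expr)), collapse = "")
  stamp <- file.path(stamp_dir,
                     paste0(gsub("[^A-Za-z0-9]+", "_", key), ".rds"))

  if (file.exists(stamp)) {
    last_run <- readRDS(stamp)
    elapsed <- as.numeric(difftime(now, last_run, units = "days"))
    if (elapsed < interval_days) {
      return(invisible(FALSE))  # ran recently, so skip this time
    }
  }

  # Arguments are lazy in R, so the expression is only evaluated here.
  force(expr)

  # Record the run only after it finished, so a failed run is retried.
  dir.create(stamp_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(now, stamp)
  invisible(TRUE)
}
```

Because R evaluates arguments lazily, a call such as run_weekly(devtools::check("my_awesome_pkg")) costs almost nothing at startup when the check has already run within the last week.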

geanders commented 7 years ago

@sckott Absolutely. And I think the approach you laid out would be really great, so it might be more a question of whether a function or two could lower the threshold for someone to set up the infrastructure you've got going, but for their own projects, without explicitly adding a package to travis-restarts. (I'm assuming that's more for rOpenSci packages, rather than something anyone could add to? Or am I wrong about that?) Something along the lines of use_travis in devtools. I need to look through what you and Noam used for your set-ups, though, to get a better idea of whether a lot of the set-up could be moved into a script that could be run within a function.

geanders commented 7 years ago

@fmichonneau Yes, something like run_weekly would be very cool.

sckott commented 7 years ago

@geanders

  1. the restarts approach is esp. good for organizations with lots of repos, so prob. not ideal for just a few repos. that repo is for ropensci repos, but anyone could copy it
  2. we could make it easier, e.g., with https://devcenter.heroku.com/articles/heroku-button (a button on the readme of the repo to deploy your own). haven't looked into it yet though.