hubverse-org / hubEnsemblesManuscript

https://htmlpreview.github.io/?https://github.com/Infectious-Disease-Modeling-Hubs/hubEnsemblesManuscript/blob/master/analysis/paper/hubEnsembles_manuscript.html

Should any package dependencies for the manuscript be development versions? #86

Open lshandross opened 2 weeks ago

lshandross commented 2 weeks ago

We decided to make the manuscript repository a research compendium using rrtools based on some suggestions. From what I understand, this means that our code is stored as an R package, and thus we have a typical DESCRIPTION file with dependencies.

However, since this is a repo for a manuscript and will not be updated/maintained, it made me wonder if it would be better practice for packages on CRAN to not have their GitHub repositories listed under the Remotes field. (It's also possible that this doesn't matter, especially if we specify the exact versions of all packages used by the manuscript).

What are your thoughts, @zkamvar?

zkamvar commented 2 weeks ago

I fully support the approach of rrtools! It looks like the repository (as of 2de12ccf20783ff4933e16abd767d34b7e6b00a1) is at the stage where it has a DESCRIPTION file, so you are all set: people can run pak::pak("hubverse-org/hubEnsemblesManuscript") and have all the tools they need to build your manuscript.

The problem is that R's philosophy behind package management is: "update everything all the time"... which doesn't quite work with research compendia. Pinning the versions in the DESCRIPTION file is kind of a waste of time when you can't depend on the dependencies also being pinned.

IMO, the most effective thing you can do is to pin the packages using renv::init(). You can also create a Dockerfile with rrtools::use_dockerfile() (though I find that to be overkill).

With a renv project set up, the sources for your packages are recorded, so they can be restored when needed. You can even have Docker use the renv project to provision the image.

zkamvar commented 2 weeks ago

Regarding your main question: having as many packages as possible be CRAN release versions is the best option. If a package is not on CRAN, be sure to have the remote added. For the hubverse packages, you can use the version tag on the remote (e.g. hubverse-org/hubData@v1.2.2).
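To illustrate the version-tag suggestion, here is a minimal sketch of installing a package at a tagged release with pak. The helper name `install_pinned()` is hypothetical, and the ref is simply the example from this thread; whether that tag is the right one for the manuscript is an assumption.

```r
# Hypothetical helper: install a GitHub package at a tagged release
# instead of whatever is currently on the default branch.
# The matching DESCRIPTION entry would be:
#   Remotes: hubverse-org/hubData@v1.2.2
install_pinned <- function(ref = "hubverse-org/hubData@v1.2.2") {
  # pak accepts "user/repo@ref", where ref can be a tag, branch, or commit
  pak::pak(ref)
}
```

Calling `install_pinned()` with no arguments would install the tagged hubData release; passing another `user/repo@tag` string works the same way.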

The renv lockfile will contain the correct sources for all the packages, which can be retrieved from either the CRAN archives or GitHub (note that even packages installed from the R-universe contain metadata about which GitHub commit they come from).

lshandross commented 2 weeks ago

Thanks for the explanation. We had previously tried using renv but found it created a lot of errors when rendering the manuscript and ended up deciding to remove it (discussion here). So maybe I'll go the docker route instead.

zkamvar commented 2 weeks ago

Ah yes, the bear of active development inside of a {renv} project with multiple collaborators.

A Docker container is a great solution. Keep in mind that it's a good idea to host the image somewhere like Docker Hub (and maybe Zenodo or OSF) in a versioned state so that it can be reproduced, since new builds of the image would likely bring in new versions of the software.

I would strongly recommend recording a renv.lock file after you have successfully rendered the manuscript so that the dependency versions are properly recorded somewhere where they can be parsed (you can use https://github.com/MilesMcBain/capsule to do so in a way that doesn't force your system to automatically use renv).
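As a rough sketch of the lockfile idea, the snippet below uses renv directly rather than capsule (a swapped-in alternative, not the capsule API). It only calls `renv::snapshot()`, never `renv::init()`, so as far as I understand it should not switch the project over to an renv-managed library. The helper name `record_lockfile()` is hypothetical.

```r
# Hypothetical helper: write renv.lock after a successful render.
# Only renv::snapshot() is used; renv::init() / renv::activate() are never called.
record_lockfile <- function(project = ".") {
  renv::snapshot(
    project  = project,
    lockfile = file.path(project, "renv.lock"),
    type     = "explicit",  # start from the packages declared in DESCRIPTION
    prompt   = FALSE        # do not ask for confirmation
  )
}
```

Calling `record_lockfile()` from the repository root after rendering would leave a `renv.lock` that can be committed alongside the manuscript.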

If you don't want to do that, I would recommend adding a section at the bottom of the README that includes a call to sessioninfo::session_info() (e.g. https://github.com/everhartlab/sclerotinia-366/tree/master?tab=readme-ov-file#packages-used) so that the package versions are recorded in text.
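For the plain-text option, a minimal sketch: in a README.Rmd, a chunk that just calls `sessioninfo::session_info()` is enough. If a standalone file is preferred, something like the following could work; the helper name `write_session_info()` and the file name are illustrative assumptions.

```r
# Hypothetical helper: save package versions as plain text after rendering.
# Assumes the sessioninfo package is installed; the default path is illustrative.
write_session_info <- function(path = "session-info.txt") {
  info <- sessioninfo::session_info()
  # capture the printed report and write it out line by line
  writeLines(capture.output(print(info)), path)
  invisible(path)
}
```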