Open · benmarwick opened this issue 9 years ago
Not surprisingly I'm also interested in this. To add to the list: `pomp` is to this recent paper & its R script supplement: http://doi.org/10.1073/pnas.1410597112. `zenodo.json` (https://github.com/mbjones/codemeta maybe). Where does `make` fit in this picture? `maker`? (Trying to draw @richfitz in here to set me right.) `includes=FALSE` (e.g. methods sections may come after figures). Then there are the `csl` and `cls` files, possibly the packrat files, output pdf and tex formats, etc. This is another reason I haven't found it practical to treat a real manuscript as a vignette. Perhaps this is already solved, but the answer isn't obvious to me. Certainly `rmarkdown`, `rticles` etc. have made it a bit easier, but this still all tends to look a lot cleaner in toy examples than in my real life.

:+1: seems like a good idea, possibly also linking with #6 as (a) what is a manuscript if not an artefact of research? and (b) how do you store outputs with the compendium?
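To make the "real manuscript vs. vignette" tension concrete, here is one hypothetical layout (directory and file names are illustrative, not a prescription from this thread) that keeps the manuscript, with its `csl`/`cls` files and `make` machinery, outside the vignette system:

```
analysis-compendium/
├── DESCRIPTION        # package metadata; declares analysis dependencies
├── R/                 # reusable functions called by the analysis
├── data/              # packaged data sets (data-raw/ for raw inputs)
├── inst/manuscript/   # the real manuscript, outside vignettes/
│   ├── paper.Rmd
│   ├── refs.bib
│   ├── journal.csl    # citation style file
│   └── Makefile       # drives rmarkdown::render() to pdf/tex
└── vignettes/         # short, toy-sized vignette that survives R CMD check
```

The idea is that `R CMD check` only exercises the lightweight vignette, while the full manuscript build (citations, journal class files, long-running computation) lives under `inst/` and is run explicitly.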
I won't be at that unconf (I'll be following along remotely), but I wanted to add my support to the idea of discussing packages as research compendia. A couple of observations:
I may sound like I am down on the idea of using packages, but I am not. I have found a lot of benefit in using the package format and specifically using the vignette as manuscript. I will use the model again and anything that comes out of this discussion would be great to include for my next manuscript.
I think there is a good discussion to be had about whether the goal of the reproducibility charge is the end-to-end publication target (including the citation-management issues pointed out above) or the generation of the data/code/methods-related components of a publication. This is a topic I am very interested in. Some things we have been working on are geared towards including R packaging (or something like it) in larger collaborations, both as the analytical tooling and as the building of product components (figures/tables) in project-level CI. I think there is much to be done in steering the research process towards reproducibility, and it will only become more important as data and questions increase in complexity and teams continue to grow and diversify.
@cboettig the dynamic documents vs. scientific narratives question is a tough one. For my thesis I did some work on non-linear dynamic documents, where a narrative is a path through the graph of document elements. See, e.g., https://github.com/gmbecker/DynDocModel (hoping to find time to bump this back up to the *back* burner...). Things get very complicated very quickly, though.
Something akin to the Vistrails approach http://www.vistrails.org/index.php/Main_Page#Publishing_Reproducible_Results , with a database of code and artifacts that a dynamic/"live" paper pulls from/recomputes at compile or view time might be more useful in practice. At least in the short term.
A modification of Gavish and Donoho's proposed VCRs http://www.sciencedirect.com/science/article/pii/S1877050911001256 is another possibility, though AFAIR they call only for verification, not dynamic reproduction.
Have added the method `bundle_repo` to `git2r` that might be useful in this context. It clones the package repository as a bare repo to `inst/pkg.git`, so that when the package is installed the repo can be accessed with `repo <- repository(system.file("pkg.git", package = "pkg"))`. I'm also planning to add the argument `session` (`FALSE`/`TRUE`) to the `commit` and `tag` methods to append the `sessionInfo` to the commit/tag message.
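If I understand the API sketched above, usage might look something like the following. Note that `bundle_repo()` and the `session` argument are taken from this comment, not from a released git2r version, so check the package's current API before relying on these names:

```r
library(git2r)

# During development: bundle the package's own git history into inst/,
# so the exact source history ships with the installed package.
# (bundle_repo() as proposed in this comment; signature assumed.)
bundle_repo(repo = ".", path = "inst/pkg.git")

# After installation: recover the full history from the installed package.
repo <- repository(system.file("pkg.git", package = "pkg"))
summary(repo)

# Proposed extension: record the computational environment with each commit
# by appending sessionInfo() to the commit message.
commit(repo, message = "Fit final model", session = TRUE)
```

This would let a reader of the installed package inspect exactly which source revision, and (with the proposed `session` argument) which R environment, produced a given result.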
One suggestion for tracking provenance from @metamattj is the recordr package https://github.com/NCEAS/recordr/
To follow up a bit on this, one of the outcomes of the 2015 unconf discussion was this essay:
https://github.com/ropensci/rrrpkg
And we expanded that into this pre-print:
https://peerj.com/preprints/3192/
Which will shortly appear in The American Statistician in a collection of papers on 'Practical Data Science for Stats'.
Awesome! And thanks for posting the follow up here.
And just to add an idea for more work :) would you be interested in a blog post on this, to cross-post on the rOpenSci and Software/Data Carpentry blogs, or just to put on one of them? I imagine @stefaniebutland on the rOpenSci side and @weaverbel on SWC/DC could help.
@tracykteal Is this post on an unconf17 project relevant here? "Tackling the Research Compendium at runconf17": https://ropensci.org/blog/blog/2017/06/20/checkers
At last year's rOpenSci event we worked on a short guide to reproducible research, under @iamciera's guidance. Some of the most interesting progress on this topic since then has been on using the R package framework as a research repository or compendium for scholarly work, cf. @rmflight's blog posts, @cboettig's template package, @pakillo's template package and @jhollist's manuscriptPackage, etc.
The concept of a research compendium has been around for a while (cf. Gentleman 2005, Gentleman & Temple Lang 2007, Stodden 2009, Leisch et al. 2011). Many of us are making custom R packages to accompany our research publications to improve reproducibility, but I think there are a bunch of questions about the best ways to do this.
Perhaps at the unconf we can have a discussion to share some of the ways we're using R packages as research compendia, and draft a few guidelines to add to the guide. The goal would be to help domain scientists, especially those who are not primarily tool-developers or already prolific package authors, get started with this. @hadley's book is of course an excellent resource on R packages generally, but using packages as research compendia raises some specialised questions that this rOpenSci group is uniquely qualified to tackle.
Some of the questions that I'd like to learn more about on this topic include:
- a `manuscript` directory in the package, which is outside of the regular package framework and needs `make` to execute.
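One way such a directory might be wired up (a minimal sketch; the file names and the `pdf_document` target are illustrative assumptions, not anything proposed in this thread) is a small Makefile that shells out to `rmarkdown::render()`:

```make
# manuscript/Makefile -- not touched by R CMD build/check
.PHONY: all clean

all: paper.pdf

# Rebuild the pdf whenever the source, bibliography, or style changes.
paper.pdf: paper.Rmd refs.bib journal.csl
	Rscript -e 'rmarkdown::render("paper.Rmd", output_format = "pdf_document")'

clean:
	rm -f paper.pdf
```

Because `make` tracks file timestamps, only the manuscript is rebuilt on a text edit, while heavier upstream targets (model fits, figures) can be added as further prerequisites when needed.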