rbind / b-rodrigues.github.com

Discussion about reproducibility #5

Open b-rodrigues opened 1 year ago

mccarthy-m-g commented 1 year ago

I’m working on a reproducible data science series too. I don’t think anything there will be surprising to you, but figured I’d link it. It’s under the “series” listing if you look at the deploy preview.


One thing you didn’t bring up is r-lib’s rig, which is a cross-platform installation manager for R. For projects without a containerized OS packaging everything you need, it’s a nice way to install and switch to the correct R version (it’s also just a nice tool in general).


I think the idea of reproducibility being on a continuum (and maybe having a shelf-life) is worth diving into more. I wrote about it briefly in my series too:

Shades of reproducibility

The basic idea behind a reproducible data product is that the steps, processes, and procedures that went into making it can be repeated exactly by yourself and others, resulting in the exact same outcome every time. Ideally, there is no expiration date for reproducibility—the reproducibility of a data product could be tested tomorrow or in ten decades and should give the exact same outcome both times. If the outcomes were different, the data product is no longer reproducible (and perhaps we should no longer trust the original results). Realistically, it might be okay if a data product stops being reproducible, so long as this change happens after the data product has outlived its purpose.
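
As a rough illustration of the "exact same outcome" test described above, here is a minimal Python sketch. It is not taken from the series or from any particular tool: `run_analysis`, `fingerprint`, and `is_reproducible` are hypothetical names, and the seeded simulation is only a stand-in for a real data product. The idea is to store a checksum of the original result and compare it against a later rerun.

```python
import hashlib
import json
import random


def run_analysis(seed: int) -> dict:
    """Hypothetical stand-in for a data product: a seeded simulation summary."""
    rng = random.Random(seed)  # local generator, so no global state leaks in
    draws = [rng.gauss(0, 1) for _ in range(1000)]
    return {"n": len(draws), "mean": sum(draws) / len(draws)}


def fingerprint(result: dict) -> str:
    """Serialize the result deterministically and return its SHA-256 digest."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_reproducible(reference_digest: str, seed: int) -> bool:
    """Rerun the analysis and check that it matches the stored digest exactly."""
    return fingerprint(run_analysis(seed)) == reference_digest


# Record the digest once (for example, at publication time) ...
reference = fingerprint(run_analysis(seed=42))

# ... and rerun the check later, on the same or another machine.
print(is_reproducible(reference, seed=42))
```

An exact digest match is the strictest form of the check. A rerun under a different language version or operating system can fail it even when the results agree to many decimal places, which is one concrete way a data product can drift along the continuum over time.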

I also recently found Posit discussing similar ideas here.

b-rodrigues commented 1 year ago

Many thanks for your comment, I will take a look at the resources you posted!