rostools / r-cubed-intermediate

Reproducible Research in R: An Intermediate Workshop on Modern Approaches and Workflows to Processing Data
https://r-cubed-intermediate.rostools.org/

Threat to reproducible research #16

Closed scoultersdcoe closed 6 months ago

scoultersdcoe commented 11 months ago

Not an issue so much as a request for additional content. I'm curious whether you might address some of the criticisms of R, and how it poses a threat to reproducible research, as part of the intermediate/advanced workshop content. For example, post 100 from Data Colada (https://datacolada.org/100).

lwjohnst86 commented 6 months ago

Thanks for posting the issue, and sorry for the very late reply (I only prioritize reviewing issues and maintaining the material once a year). I read through the post. It reads more like a marketing post for the groundhog package (which the author created) than a well-researched, well-founded examination of the actual problems of reproducibility in research. In my experience, from what I've read, and as reflected in the core purpose of these courses, the problem isn't that any given tool affects reproducibility positively or negatively; it's that the majority of researchers are simply unaware of, or don't recognize, the importance of reproducibility at all! This is an awareness and education problem right now, not a technological problem.

Going to the main argument of the post, though: R is not a threat to reproducibility. The "tool-based" threats are that most researchers use tools like Word, Excel, and SPSS, or poorly structured SAS or Stata code, with no version control and no code-sharing tools. These researchers either use tools that don't support reproducibility at all (Excel/Word), don't use tools that make reproducibility easier (Git, GitHub), or have had no formal training in coding as a necessary skill, so they don't write code in any sort of reproducible way.

Until more researchers use open source, code-based tools like R and Python, and get proper training in how to use them effectively, we don't have to worry about highly specific reproducibility concerns like "package dependency/computing environment management".