pharmaR / regulatory-r-repo-wg

Package consensus for regulated industries
https://pharmar.github.io/regulatory-r-repo-wg

Which packages should be included? #36

Open dgkf opened 1 year ago

dgkf commented 1 year ago

Discussed in https://github.com/pharmaR/regulatory-r-repo-wg/discussions/14

Migrated from discussion in #20


Originally posted by **dgkf** December 6, 2022

A very central question was raised by @matthiazzz at today's meeting. I'll do my best to paraphrase, but please feel free to weigh in if I misinterpreted anything. As I understand it, the crux of the question is how broad a remit we want for the regulatory R repository. Should we be assessing packages that are clearly of good quality and serve rather generic use cases (e.g. `dplyr`), or will that draw our focus away from packages that might be more critical for statistical decision making?
Originally posted by @dgkf in https://github.com/pharmaR/regulatory-r-repo-wg/discussions/14#discussioncomment-4325025

I tend to approach these questions very generically, largely because I think defining some industry-standard quality threshold needed to bypass a process can be more challenging than just assessing quality across all packages. I would prefer that _all_ necessary packages be assessed, including dependencies and more basic functionality. From there, filtering criteria can be applied post hoc to subset packages as desired (https://github.com/pharmaR/regulatory-r-repo-wg/discussions/3). At that point, pulling all packages that meet some monthly download threshold, as well as those that meet a quality criterion, is a matter of post-hoc filtering, not up-front gatekeeping. Depending on what we see as our role in curating packages, this might be prohibitive. I'm curious to hear what level of curation others expect to be necessary in such a repository.
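The "assess everything, filter post hoc" idea in the comment above can be sketched as follows. This is a minimal Python illustration, not a proposal from the thread: the `Assessment` record, its `quality_score` field, the `select` function, and the sample packages `pkgA`, `pkgB`, and `pkgC` are all hypothetical. It assumes every package has already been assessed up front, and it keeps a package if it meets the download threshold *or* the quality criterion, mirroring "packages that meet some monthly download threshold, as well as those that meet a quality criterion".

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Assessment:
    """Hypothetical record produced for every package during the up-front assessment."""
    package: str
    monthly_downloads: int
    quality_score: float  # hypothetical score in [0, 1]; higher is better


def select(
    assessments: Iterable[Assessment],
    min_downloads: Optional[int] = None,
    min_quality: Optional[float] = None,
) -> List[str]:
    """Post-hoc subset: keep packages meeting the download threshold OR the quality criterion.

    A criterion left as None is not applied, so the same assessed set can be
    re-filtered with different thresholds without re-assessing anything.
    """
    selected = []
    for a in assessments:
        meets_downloads = min_downloads is not None and a.monthly_downloads >= min_downloads
        meets_quality = min_quality is not None and a.quality_score >= min_quality
        if meets_downloads or meets_quality:
            selected.append(a.package)
    return sorted(selected)


# Hypothetical assessment results; names and numbers are illustrative only.
ASSESSED = [
    Assessment("pkgA", monthly_downloads=500_000, quality_score=0.95),
    Assessment("pkgB", monthly_downloads=1_200, quality_score=0.90),
    Assessment("pkgC", monthly_downloads=800, quality_score=0.40),
]

print(select(ASSESSED, min_downloads=100_000, min_quality=0.85))  # ['pkgA', 'pkgB']
print(select(ASSESSED, min_quality=0.93))  # ['pkgA']
```

Because assessment and selection are separate steps, a stricter or looser subset is just another call to `select`; the underlying assessments never change.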
Originally posted by @pedrobtz in https://github.com/pharmaR/regulatory-r-repo-wg/discussions/14#discussioncomment-4496738

A few comments and questions:

1. What is the (initial) universe of package sources to be considered: CRAN + BIOC + GitHub?
2. Is **Certification** a qualitative (certified vs. not certified) or a quantitative measure?
3. What effect does package dependency have on the certification of individual packages?
4. Is the goal to flag all "good" packages, or to flag potentially "bad" (= unproven good) packages?
5. What information is currently missing from CRAN/BIOC to support a human risk-based decision approach?
Originally posted by @pedrobtz in https://github.com/pharmaR/regulatory-r-repo-wg/discussions/14#discussioncomment-4496837

The [rOpenSci Statistical Software Peer Review](https://stats-devguide.ropensci.org) initiative goes in the direction of flagging "good" packages with a Gold, Silver, or Bronze ranking. There is probably limited capacity to scale this review process to a large universe of packages, so either one reduces the scope (the number of packages) or tries to automate the review classification process.

- https://www.r-bloggers.com/2022/11/our-first-peer-reviewed-statistical-r-packages/