pharmaR / riskmetric

Metrics to evaluate the risk of R packages
https://pharmar.github.io/riskmetric/

Chaining `pkg_ref` source types #299

Open AARON-CLARK opened 1 year ago

AARON-CLARK commented 1 year ago

Second, we have been discussing implementing chained source types so that a broader set of metrics can be scored for a package. This would let us provide more complete metrics without the user having to maintain a provenance chain across the various ref sources.

Originally posted by @emilliman5 in https://github.com/pharmaR/riskmetric/issues/292#issuecomment-1582615516

Eric, please edit or comment on this issue. I believe you mentioned leveraging a "hierarchy of source types", but I don't remember the exact structure. This is somewhat related to #294, which aims to download the package tarball and add it to the ref cache. From there, pkg_ref could use pkg_remote, pkg_source, and pkg_install (in that order?) to populate as many metric assessments as possible.
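One way to read the ordered fallback above, sketched in Python purely for illustration (not riskmetric code): walk the chained sources in the proposed order, let the first source that can supply a metric provide it, and record which source it came from. The list-of-tuples chain, the first-wins rule, and the name `collect_metrics` are assumptions; the metric names and values are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each chained source is (ref kind, {metric name: value}).
Chain = List[Tuple[str, Dict[str, object]]]


def collect_metrics(chain: Chain) -> Dict[str, Tuple[object, str]]:
    """Walk the chain in order. The first source that provides a metric
    supplies its value, and that source's kind is kept as provenance."""
    result: Dict[str, Tuple[object, str]] = {}
    for kind, metrics in chain:
        for name, value in metrics.items():
            if name not in result:  # earlier sources in the chain take precedence
                result[name] = (value, kind)
    return result


if __name__ == "__main__":
    # Illustrative chain in the order suggested above: remote, then source, then install.
    chain: Chain = [
        ("pkg_remote", {"downloads_1yr": 1000}),
        ("pkg_source", {"has_vignettes": True}),
        ("pkg_install", {"has_vignettes": False, "export_help": 0.9}),
    ]
    print(collect_metrics(chain))
```

Here `has_vignettes` comes from pkg_source even though pkg_install also reports it, because pkg_source appears earlier in the chain; whether earlier or later sources should win is exactly the open "(in that order?)" question.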

emilliman5 commented 1 year ago

The goal is to maintain the provenance of the information used to compute metrics. For example, if a user starts with pkg_install for ggplot2, we should not simply attach pkg_remote_cran metadata: we do not know when ggplot2 was installed, so today's CRAN download stats may not apply to the installed package, which could have been installed years ago. Hence, we implement a hierarchy of pkg_ref objects:

pkg_remote (github, cran, bioc) -> pkg_source -> pkg_install

and

pkg_source -> pkg_install

The first step would be to implement pkg_remote -> pkg_source. I am wary of installing packages in a user's environment, even temporarily, because of 1) consent, 2) run time, and 3) I'm not sure pkg_install would add much information beyond pkg_remote combined with pkg_source.
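A minimal sketch of the provenance rule in Python, assuming the chain reads pkg_remote -> pkg_source -> pkg_install and that each ref records the ref it was derived from (if known). Metadata from an upstream ref may be attached only if that ref lies on the target's own chain. `Ref`, `ALLOWED_PARENTS`, and `can_attach` are hypothetical names for illustration, not riskmetric's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: which ref kinds each kind may be derived from, per the hierarchy above.
ALLOWED_PARENTS = {
    "pkg_remote": set(),            # github, cran, bioc: top of the chain
    "pkg_source": {"pkg_remote"},
    "pkg_install": {"pkg_source"},
}


@dataclass
class Ref:
    kind: str                        # "pkg_remote", "pkg_source" or "pkg_install"
    name: str
    parent: Optional["Ref"] = None   # ref this one was derived from; None if unknown

    def __post_init__(self) -> None:
        if self.kind not in ALLOWED_PARENTS:
            raise ValueError(f"unknown ref kind: {self.kind}")
        if self.parent is not None and self.parent.kind not in ALLOWED_PARENTS[self.kind]:
            raise ValueError(f"{self.kind} cannot be derived from {self.parent.kind}")


def can_attach(upstream: Ref, target: Ref) -> bool:
    """True only if `upstream` is an ancestor of `target`, i.e. target was
    actually derived from it, so its metadata describes the same artifact."""
    node = target.parent
    while node is not None:
        if node is upstream:         # identity, not equality: same ref object
            return True
        node = node.parent
    return False


if __name__ == "__main__":
    cran = Ref("pkg_remote", "ggplot2")
    src = Ref("pkg_source", "ggplot2", parent=cran)
    inst = Ref("pkg_install", "ggplot2", parent=src)
    orphan = Ref("pkg_install", "ggplot2")  # installed at some unknown time
    print(can_attach(cran, inst), can_attach(cran, orphan))
```

The orphan pkg_install ref matches the ggplot2 case above: with no recorded parent, `can_attach` returns False, so CRAN metadata is never attached to it.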