Closed dgkf closed 4 months ago
I don't understand your proposal, why it is better to provide a function that generates the filter and the call to available.packages
vs just wrapping available.packages
and do it once?
Wouldn't just be better wrapping available.packages
to provide the fields of interest for this repo?
Note that there is the special filter add = TRUE
to add the new filters to existing ones (the default or others).
Simplifying the code from https://github.com/pharmaR/regulatory-r-repo-wg/issues/88:
available_packages <- function(pharmapkgs = <folder/URL>, custom_filters = <custom_filters>, ...) {
available.packages(
repos = c(getOption("repo"), pharmapkgs),
fields = risk_fields(),
filters = list(add = TRUE, custom_filters),
...)
}
The benefit of an available.packages
solution isn't strictly to produce the listing of available packages. The primary benefit is the integration with other tools (install.packages
, remotes::install_version
, pak::pkg_install
, renv::install
). For all of these, we need to hook into how available.packages
works to limit availability during install.
I don't understand your proposal, why it is better to provide a function that generates the filter and the call to available.packages vs just wrapping available.packages and do it once?
Yeah - admittedly it's a weird design. In order to both hook into installation tools and also provide a filter that makes use of risk scores we somehow would like to both use the available_packages_filters
option, but also have access to our custom risk fields, even if they're not provided to the call to available.packages
. This is where all the complexity of this proposal stems from.
I was hoping to do a little proof-of-concept today. Would you mind if I tagged you for review? I think seeing it in code might communicate the goal better than I can in an issue.
Note that there is the special filter add = TRUE to add the new filters to existing ones (the default or others).
Thanks, yes! I just saw this yesterday when reading through these docs. We should definitely make use of this.
The benefit of an
available.packages
solution isn't strictly to produce the listing of available packages. The primary benefit is the integration with other tools (install.packages
,remotes::install_version
,pak::pkg_install
,renv::install
). For all of these, we need to hook into howavailable.packages
works to limit availability during install.
I thought the whole point of this package was to interact with the pharma repository, I wouldn't attempt to keep up with a third party tool (at least not initially for a proof of concept).
A similar approach was taken by Bioconductor with BiocManager::available
.
It has its own package and function to check which packages are available and to install them.
Even if it uses other packages, it is all wrapped by their repository package manager BiocManager
(and it doesn't work well with renv
, and renv
doesn't work well with Bioconductor).
I don't understand your proposal, why it is better to provide a function that generates the filter and the call to available.packages vs just wrapping available.packages and do it once?
Yeah - admittedly it's a weird design. In order to both hook into installation tools and also provide a filter that makes use of risk scores we somehow would like to both use the
available_packages_filters
option, but also have access to our custom risk fields, even if they're not provided to the call toavailable.packages
. This is where all the complexity of this proposal stems from.I was hoping to do a little proof-of-concept today. Would you mind if I tagged you for review? I think seeing it in code might communicate the goal better than I can in an issue.
Yes, please tag me.
I thought the whole point of this package was to interact with the pharma repository ... A similar approach was taken by Bioconductor with
BiocManager::available
. It has its own package and function to check which packages are available and to install them.
Ah, I see where you're coming from now. Yeah, I think I had something different in mind - which isn't to say I was right, just that I was thinking about the problem differently.
I hadn't considered a workflow where the intention is to work solely with this repository, as one might with BioConductor - and maybe this is a blindspot of mine. One requirement we definitely have is the ability to intermix a public regulatory package repository with privately hosted enterprise package repositories (proprietary in-house packages). With that in mind, having BioConductor-style tooling might pose a barrier.
That said, it would definitely give us more control over how package installations may happen.
I'll share a POC of what this might look like shortly.
I still haven't reviewed the PR, but I was reading renv documentation and I think I found something relevant for this issue and wanted to document that for renv compatibility there is the dependencies
argument of renv::install
:
A vector of DESCRIPTION field names that should be used for package dependency resolution. When NULL (the default), the value of renv::settings$package.dependency.fields is used. The aliases "strong", "most", and "all" are also supported
We use
options("available_packages_filters" = repo_filter(...))
to set a special function to filter the repository package down to only a subset with acceptable risk.This function operates on the matrix returned by
available.packages
to filter packages. By default,available.packages
will only return a set of canonical fields from thePACKAGES
file. However, as we've injected our own risk fields, we want to include those as well so that we can use them for filtering.Currently, we use a small hack to change the default value of the
available.packages()
fields
parameter. This means that any timeavailable.packages()
is used (for example, internally withinpak
), these fields are included and our risk filters can be applied.However, it is not good R etiquette to modify base functions like this. Instead,
risk_filter
may generate a function like this (in pseudocode):It's worth doing some exploration, because I'm not totally sure which path would be easier here. We might need to do a little re-creating of the
available.packages
filter scheme, which is found in these lines.