Open dgkf-roche opened 2 weeks ago
Hey @gaborcsardi, is this something that you'd be interested in supporting? On our end, the filtering feature of available.packages, and its ubiquity across most mechanisms of interfacing with repositories, is a core feature of our repo tools.
Yes, I would like to have a way to prioritize repositories, but it would be another way, as we don't use available.packages().
What kind of filters do you use in available_packages_filters?
This is related to our work in our (currently private) fork of r-lib/rhub, re-purposed for regulated industries. As packages are updated, we calculate a number of quantifiable indicators of the package's quality. We embed these indicators inside the PACKAGES file with the hope of allowing the end-user to specify some quality selection criteria. We've piloted using the available_packages_filters option as a universal mechanism of applying a policy.
There's a brief demo in the README of this package.
We use some helper functions in the demo to simplify the syntax, but it amounts to doing something like:
options(available_packages_filters = list(add = TRUE, function(ap) {
  dplyr::as_tibble(ap) |>
    # filter() keeps rows that satisfy the conditions; select() picks columns
    dplyr::filter(
      QualityLineCoverage >= 0.5,
      QualityExportCoverage >= 0.9,
      QualityExportDocumentationCoverage >= 0.9
    )
}))
Here the logic is just a series of conditions, but we'd like to keep it arbitrary - it could be a decision tree or some aggregation of different qualities.
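To illustrate the "aggregation of different qualities" case, here is a minimal sketch of a filter that combines the three fields into one weighted score. The function name, the weights and the 0.8 threshold are hypothetical. It also assumes the quality fields are present as columns of the matrix, which for available.packages() means requesting them through its fields argument. The filter returns a row subset of the input matrix, which is the form available.packages() expects from a filter function.

```r
# Hypothetical composite-score filter. The field names follow the example
# above; the weights and the threshold are illustrative assumptions.
quality_score_filter <- function(ap, threshold = 0.8) {
  weights <- c(
    QualityLineCoverage = 0.2,
    QualityExportCoverage = 0.4,
    QualityExportDocumentationCoverage = 0.4
  )

  # If a repository does not publish these fields, nothing passes.
  if (!all(names(weights) %in% colnames(ap))) {
    return(ap[0, , drop = FALSE])
  }

  # PACKAGES values arrive as character; malformed or missing values become NA.
  vals <- matrix(
    suppressWarnings(as.numeric(ap[, names(weights), drop = FALSE])),
    nrow = nrow(ap), ncol = length(weights)
  )

  # Weighted sum per package; any NA field makes the whole score NA.
  score <- drop(vals %*% weights)
  keep <- !is.na(score) & score >= threshold

  # Return a subset of the rows of the input matrix.
  ap[keep, , drop = FALSE]
}

# Apply in addition to the default filters.
options(available_packages_filters = list(add = TRUE, quality_score_filter))
```

Packages with a missing or non-numeric quality value are dropped rather than passed through, which seems the safer default for a policy filter. A decision tree could replace the weighted sum without changing the shape of the function.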
The ability to provide a function that can arbitrarily filter the available packages pulled from repos in options(repos) is pretty core to our design and our hope is that this can be applied by an administrator, ensuring that all well-intentioned user-facing mechanisms of installing packages apply the filtering criteria.
Speaking only for my company, we also use this behavior to force R to prioritize repositories by their order in options(repos). I've informally chatted with folks from other companies that mentioned they had to enforce this policy as well, so I think it's a rather frequent pitfall that needs to be addressed when locking down systems.
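A rough sketch of how repository prioritization could be written as a filter: for each package, keep only the row that comes from the earliest repository in options(repos). The function name and the prefix-matching approach are assumptions made for illustration, not the implementation used at any of the companies mentioned. It relies on the Repository column of the available.packages() matrix, which holds the contrib URL each row was read from.

```r
# Hypothetical filter: for each package keep the entry from the
# highest-priority repository. `repos` is assumed to be a character vector of
# repository base URLs in priority order (defaults to options(repos)).
prioritize_repos_filter <- function(ap, repos = getOption("repos")) {
  if (nrow(ap) == 0L) return(ap)

  # Rank each row by the first repository whose URL prefixes its contrib URL.
  # Rows that match no configured repository are ranked last.
  rank <- vapply(ap[, "Repository"], function(url) {
    hit <- which(startsWith(url, repos))
    if (length(hit)) hit[1L] else length(repos) + 1L
  }, integer(1), USE.NAMES = FALSE)

  # order() leaves ties in their original order, so rows within one
  # repository keep their relative position.
  ap <- ap[order(rank), , drop = FALSE]

  # The first occurrence of each package is now the highest-priority one.
  ap[!duplicated(ap[, "Package"]), , drop = FALSE]
}

# The built-in "duplicates" filter keeps the latest version regardless of
# repository, so this sketch lists the other default filters explicitly and
# uses the priority filter in its place.
options(available_packages_filters = list(
  "R_version", "OS_type", "subarch", prioritize_repos_filter
))
```

Two caveats with this sketch: a lower-priority repository offering a newer version is deliberately ignored, and prefix matching can misattribute rows when one repository URL is a prefix of another.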
So you basically want to be able to specify arbitrary conditions on arbitrary fields from your package metadata. This is certainly possible, but needs quite a lot of changes, as currently we don't even read in all metadata from PACKAGES* files.
So you basically want to be able to specify arbitrary conditions on arbitrary fields from your package metadata
Yes, exactly. Glad to hear you're open to supporting it - please let us know if there's anything we can take on to help support it.
From what I saw in the PACKAGES parsing, it looked like it supported up to ~1000 fields which should be plenty for our needs. Are there constraints on the field names? We haven't set any standard yet, so we can definitely consider a convention that makes your life easier.
Not sure if this is necessarily a bug - maybe it's an intentional omission.
We were hoping to leverage this as a universal mechanism for applying selection criteria, based on quality measures, to a repository of packages over in pharmaR/pharmapkgs.
Using a simple example, I tried to make a function that ideally would only permit (at least without some intentional side-stepping of common install tools) installation of packages that start with "c".
Though I would expect this to fail, given that the filter should prevent these packages from being available.
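For reference, a filter of the kind described might look roughly like the sketch below. The function name and the toy matrix are made up for illustration; the matrix stands in for what available.packages() returns so the behavior can be checked without a network call.

```r
# Sketch of a filter that keeps only packages whose name starts with "c".
# The function name is hypothetical.
starts_with_c_filter <- function(ap) {
  ap[startsWith(ap[, "Package"], "c"), , drop = FALSE]
}

# Toy stand-in for the matrix returned by available.packages().
ap <- matrix(
  c("cli", "callr", "dplyr"),
  ncol = 1, dimnames = list(NULL, "Package")
)
filtered <- starts_with_c_filter(ap)
print(filtered[, "Package"])

# Register the filter in addition to the default filters.
options(available_packages_filters = list(add = TRUE, starts_with_c_filter))
```

With this option set, tools that go through available.packages() would be expected to treat a package such as dplyr as unavailable.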
Substituting with a function(ap) browser() function also never hits a debug session, so my impression is that available.packages is either used internally but with some default filters, or an alternative mechanism is used that doesn't implement this behavior.
I'm curious to hear your thoughts. It would be a tremendously valuable feature for us.