rOpenSci Package Registry

What is this

This repository contains 2 files that define the official rOpenSci package suite:

The rOpenSci package suite consists of all R packages in the ropensci and ropenscilabs GitHub organizations, except for packages listed in the exclude list, plus some extra packages listed in not_transferred.json.

The CI automatically updates the packages.json and registry.json files using the makeregistry package.

Generating packages.json

The code to re-generate packages.json and registry.json is in the makeregistry package. The build_ropensci_packages_json() function works as follows:

  1. It queries the GitHub API for all repositories in ropensci and ropenscilabs.
  2. It removes entries that appear in the exclude list.
  3. It adds the packages listed in not_transferred.json.
  4. It saves the final list in packages.json.
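
The list logic in the steps above can be sketched in base R as follows. This is a minimal illustration, not the makeregistry implementation: `assemble_package_list()` and the package names other than auk are hypothetical, the GitHub API query of step 1 is replaced by a hard-coded vector, and the real packages.json may hold more than plain names.

```r
# Hypothetical helper (not part of makeregistry): combines the three inputs
# described in the steps above into one character vector of package names.
assemble_package_list <- function(org_repos, exclude_list, not_transferred) {
  kept <- setdiff(org_repos, exclude_list)  # step 2: drop excluded entries
  union(kept, not_transferred)              # step 3: add extra packages once
}

# Placeholder inputs; step 1 would normally obtain org_repos from the GitHub API.
org_repos       <- c("auk", "roregistry", "astr")
exclude_list    <- c("roregistry")
not_transferred <- c("hypotheticalpkg")

packages <- assemble_package_list(org_repos, exclude_list, not_transferred)
print(packages)

# Step 4 would then write the result to disk, for example with
# jsonlite::write_json(packages, "packages.json")
```

Using `setdiff()` and `union()` keeps the result free of duplicates even if a package appears in more than one input.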

This function should take less than a minute to complete and is very reliable, so we can run it frequently.

On a daily basis we also try to collect metadata from all ropensci packages, using the make_registry() function. This function uses the following steps:

This second function can take up to 10 minutes to run and requires many API calls (multiple per package). It is not very robust and sometimes fails for a variety of unpredictable reasons.
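
When running a step like this by hand, one way to cope with occasional failures is a small retry wrapper. The sketch below is an assumption for illustration only: `with_retries()` is a hypothetical helper, not something makeregistry or the CI is known to use, and the flaky function stands in for any call that sometimes errors.

```r
# Hypothetical helper: call f() up to max_attempts times and return the first
# successful result; if every attempt fails, re-signal the last error.
with_retries <- function(f, max_attempts = 3, wait_seconds = 0) {
  last_error <- NULL
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = f()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(result$value)
    }
    last_error <- result$error
    message(sprintf("Attempt %d of %d failed: %s",
                    attempt, max_attempts, conditionMessage(last_error)))
    if (attempt < max_attempts) {
      Sys.sleep(wait_seconds)  # pause before the next attempt
    }
  }
  stop(last_error)
}

# Stand-in for an unreliable call: fails twice, then succeeds.
n_calls <- 0
flaky_step <- function() {
  n_calls <<- n_calls + 1
  if (n_calls < 3) stop("temporary failure")
  "finished"
}

outcome <- with_retries(flaky_step, max_attempts = 5)
print(outcome)
```

A non-zero `wait_seconds` may help when the failures come from API rate limits, although that is a guess about the cause rather than something stated here.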

Why the CI runs in a container

To speed up the CI builds, the roregistry workflow runs in a Docker container which has R and makeregistry preinstalled. This container is automatically built and published on the GitHub Container Registry (GHCR) using this workflow.

When a change is committed to makeregistry, it takes a few minutes before the container is updated. This is exactly the time we save on each CI run in roregistry, because it does not have to install R, makeregistry, and its dependencies for each build.

Getting the registry

To get just the raw JSON of the registry, go to https://ropensci.github.io/roregistry/registry.json

To read it into R with jsonlite:

url <- "https://ropensci.github.io/roregistry/registry.json"
z <- jsonlite::fromJSON(url)
tibble::as_tibble(z$packages)
#> # A tibble: 388 x 13
#>    name  description details maintainer keywords github status onboarding on_cran on_bioc url   ropensci_catego…
#>    <chr> <chr>       <chr>   <chr>      <chr>    <chr>  <chr>  <chr>      <lgl>   <lgl>   <chr> <chr>
#>  1 auk   eBird Data… "Extra… Matthew S… "datase… https… active "https://… TRUE    FALSE   http… data-access
#>  2 tree… Base Class… "'tree… Guangchua… "export… https… active "https://… FALSE   TRUE    http… data-tools
#>  3 apip… Package Ge… "Packa… Scott Cha… "yaml"   https… wip    ""         FALSE   FALSE   http… http-tools
#>  4 arre… Arrested D… "Here … Lucy D'Ag… "unconf… https… conce… ""         FALSE   FALSE   http… data-access
#>  5 aspa… Client for… "Clien… Scott Cha… "archiv… https… conce… ""         FALSE   FALSE   http… literature
#>  6 astr  Decompose … "Decom… Scott Cha… ""       https… conce… ""         FALSE   FALSE   http… NA
#>  7 bind… Create req… "Compu… Saras Win… "ozunco… https… conce… ""         FALSE   FALSE   http… NA
#>  8 blog… Helps Edit… "More … Maëlle Sa… ""       https… wip    ""         FALSE   FALSE   http… scalereprod
#>  9 cche… Client for… "Clien… Scott Cha… "cran, … https… conce… ""         FALSE   FALSE   http… scalereprod
#> 10 chan… A simple i… "This … Nick Gold… "ozunco… https… conce… ""         FALSE   FALSE   http… scalereprod
#> # … with 378 more rows, and 1 more variable: date_last_commit <chr>
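
Once the registry is loaded, the columns shown above can be used to filter it. The following is a minimal sketch: `active_on_cran()` is a hypothetical helper, and the small data frame stands in for `z$packages` so the example runs offline. Only the auk row mirrors the output above; the other rows are made up.

```r
# Hypothetical helper: names of packages that are marked active and are on CRAN.
# Expects the `name`, `status` and `on_cran` columns seen in the registry output.
active_on_cran <- function(pkgs) {
  keep <- pkgs$status == "active" & pkgs$on_cran
  pkgs$name[which(keep)]  # which() drops rows where either column is NA
}

# Stand-in for z$packages; only the first row is taken from the output above.
demo_packages <- data.frame(
  name    = c("auk", "hypotheticalpkg1", "hypotheticalpkg2"),
  status  = c("active", "wip", "active"),
  on_cran = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

selected <- active_on_cran(demo_packages)
print(selected)
#> [1] "auk"
```

With the real data, the same call would be `active_on_cran(z$packages)`.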