r-lib / pak

A fresh approach to package installation
https://pak.r-lib.org

[Feature Request] Automatic installation of packages detected via explicit package::function() calls in quarto and R projects #712

Open fretwurst opened 2 weeks ago

fretwurst commented 2 weeks ago

Problem

In CI/CD workflows, particularly for Quarto-based projects, missing packages often cause rendering to fail. This is a common issue when:

For Quarto projects, active chapters defined in _quarto.yml (under chapters or render) often determine the relevant .qmd files to render. Detecting and pre-installing the packages used in these files before rendering could significantly streamline the CI/CD workflow.
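As a rough illustration of the file-detection step, the sketch below reads _quarto.yml and collects the .qmd files it lists. It assumes the yaml package is installed, and quarto_files() is a hypothetical helper name, not part of pak or Quarto. It only looks at book chapters, book appendices, and the project render list, and does not interpret glob patterns or "!" exclusions.

```r
# Sketch only: collect the .qmd files a Quarto project lists in _quarto.yml.
# Assumes the 'yaml' package is available; quarto_files() is a hypothetical helper.
quarto_files <- function(yml = "_quarto.yml") {
  cfg <- yaml::read_yaml(yml)
  # Chapters can be nested inside parts, so flatten everything to plain strings.
  entries <- unlist(
    list(cfg$book$chapters, cfg$book$appendices, cfg$project$render),
    use.names = FALSE
  )
  entries <- as.character(entries)
  # Keep entries ending in .qmd; this drops part titles and other file types.
  # Glob patterns and "!" exclusions are returned as-is, not interpreted.
  unique(entries[grepl("\\.qmd$", entries)])
}
```

The returned paths are relative to the directory that contains _quarto.yml, so a caller would need to resolve them against that directory before reading the files.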


Proposed Solution

Enhance pak with a feature to:

  1. Scan project files (e.g., .qmd, .Rmd, .R) for all explicitly used packages:
    • Detect package::function() calls.
    • Optionally scan for pak::pkg_install() or pak::pkg() calls in file headers.
  2. Support Quarto project workflows:
    • Read _quarto.yml to identify active .qmd files (chapters or render keys).
    • Install all required packages before rendering begins.
  3. Install missing packages efficiently:
    • Use pak's parallelized installation and caching to minimize installation time in CI/CD pipelines.
    • Avoid breaking on the first missing package.
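The scanning and installation steps above could be approximated today with base R plus pak::pkg_install(). The sketch below is a minimal illustration under stated assumptions, not a proposed implementation: scan_pkg_calls(), missing_pkgs() and install_missing() are hypothetical helper names, and the regular expression is a heuristic that will also pick up pkg::fun text in prose and will miss packages loaded with library().

```r
# Sketch only: find pkg::fun() references with a regular expression.
# All three helpers are hypothetical names, not pak functions.
scan_pkg_calls <- function(files) {
  lines <- as.character(unlist(lapply(files, readLines, warn = FALSE)))
  # Drop whole-line comments (and markdown headings); trailing comments
  # and string literals are not handled.
  lines <- lines[!grepl("^[[:space:]]*#", lines)]
  if (length(lines) == 0) return(character(0))
  # A name made of letters, digits and dots, directly followed by :: or :::
  pattern <- "\\b[A-Za-z][A-Za-z0-9.]*(?=:::?[A-Za-z._`])"
  hits <- regmatches(lines, gregexpr(pattern, lines, perl = TRUE))
  sort(unique(unlist(hits)))
}

# Packages that are referenced but not installed in the current library paths.
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

# Hand all missing packages to pak in a single call, so that one resolution
# and installation covers the whole set instead of stopping at the first gap.
install_missing <- function(files) {
  pkgs <- missing_pkgs(scan_pkg_calls(files))
  if (length(pkgs) > 0) pak::pkg_install(pkgs)
  invisible(pkgs)
}
```

In a CI job this might be called as install_missing(list.files(".", pattern = "\\.(qmd|Rmd|R)$", recursive = TRUE, full.names = TRUE)) before rendering. Names found this way are assumed to be installable from the configured repositories, which a text scan cannot verify.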

Best Practice Alignment

Modern R style guides, such as the Google R Style Guide and the tidyverse style guide, recommend using explicit package::function() calls over loading packages globally. This approach improves:

Given this trend, tools like pak should support workflows where packages are explicitly referenced, especially in CI/CD contexts where no preloaded environment exists.

Example Workflow

A new pak function, such as pak::install_quarto_deps(), could streamline this process:

```r
# Proposed API, not an existing pak function:
# automatically scan a Quarto project and install dependencies
pak::install_quarto_deps(yml = "_quarto.yml")
```

This function would:

Alternatively, a more general function like pak::scan_and_install() could be used for non-Quarto workflows:

```r
# Proposed API, not an existing pak function:
# scan an arbitrary folder for used packages and install them
pak::scan_and_install(path = ".", pattern = "\\.qmd$")
```

Benefits

  1. Streamlined CI/CD Pipelines:
    Avoid pipeline failures due to missing packages by ensuring all dependencies are installed in advance.

  2. Efficiency for Large Projects:
    Automatically handle dependency management for Quarto projects with multiple .qmd files and dynamic dependencies.

  3. Modern Style Alignment:
    Supports best practices by enabling workflows where package::function() is preferred over global package loading.

  4. Broader Use Case:
    While the focus is on Quarto projects, this feature could benefit RMarkdown users or anyone working with R scripts in CI/CD environments.

  5. Optimized for Docker:
    By leveraging pak’s caching and parallelized installation, it minimizes time and resources in containerized environments.

gaborcsardi commented 2 weeks ago

This is already happening here: https://github.com/r-lib/pkgdepends/issues/390