#### Problem

In CI/CD workflows, particularly for Quarto-based projects, rendering often fails because required packages are not installed. This commonly happens when:

- Packages are used explicitly via `package::function()` in `.qmd` files but are not pre-installed in the CI/CD environment.
- CI/CD pipelines (e.g., with Rocker Docker containers) halt at the first missing package, so in large projects every missing package costs another fix-and-rerun cycle.

For Quarto projects, the active chapters defined in `_quarto.yml` (under `chapters` or `render`) usually determine which `.qmd` files are rendered. Detecting and pre-installing the packages used in these files before rendering could significantly streamline the CI/CD workflow.
#### Proposed Solution

Enhance `pak` with a feature to:

- Scan project files (e.g., `.qmd`, `.Rmd`, `.R`) for all explicitly used packages:
  - Detect `package::function()` calls.
  - Optionally scan for `pak::pkg_install()` or `pak::pkg()` calls in file headers.
- Support Quarto project workflows:
  - Read `_quarto.yml` to identify active `.qmd` files (`chapters` or `render` keys).
  - Install all required packages before rendering begins.
- Install missing packages efficiently:
  - Use `pak`'s parallelized installation and caching to minimize installation time in CI/CD pipelines.
  - Avoid breaking on the first missing package.
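To make the scanning step concrete, here is a minimal sketch in base R of how `package::function()` references might be detected and missing packages collected into one install call. The helper names `find_namespaced_pkgs()`, `scan_files_for_pkgs()`, and `install_missing()` are hypothetical and not part of `pak`; the only `pak` function used is `pak::pkg_install()`. The regular expression is an assumption about what counts as a reference: it also matches inside comments, strings, and prose, and it does not see `library()` calls.

```r
# Hypothetical helper (not part of pak): extract package names from
# `pkg::fun` and `pkg:::fun` references in a character vector of source lines.
find_namespaced_pkgs <- function(lines) {
  # A package name starts with a letter and contains letters, digits, or dots.
  # The lookbehind avoids matching the tail of a longer identifier; the
  # lookahead requires `::` or `:::` followed by the start of a symbol.
  pattern <- "(?<![A-Za-z0-9._])[A-Za-z][A-Za-z0-9.]*(?=:::?[A-Za-z._`])"
  hits <- regmatches(lines, gregexpr(pattern, lines, perl = TRUE))
  sort(unique(as.character(unlist(hits))))
}

# Hypothetical helper: read files from disk and return every package they reference.
scan_files_for_pkgs <- function(paths) {
  lines <- unlist(lapply(paths, readLines, warn = FALSE), use.names = FALSE)
  find_namespaced_pkgs(lines)
}

# Hypothetical helper: pass everything that is not yet in the library to pak
# in a single call instead of discovering packages one failure at a time.
install_missing <- function(pkgs) {
  to_install <- setdiff(pkgs, rownames(utils::installed.packages()))
  if (length(to_install) > 0) pak::pkg_install(to_install)
  invisible(to_install)
}

# Small demonstration on a temporary file (nothing is installed here).
demo <- tempfile(fileext = ".qmd")
writeLines(c(
  "x <- dplyr::filter(mtcars, cyl > 4)",
  "ggplot2::ggplot(x) + ggplot2::geom_point()"
), demo)
pkgs <- scan_files_for_pkgs(demo)
print(pkgs)
```

In a CI step, one might then call `install_missing(pkgs)` before `quarto render`. A production version would probably parse the code chunks rather than rely on a regular expression.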
#### Best Practice Alignment

Modern R style guides, such as the Google R Style Guide and the tidyverse style guide, recommend explicit `package::function()` calls over loading packages globally. This approach improves:

- Clarity: The source of each function is immediately clear.
- Conflict avoidance: Functions with the same name in different packages cannot mask one another.
- Modularity: Code runs independently of whichever packages happen to be loaded.

Given this trend, tools like `pak` should support workflows where packages are explicitly referenced, especially in CI/CD contexts where no preloaded environment exists.
#### Example Workflow
A new `pak` function, such as `pak::install_quarto_deps()`, could streamline this process:
```r
# Automatically scan a Quarto project and install dependencies
pak::install_quarto_deps(yml = "_quarto.yml")
```

This function would:

- Parse `_quarto.yml` to identify active `.qmd` files.
- Extract all packages used via `package::function()` in these files.
- Install any missing packages before rendering.
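The parsing step could look roughly like the sketch below. It assumes the `yaml` package is available and that chapters are listed under `book: chapters:` (optionally nested in `part:` entries) or `book: appendices:`, with explicit render targets under `project: render:`. The helper `active_qmd_files()` is hypothetical. Glob patterns and `!` exclusions in `render` are deliberately ignored to keep the example short.

```r
# Hypothetical helper (not part of pak or Quarto): list the .qmd files that a
# parsed _quarto.yml names under book chapters or project render targets.
active_qmd_files <- function(cfg) {
  # unlist() flattens nested `part:` / `chapters:` entries into one vector.
  candidates <- c(
    unlist(cfg$book$chapters, use.names = FALSE),
    unlist(cfg$book$appendices, use.names = FALSE),
    unlist(cfg$project$render, use.names = FALSE)
  )
  candidates <- as.character(candidates)
  # Keep literal .qmd paths only; globs and "!" exclusions are out of scope here.
  is_qmd <- grepl("\\.qmd$", candidates) & !grepl("[*?!]", candidates)
  unique(candidates[is_qmd])
}

# In a real project this would be yaml::read_yaml("_quarto.yml").
cfg <- yaml::yaml.load("
project:
  type: book
book:
  chapters:
    - index.qmd
    - part: Analysis
      chapters:
        - model.qmd
        - results.qmd
")
files <- active_qmd_files(cfg)
print(files)
```

The resulting paths could then be scanned for `package::function()` references before any rendering starts.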
Alternatively, a more general function like `pak::scan_and_install()` could be used for non-Quarto workflows:

```r
# Scan an arbitrary folder for used packages and install them
pak::scan_and_install(path = ".", pattern = "\\.qmd$")
```
#### Benefits

- Streamlined CI/CD pipelines: Avoid pipeline failures due to missing packages by ensuring all dependencies are installed in advance.
- Efficiency for large projects: Automatically handle dependency management for Quarto projects with multiple `.qmd` files and dynamic dependencies.
- Modern style alignment: Supports best practices by enabling workflows where `package::function()` is preferred over global package loading.
- Broader use case: While the focus is on Quarto projects, this feature could also benefit RMarkdown users or anyone running R scripts in CI/CD environments.
- Optimized for Docker: By leveraging `pak`’s caching and parallelized installation, the feature minimizes time and resources in containerized environments.