Feature: allow package-loading prior to lint hook

russHyde commented 2 years ago

Is your feature request related to a problem? Please describe. When running the lintr hook on a package, it flags many object_usage_linter errors. This happens because, unless the package is loaded, lintr is unable to distinguish when a referenced object is completely undefined from when it is defined within the package. Typically, we would lint the whole package in CI but use pre-commit to only lint those files in a commit set. We could turn off the object_usage_linter in the .lintr config, but that config affects both precommit and CI runs of lintr, so would mean that this useful linter would be unavailable in CI.

Describe the solution you'd like Would it be possible to add a command line option to the pre-commit linting script that determines whether the package in a repo should be loaded prior to running lintr. For example, Rscript lintr.R <files> would run as currently implemented. But Rscript lintr.R --load-package <files> would run pkgload::load_all() before running lintr on the files.

lorenzwalthert commented 2 years ago

Thanks @russHyde. What about the option to lint the whole package instead of the staged files? Is that also an option? I prefer to not use pkgload::load_all() because that requires all dependencies of the package under linting to be in the same {renv} where {precommit} is installed...

russHyde commented 2 years ago

That's interesting, I hadn't considered the environment where precommit/lintr is running. This might be something better handled on our (lintr's) end - for example, having a way to selectively deactivate linters based on environment variables, so that some linters are only active in CI.

lorenzwalthert commented 2 years ago

Is there smart caching built in in {lintr}, or is running it on all files significantly slower?

lorenzwalthert commented 2 years ago

any thoughts @MichaelChirico?

MichaelChirico commented 2 years ago

lintr is typically taking O(10 seconds) on full packages, but yes there is an expression-level caching mechanism

lorenzwalthert commented 1 year ago

having a way to selectively deactivate linters based on environment variables, so that some linters are only active in CI.

I think this sounds like introducing intransparency. Another option that we considered in #393 is to use language: script and run the hook in the global R environment. However, this also means that the check can't be run in pre-commit.ci (which is fine for that specific hook, but not for lintr hooks). Alternatively, we could have a hook running locally (and skipped on ci) and one running only on CI (but skipped locally). However, these are all workarounds for me.

lintr is typically taking O(10 seconds) on full packages, but yes there is an expression-level caching mechanism

If you cache files, would it get significantly faster? Then that would be the solution in my eyes (offloading my own work to others has always been my easy way out 😄).

lorenzwalthert commented 1 year ago

Also note that theoretically, {lintr} does not have to offer the functionality to deactivate linters based on environment variables, even if we wanted to use env variables (or pre-commit arguments in .pre-commit-config.yaml). Because in our hook script, we could also load the {lintr} config, modify it and then call {lintr} with passing the modified contents explicitly as config (I guess this should work). Or just make behaviour different for local vs cloud execution (but language is fix), but that also is not very transparent to me.

russHyde commented 1 year ago

I agree that the env-variables make it less transparent. I might experiment with writing a lintr hook that allows the user to specify a lintr config, so that I can have a .lintr.pre-commit (for use in local pre-commit) and a .lintr config (for use interactively and in CI)

lorenzwalthert commented 1 year ago

Yeah ok. Here's how I'd do it.

For local, you could also use language: script to use the global R library (so forget about additional_dependencies:). devtool::load_all() and then lint only files to be committed.
For remote, use language: r that uses another config file and runs lintr on the whole package. Since it's impossible to skip hooks locally and only run remotely IIRC, you could at the top of this script check if CI env variable is set (I believe it's set in pre-commit.ci) and skip the remainder.

joelnitta commented 1 year ago

For local, you could also use language: script to use the global R library (so forget about additional_dependencies:). devtool::load_all() and then lint only files to be committed.

@lorenzwalthert I'm not familiar enough with the setup of .pre-commit-config.yaml to figure out how to apply the solution you propose. Do you mind clarifying that a bit? Below is my current .pre-commit-config.yaml; how should I change it to use the global R library as you suggest? (I'm only concerned with doing this locally, not on remote)

``` # All available hooks: https://pre-commit.com/hooks.html # R specific hooks: https://github.com/lorenzwalthert/precommit repos: - repo: https://github.com/lorenzwalthert/precommit rev: v0.3.2.9003 hooks: - id: style-files args: [--style_pkg=styler, --style_fun=tidyverse_style] - id: roxygenize # roxygen requires loading pkg -> add dependencies from DESCRIPTION additional_dependencies: - assertr - assertthat - digest - dplyr - glue - methods - purrr - stringr - tibble - tidyr - utils - anthonynorth/roxyglobals # codemeta must be above use-tidy-description when both are used - id: codemeta-description-updated - id: use-tidy-description - id: spell-check exclude: > (?x)^( .*\.[rR]| .*\.feather| .*\.jpeg| .*\.pdf| .*\.png| .*\.py| .*\.RData| .*\.rds| .*\.Rds| .*\.Rproj| .*\.sh| (.*/|)\.gitignore| (.*/|)\.gitlab-ci\.yml| (.*/|)\.lintr| (.*/|)\.pre-commit-.*| (.*/|)\.Rbuildignore| (.*/|)\.Renviron| (.*/|)\.Rprofile| (.*/|)\.travis\.yml| (.*/|)appveyor\.yml| (.*/|)NAMESPACE| (.*/|)renv/settings\.dcf| (.*/|)renv\.lock| (.*/|)WORDLIST| \.github/workflows/.*| data/.*| )$ - id: lintr - id: readme-rmd-rendered - id: parsable-R - id: no-browser-statement - id: no-debug-statement - id: deps-in-desc - repo: https://github.com/pre-commit/pre-commit-hooks rev: v4.3.0 hooks: - id: check-added-large-files args: ['--maxkb=200'] - id: file-contents-sorter files: '^\.Rbuildignore$' - id: end-of-file-fixer exclude: > (?x)^( .*\.Rd| tests/testthat/_snaps/.*\.md| )$ - repo: https://github.com/pre-commit-ci/pre-commit-ci-config rev: v1.5.1 hooks: # Only reuiqred when https://pre-commit.ci is used for config validation - id: check-pre-commit-ci-config - repo: local hooks: - id: forbid-to-commit name: Don't commit common R artifacts entry: Cannot commit .Rhistory, .RData, .Rds or .rds. language: fail files: '\.(Rhistory|RData|Rds|rds)$' # `exclude: ` to allow committing specific files ci: autoupdate_schedule: monthly ```

lorenzwalthert commented 1 year ago

Like this:

-   repo: local
    hooks:
    -   id: local-lintr
        name: Run lintr with global R environment
        entry: Rscript -e "devtools::load_all(); output <- lintr::lint(commandArgs(trailingOnly=TRUE)); print(output); if (length(output) > 0) stop('Files not lint free')"
        language: system
        files: '(\.[rR]profile|\.R|\.Rmd|\.Rnw|\.r|\.rmd|\.rnw)$'

You could also move the R code in entry to a file and use language: script with entry: path/to/script.R where the script must be executable by the user.

lorenzwalthert commented 1 year ago

Pre-commit passes the file names to entry as unnamed arguments.

joelnitta commented 1 year ago

@lorenzwalthert thanks!

However, I noticed this only works if a single R file is being committed, since lintr::lint() only works on a single file.

Below I have modified it to work on one or more R files:

-   repo: local
    hooks:
    -   id: local-lintr
        name: Run lintr with global R environment
        entry: > 
          Rscript -e "devtools::load_all();
          output <- commandArgs(trailingOnly=TRUE) |> strsplit(' ') |>
          unlist() |> lapply(lintr::lint) |> purrr::compact();
          if (length(output) > 0) {print(output); stop('Files not lint free')}"
        language: system
        files: '(\.[rR]profile|\.R|\.Rmd|\.Rnw|\.r|\.rmd|\.rnw)$'

joelnitta commented 1 year ago

(BTW I would be interested in running this with CI also if you could provide more details on that as well)

lorenzwalthert commented 1 year ago

Thanks for your snipped. Also, you may want to print lilntr::lint() output, so in case the hook does not pass, the problems are displayed.

For CI, this does not work (system hooks are not supported with pre-commit.ci). Potentially later with GitHub Actions and #450, but for now, maybe you could just use the lintr action from r-lib/actions?

joelnitta commented 1 year ago

It does print the output (I just ran a test to verify): {print(output); stop('Files not lint free')}

So for CI, there is no need to modify .pre-commit-config.yaml right? The CI will just ignore - repo: local?

lorenzwalthert commented 1 year ago

Not sure it skips the local hook by default, you could also skip it explicitly as mentioned in the pre-commit.ci docs:

ci:
    skip: [pylint]

repos:
-   repo: local
    hooks:
    -   id: pylint
        # ...

lorenzwalthert commented 9 months ago

Great you achieved what you wanted. A solution to make this work withouth local hook would still be to

add a flag --load-package and call pkgload::load_all() if the flag is set (no need for it to be installed in the renv if the flag is not set
and to require the user to list all dependencies plus {pkgload} in additional_dependencies.

Or to use language: system for the hook and let useres use pre-commit CI lite to manage dependencies (instead of renv).

lorenzwalthert / precommit

Feature: allow package-loading prior to lint hook #440