r-lib / lintr

Static Code Analysis for R
https://lintr.r-lib.org
Other
1.19k stars 184 forks source link

object_usage_linter gets line number wrong in multi-NSE case #1914

Open MichaelChirico opened 1 year ago

MichaelChirico commented 1 year ago

Here, note that NSE is used in both the formula and the data= argument, and it's throwing off the line number detection:

lintr::lint("
foo <- function(x) {
  lm(
    y ~ z,
    data = x[!is.na(y)]
  )
}
", lintr::object_usage_linter())
# <text>:4:5: warning: [object_usage_linter] no visible binding for global variable 'y'
#     y ~ z,
#     ^

This makes it look like codetools is confused by the LHS of the formula, but actually what's being marked is the data= argument on the next line.

codetools itself only provides a range:

codetools::checkUsage(foo)
# <anonymous>: no visible binding for global variable ‘y’ (:2-5)
MichaelChirico commented 1 year ago

We also have to cover usage like:

ggplot(
  y[which(!is.na(y$col)), ],
  aes(x = col, fill = factor(is.mobile))
)

Currently, the linter lands on y$col for the lint generated by x = col.

I presume similar would happen for y@col. Anything else we should be wary of?

MichaelChirico commented 1 year ago

In full generality, this is a bit of a fool's errand. What about

ggplot(
  y[with(y, !is.na(col)), ],
  aes(x = col, fill = factor(is.mobile))
)

(assuming skipWith = TRUE)?

checkUsage() will be flagging the x = col usage, but object_usage_linter() will identify the one under with(...). IINM we'd have to re-run checkUsage() on each sub-expression (possibly recursively!) to fix this. 🤮

Fundamentally we need codetools to improve the metadata it returns here.

I'm happy that we've covered the most common cases in #1915, so still marking this issue as closed.

MichaelChirico commented 1 year ago

Filed https://gitlab.com/luke-tierney/codetools/-/issues/10 as a request for codetools to be a more powerful solution here.