New `lint_setequal()` linter

r-lib / lintr

Static Code Analysis for R

https://lintr.r-lib.org

Other

1.2k stars 187 forks source link

New `lint_setequal()` linter #1763

Open Bisaloo opened 2 years ago

Bisaloo commented 2 years ago

This aims at encouraging the use of setequal() in place of some less clear patterns I've occasionally seen:

identical(sort(x), sort(y))
identical(unique(x), y)
identical(unique(x), unique(y))
identical(sort(unique(x)), sort(unique(y)))

or any similar combination.

This linter could also use the same logic to encourage the use of testthat::expect_setequal(). Possibly also recommend using testthat::expect_setequal() over testthat::expect_true(setequal())?

If you agree this would be a good addition, I can work on a PR. Do you see any potential pitfalls I should be looking out for?

AshesITR commented 2 years ago

Note that unique() is an S3 generic so we can't guarantee it does what we think it does. unique.data.frame() comes to mind.

Also note the only truly equivalent call is identical(sort(unique(x)), sort(unique(y))), e.g. identical(unique(x), unique(rev(x))) is false if x contains more than one unique value.

MichaelChirico commented 1 year ago

Relevant:

https://github.com/r-lib/lintr/pull/951 https://github.com/r-lib/testthat/issues/1657

Bisaloo commented 1 year ago

As discussed in #951, I believe (assume?) that in many cases where people use identical(sort(x), sort(y)), they actually want setequal() but you are right they are not strictly equivalent.

Two questions:

do you believe a linter linting only identical(sort(unique(x)), sort(unique(y))) and identical(unique(sort(x)), unique(sort(y))) could still be useful? Documentation of this pattern in CRAN packages
would it be okay to lint cases like identical(sort(x), sort(y)) as well if disabled by default? For example, using an argument which would default to linting just the strictly equivalent

AshesITR commented 1 year ago

NB sort() silently removes NAs.