r-lib / lintr

Static Code Analysis for R
https://lintr.r-lib.org
Other
1.2k stars 187 forks source link

New `lint_setequal()` linter #1763

Open Bisaloo opened 2 years ago

Bisaloo commented 2 years ago

This aims at encouraging the use of setequal() in place of some less clear patterns I've occasionally seen:

identical(sort(x), sort(y))
identical(unique(x), y)
identical(unique(x), unique(y))
identical(sort(unique(x)), sort(unique(y)))

or any similar combination.

This linter could also use the same logic to encourage the use of testthat::expect_setequal(). Possibly also recommend using testthat::expect_setequal() over testthat::expect_true(setequal())?

If you agree this would be a good addition, I can work on a PR. Do you see any potential pitfalls I should be looking out for?

AshesITR commented 2 years ago

Note that unique() is an S3 generic so we can't guarantee it does what we think it does. unique.data.frame() comes to mind.

Also note the only truly equivalent call is identical(sort(unique(x)), sort(unique(y))), e.g. identical(unique(x), unique(rev(x))) is false if x contains more than one unique value.

MichaelChirico commented 1 year ago

Relevant:

https://github.com/r-lib/lintr/pull/951 https://github.com/r-lib/testthat/issues/1657

Bisaloo commented 1 year ago

As discussed in #951, I believe (assume?) that in many cases where people use identical(sort(x), sort(y)), they actually want setequal() but you are right they are not strictly equivalent.

Two questions:

AshesITR commented 1 year ago

NB sort() silently removes NAs.