Assignments in control structures

r-lib / lintr

Static Code Analysis for R

https://lintr.r-lib.org

Other

1.16k stars 184 forks source link

Assignments in control structures #2419

Open IndrajeetPatil opened 7 months ago

IndrajeetPatil commented 7 months ago

In R, there are two ways to do assignments in the context of if/else, and I have always wondered if one should be preferred over the other.

So I wanted to create this issue to solicit feedback on whether one of these is more idiomatic or error-prone or readable or etc. than the other? And, if yes, is this something that will fall under the purview of {lintr}?

Assignment happens in each branch

x <- -1L
if (x < 0L) {
  y <- -1L * x
} else {
  y <- x
}

y
#> [1] 1

Assignment happens outside branches

x <- -1L
y <- if (x < 0L) {
  -1L * x
} else {
  x
}

y
#> [1] 1

^{Created on 2023-12-12 with reprex v2.0.2}

MichaelChirico commented 7 months ago

I don't like such a rule personally, I think both styles work well in different contexts. haven't thought through anything systematic about when I prefer one or the other. I know Martin Mächler from R core strongly prefers the latter so I guess base R will favor that style.

Anyway I'd be fine to host such a linter in the package, it seems reasonable maintainers might want to enforce consistency. Can I suggest implementing both styles as an option and having a required parameter (i.e. we don't take a stand on what the default style should be)?

IndrajeetPatil commented 6 months ago

Thanks for the feedback!

When there is more than one way to do something, one would like to know if there is a reason to prefer one over the other, and thus the issue. In this case, there doesn't seem to be, at least not in our experience. So no point in creating a linter for this, unless requested by users at some point.

MichaelChirico commented 6 months ago

I think "consistency" is a decent reason, though. we can list our benefits for both, I just don't feel like either choice's benefits clearly dominate to the exclusion of the other.

it might be illustrative to write out the linter and see how much consistency there is in a few codebases, or if you can tease out some rules on when not to use one or the other style.

IndrajeetPatil commented 1 month ago

Here is a context in which these two styles of assignment produce different results:

some_cond <- FALSE

x <- if (some_cond) "a"
x
#> NULL

rm(x)

if (some_cond) x <- "a"
x
#> Error in eval(expr, envir, enclos): object 'x' not found

^{Created on 2024-06-07 with reprex v2.1.0}

I think, this makes the latter style a bit more desirable because an error would be more informative than silently succeeding because of default assignment of NULL.

If the former behaviour is what one wants, that intention is easy enough to capture in the latter style as well:

some_cond <- FALSE
x <- NULL
if (some_cond) x <- "a"

What do you think?

AshesITR commented 1 month ago

I tend to omit implicit else NULLs NB your example can be shortened to

x <- if (some_cond) "a" else NULL

Re flexibility: My personal preference is to only use outer assignments if the whole constructs fits within a line and is easy to grasp. The latter of course is hard to statically check 😅

MichaelChirico commented 3 weeks ago

I also tend to omit implicit NULLs; some authors/codebases might prefer making those explicit though, as in return_linter(allow_implicit_else=FALSE).