MichaelChirico opened 1 year ago
Slightly leaning to `lint()`. Vectorizing `lint()` should keep a stable output type IINM, so it has no apparent downsides to me. Progress handling and flattening should be refactored into `lint()`, and `lint_package()` should probably also be rewritten to use `lint()` directly.
While we're at it, we should split `lint.R` into bite-sized chunks. It's become quite long.
New calls could be architected like this: `lint_{dir,package}() -> lint() -> lint_impl()`, where `lint_impl()` is the unvectorized `lint()` and is not exported. `lint_{dir,package}()` would do file enumeration and delegate the rest to `lint()`.
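As a rough sketch of that shape (the names, the simplified signatures, and the stand-in return values here are illustrative assumptions for this discussion, not lintr's actual internals -- the real `lint()` also takes linters, cache, exclusions, etc.):

```r
# Hypothetical sketch of the proposed call chain, not lintr's actual code.
# lint_impl() is the assumed unexported, unvectorized worker.
lint_impl <- function(filename) {
  # lint a single file; a stand-in record in place of real Lint objects
  list(list(filename = filename, lints = character()))
}

lint <- function(filename) {
  # vectorized entry point: lint each file, then flatten the per-file
  # results into one flat list (lintr uses flatten_lints() for this step)
  results <- lapply(filename, lint_impl)
  unlist(results, recursive = FALSE)
}

lint_dir <- function(path = ".", pattern = "\\.[Rr]$") {
  # lint_dir() only enumerates files and delegates the rest to lint()
  files <- list.files(path, pattern = pattern, full.names = TRUE)
  lint(files)
}
```

The point of the split is that everything downstream of file enumeration (progress, flattening, future parallelization) lives in exactly one place.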
NB: This might also simplify the `lint_changed_files` workflow:
One thing I noticed in initial work on this -- why do we enable caching for ad-hoc lint workflows (`lint(text = ...)` or `lint("code\n")`)? That doesn't seem very practical... is there a good reason for this? If it's just for testing, I'd refactor tests instead.
Not sure if any downstreams use this to lint an open file "in-memory", but such usage could benefit a lot from caching. Seeing that most integrations probably don't run in R, it seems unlikely, though.
I guess I'm still not really seeing the use case... the "in-memory" cache key is based on the entire file string, right?
https://github.com/r-lib/lintr/blob/847d789d5d398a20ce792d2a9478a615cd0d5dfa/R/cache.R#L32-L34
So to benefit, it'd have to be re-running lints dozens/hundreds of times on the same unchanged in-memory string?
Or maybe the idea is running a "random-ish" set of linters many times, some number of which are "new" on each run, and those ones don't have to be re-run?
Probably I'm missing something.
We also cache at the linter × expr level (`cache_lint()`).
Sometimes, we want to lint a vector of files that's not neatly captured by `lint_dir()`. Currently, there are three ways to do so:

1. `lint_dir()` with a non-default `pattern=`. This is not very flexible -- `pattern=` is passed to `dir(pattern=)`, which only operates on `basename()`.
2. Looping over the files with `lint()`. This sacrifices several things handled by `lint_dir()`, like a progress meter, possible parallelization (#485), `flatten_lints()` merging results, etc.
3. A `.lintr` config to set exclusions. This works well for simple packages/projects, but is not fully general and is very clunky for interactive use cases.

One use case I have in mind is linting the R sources. `lint_dir()` includes lots of files that we'd typically not want to lint, e.g. test scripts for the R parser that intentionally use bad R syntax, and the concatenated source `all.R` files. It's very hard to get `lint_dir()` to ignore those `all.R` files especially, which is very inefficient, so the best way to lint the R sources currently is to filter `list.files()` by hand, then `lintr:::flatten_lints(lapply(filtered_files, lint, ...))`.
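Spelled out, that workaround looks something like the following (the source directory and the `all.R` exclusion rule are illustrative assumptions for the R-sources use case; `flatten_lints()` is unexported, hence the `:::`):

```r
# Illustrative version of the current hand-rolled workaround; the path
# and the exclusion rule are assumptions, adjust them for your checkout.
all_files <- list.files("src/library", pattern = "\\.[Rr]$",
                        recursive = TRUE, full.names = TRUE)
# drop files we never want to lint, e.g. the concatenated all.R sources
filtered_files <- all_files[basename(all_files) != "all.R"]
# guard so the sketch runs even where lintr isn't installed
if (requireNamespace("lintr", quietly = TRUE)) {
  lints <- lintr:::flatten_lints(lapply(filtered_files, lintr::lint))
}
```

All of the enumeration logic lives outside lintr here, which is exactly the duplication the options below would remove.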
I can see ~~two~~ three (3rd thanks to @JSchoenbachler) options for improving this situation:

1. `lint_dir()` gets an argument specifying a vector of files, e.g. `files=`; the user can specify either `path=` or `files=`, and the rest, including `lint_package()`, is unchanged.
2. `lint()` accepts a vector of file names in `filename=`.
3. A new `lint_files()` function.

I can see arguments for each; I'm currently leaning towards option (1) because the code change will be very small & will make it easier to plug into parallelization in the future. The main advantage of option (2) is not growing the argument surface of any functions. (3) is also a new function, but we could have `lint_dir()` call it so that all of the logic around combining lints over multiple files is still in one place.

Filing this issue for feedback/discussion.