DevinR528 opened this issue 3 years ago
We shouldn't design the API this close to the Rust internals. For example I don't think we should do an Early/Late aka AST/HIR split in the API.
Also, quickly skimming over your prototype: there's no need to rebuild the lint struct like this. We just have to provide a function/macro to define the lint in the simplest way possible.
You're thinking from the rustc direction towards how the API should look. What we should rather do is think about how the API will be used, and then design it based on that.
> - Wrap pre-expansion AST (rustc `ast`)
I think we should be careful with providing the pre-expansion lint pass, as it's not clear, to my knowledge, whether rustc will continue to support it (source). We could obviously still support it, but it could lead to technical debt. @flip1995 Do you know if there have been any updates on pre-expansion passes?
In general, I'm not sure if we want to directly map to the post-expansion pass or the rustc internals. @flip1995 has already raised these concerns in the comment above.
> Do you know if there have been any updates on pre-expansion passes?
No. But as I said, this doesn't really matter if we design the API from a usability perspective rather than a rustc perspective. API-supported pre-expansion lints are definitely not something that goes into the MVP.
When I said pre-expansion I meant something more like the early pass/AST, sorry for being unclear. Anyway, I'm definitely OK with leaving out the early pass/AST. It would be nice to keep some of the `Session` functionality if possible, though.
> We shouldn't design the API this close to the Rust internals.
We do need to hook into the `LateLintPass` methods somewhere though, right? Basically, our (rustc) driver has to register something in `rustc_driver::Callbacks`, or is there another way to hook in? I guess we could do our own walk in `Callbacks::after_analysis`? Do we want to reuse rustc's error reporting? It is a surprising amount of code, and complicated.
> We just have to provide a function/macro to define the lint in the simplest way possible.
What I had, before the lint struct impl'ing the `LintPass` trait, was:

```rust
// lint-plugin-crate.rs
#[no_mangle]
pub fn lint(cx: &dyn Context, node: Node) { ... }

// in our API crate
//
// where Node is an enum with variants similar to the types LateLintPass methods use
enum Node {
    Item(Item),
    Ty(Ty),
    // ...
}
```
So, starting from the user, I would want to define a lint by writing a free function. I think there is a lot of value in having a macro like `declare_lint_tool`, even if just for documentation (and if we want to re-use the error reporting, we kind of need it).
I'm excited to see other ways of imagining how this will work!
One thing that might be nice to have, even if not at first, is the ability to find (and iterate over) all instances of a type/method call given its fully-qualified path. So you could conceivably do something like `instances_of!(::std::mem::transmute)` and get back an iterator of all `transmute`s. No idea how this would work, but it's a thought.
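For illustration only, such a query could boil down to a filtered walk over an expression tree. Everything below (`Expr`, `Path`, the `instances_of` function) is an invented stand-in, not a proposed API:

```rust
// Hypothetical sketch: `instances_of!` could desugar into a filtered walk
// over the lint API's expression tree. All types here are toy stand-ins.

#[derive(Debug, PartialEq)]
struct Path(Vec<&'static str>);

#[derive(Debug)]
enum Expr {
    Call { callee: Path, args: Vec<Expr> },
    Lit(i64),
}

/// Walk an expression tree, collecting every call whose callee matches `target`.
fn instances_of<'a>(root: &'a Expr, target: &Path) -> Vec<&'a Expr> {
    let mut found = Vec::new();
    let mut stack = vec![root];
    while let Some(expr) = stack.pop() {
        if let Expr::Call { callee, args } = expr {
            if callee == target {
                found.push(expr);
            }
            stack.extend(args.iter());
        }
    }
    found
}

fn main() {
    let target = Path(vec!["std", "mem", "transmute"]);
    // A transmute call whose argument is itself a transmute call.
    let tree = Expr::Call {
        callee: Path(vec!["std", "mem", "transmute"]),
        args: vec![Expr::Call {
            callee: Path(vec!["std", "mem", "transmute"]),
            args: vec![Expr::Lit(0)],
        }],
    };
    // Both the outer and the nested call are found.
    assert_eq!(instances_of(&tree, &target).len(), 2);
}
```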
> One thing that might be nice to have, even if not at first, is the ability to find (and iterate over) all instances of a type/method call given its fully-qualified path.
I think this is an example of something we should rather implement in rustc and then provide the interface for it. I can imagine that rust-analyzer would also benefit from something like this.
Generally, I think we should be careful with implementing analyses in the API rather than in rustc. I'm not saying we shouldn't do this at all, just that we should always ask the question if we should directly implement the analysis in rustc.
That's fair, as rustc/clippy could likely use that sort of helper as well. I wasn't suggesting that for the MVP regardless. I suppose it would always be possible for a third-party library to create their own functions that do this sort of thing (using our API internally).
All lints in rustc and Clippy are run, and only when emitting warnings/errors do they check whether they are on or off. It seems like checking which lints are on/off first and only running the ones that are on would be more efficient? On the other hand, Clippy seems fairly snappy most of the time, so maybe it's not worth it.
Given that lints will be able to have their severity (allow, warn, deny, forbid) changed in the code itself à la clippy/rustc lints, is that even possible?
You would still have to walk all the HIR "nodes" and check, but I was proposing going from

```rust
fn current_lint() {
    run_lint_logic();
    add_lint_to_emitter();
    emitter_checks_if_lint_is_enabled();
}
```
to something like this:

```rust
fn new_linter() {
    check_if_lint_is_enabled();
    run_lint_logic();
    emit_lint();
}
```
Checking if the lint is enabled means walking ancestors of the "node" that is being checked looking for the various attributes and levels.
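A minimal model of that ancestor walk (toy types; rustc's real algorithm is more involved, e.g. `forbid` cannot be overridden by descendants):

```rust
// Sketch of computing the effective lint level by walking a node's
// ancestors and taking the nearest explicit `#[allow]`/`#[warn]`/
// `#[deny]`/`#[forbid]`-style attribute. Not the real rustc algorithm.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Level { Allow, Warn, Deny, Forbid }

struct Node {
    parent: Option<usize>,      // index of the enclosing node, if any
    level_attr: Option<Level>,  // an explicit level attribute on this node
}

/// Walk from `id` up to the crate root; the nearest explicit attribute wins.
fn effective_level(nodes: &[Node], mut id: usize, default: Level) -> Level {
    loop {
        if let Some(level) = nodes[id].level_attr {
            return level;
        }
        match nodes[id].parent {
            Some(parent) => id = parent,
            None => return default,
        }
    }
}

fn main() {
    // crate root (0) -> module with #[allow] (1) -> fn without attribute (2)
    let nodes = [
        Node { parent: None, level_attr: None },
        Node { parent: Some(0), level_attr: Some(Level::Allow) },
        Node { parent: Some(1), level_attr: None },
    ];
    // The fn inherits the module's allow; the root falls back to the default.
    assert_eq!(effective_level(&nodes, 2, Level::Warn), Level::Allow);
    assert_eq!(effective_level(&nodes, 0, Level::Warn), Level::Warn);
}
```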
Oh, sure. That seems reasonable.
> All lints in rustc and Clippy are run and only when emitting warnings/errors do they check if they are on/off.
That's true. Note that rustc has designed the linting interface to save a bit of performance when it comes to building the suggestion. The function to emit a lint, `struct_span_lint`, takes a closure to construct the suggestion. This closure only gets executed when the lint will actually be emitted at that node. Most rustc lints that I've seen use this. Clippy rarely uses it though; I guess that this is mostly for simplicity and readability.
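A minimal model of that closure-based pattern (the names below are stand-ins, not rustc's actual `struct_span_lint` signature): the possibly expensive suggestion is only built if the lint will be emitted.

```rust
// Toy model: `emit_lint` only runs the `decorate` closure when the lint
// is enabled, so suggestion-building cost is skipped for disabled lints.

struct Diagnostic { message: String }

fn emit_lint(enabled: bool, decorate: impl FnOnce() -> Diagnostic) -> Option<Diagnostic> {
    if enabled {
        Some(decorate()) // the closure runs only here
    } else {
        None // no suggestion is ever constructed
    }
}

fn main() {
    let mut built = 0;
    let result = emit_lint(false, || {
        built += 1; // stands in for expensive suggestion construction
        Diagnostic { message: "consider a safe conversion instead".into() }
    });
    assert!(result.is_none());
    assert_eq!(built, 0); // the closure never ran
}
```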
Seems like checking what lints are on/off first and only running the ones that are on would be more efficient?
This would almost definitely be the case. For example, when running the `metadata_collection_lint` we don't run any other lints, for faster execution (see rust-lang/rust-clippy#7253). However, note that only calling the lint checking code when the lint is enabled could prevent the user from emitting the lint at a different node. Rustc currently allows the emission of a lint at a different node than the one being checked via `struct_span_lint_hir`.
I personally would like to have an automated mechanism that only calls linting code if the lint is actually enabled. However, only as long as it doesn't influence usability and flexibility for the developer. I personally use linters mainly in CI pipelines or manually shortly before creating a PR. In both cases, I'm fine if the execution takes a bit longer, if writing the actual lints is easier as a result.
Completely agree!! I feel like this will completely depend on how much of rustc's lint emitter we use.
On the contrary, I have VS Code set up to run Clippy on save with rust-analyzer, and the time it takes is more than reasonable.
I guess that this is something that has to be balanced, and we should probably have some documentation about "best practices" for performance etc.
Taking the "lint against transmutes" example, I believe that API and many others would be covered by a visitor. To try to be as rustc-internals-agnostic as possible, I suggest taking `syn` as a reference: it features `Visit{,Mut}` (and `Fold`) traits; so in our case, we could envision a `syn`-looking AST tree with a `Visit` trait, but with the difference, w.r.t. `syn`, that spanned `Expr`s, spanned `Pat`terns, and spanned `Ty`pes would be nice entry points for querying more related info.
Example (I'm keeping a lot of stuff elided on purpose; I believe we can at least agree the following is a desirable, intuitive, and not rustc-constrained API):
```rust
impl<…> Visit<…> for MyLint<…> {
    fn visit_expr_path(…) … {
        if path.resolves_to(path![BuiltinCrate::Core, "mem", "transmute"]) {
            diag.emit…
        }
    }
}
```
That is, something I find quite important is to have "hardcoded" access to the `core`, `alloc`, `std` trifecta to be able to easily identify stuff in a path-robust fashion. On the other hand, I don't believe that lang items ought to be handled in a special fashion; we can keep the core/alloc/std-related paths hardcoded: imho, we do not need to support no-core crates to start with (and may never do!).
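To make the elided visitor example above concrete, here is a runnable toy version. All types, and `resolves_to`, are invented stand-ins for illustration, not a proposed API:

```rust
// Toy `Visit` trait in the syn style: one overridable hook per node kind,
// plus a default walk that recurses and dispatches to the hooks.

struct ExprPath { segments: Vec<&'static str> }

impl ExprPath {
    // Stand-in for real path resolution: compare segment strings.
    fn resolves_to(&self, target: &[&str]) -> bool {
        self.segments == target
    }
}

enum Expr {
    Path(ExprPath),
    Call(Box<Expr>, Vec<Expr>),
}

trait Visit {
    fn visit_expr_path(&mut self, _path: &ExprPath) {}

    // Default walk: recurse into children and dispatch per-node hooks.
    fn visit_expr(&mut self, expr: &Expr) {
        match expr {
            Expr::Path(path) => self.visit_expr_path(path),
            Expr::Call(callee, args) => {
                self.visit_expr(callee);
                for arg in args {
                    self.visit_expr(arg);
                }
            }
        }
    }
}

struct TransmuteLint { hits: usize }

impl Visit for TransmuteLint {
    fn visit_expr_path(&mut self, path: &ExprPath) {
        if path.resolves_to(&["core", "mem", "transmute"]) {
            self.hits += 1; // a real lint would emit a diagnostic here
        }
    }
}

fn main() {
    let expr = Expr::Call(
        Box::new(Expr::Path(ExprPath { segments: vec!["core", "mem", "transmute"] })),
        vec![Expr::Path(ExprPath { segments: vec!["std", "ptr", "null"] })],
    );
    let mut lint = TransmuteLint { hits: 0 };
    lint.visit_expr(&expr);
    assert_eq!(lint.hits, 1); // only the transmute path matched
}
```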
Another "detail" is that we may be featuring a `Span::call_site()` kind of API. By that, I mean that most of the API stuff would be thread-tied (`!Send + !Sync`), and then a bunch of `thread_local!`s would emulate extra parameters provided to the lint author. This maximizes lint code retrocompatibility, while keeping the code for lints quite simple: some people may feel uncomfortable if we end up with a `cx`-looking parameter that we pass everywhere around and which is lifetime-infected.
I believe I've said this before, but using syn as a base AST to start with seems to be the easiest, given that it was designed to be compiler-agnostic. So I agree on that 100%.
I agree that having a visitor pattern is a reasonable way forward, with a fair number of helper methods similar to what you've suggested. Handling std/alloc/core certainly shouldn't get special treatment, as they are "just" another crate. Having a macro that takes a fully-qualified path (like `path!(::core::mem::transmute)`) would allow for interoperability with any crate. To remain sensible we should probably take renames from `Cargo.toml` into account, though I'm not certain how this would work. We realistically need to handle different crate versions too, which will prove…interesting.
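A minimal sketch of what such a `path!` macro could capture (purely hypothetical; a real implementation would resolve the path through the compiler's definitions rather than compare strings):

```rust
// Hypothetical `path!`-style macro: it just captures the segments of a
// fully-qualified path as strings, which a driver could then compare
// against resolved definition paths.

macro_rules! path {
    (:: $($seg:ident)::+) => {
        [$(stringify!($seg)),+]
    };
}

fn main() {
    let target = path!(::core::mem::transmute);
    assert_eq!(target, ["core", "mem", "transmute"]);
}
```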
Given the possibility of having lints run in parallel in the future, I think it would be best to avoid thread-locals and anything `!Send` or `!Sync`.
I highly agree; using `syn` for the start also seems like a nice solution.
> I believe that API and many others would be covered by a visitor.
True, that should cover almost all use cases. Clippy is a great example of this. In the future, I would also like an API to get the entire AST at once, to maybe create queries that work on the entire AST and can search in it. This is however only a basic idea, and I think that we should only provide a visitor API for now and do such experiments in additional crates :upside_down_face:.
> Given the possibility of having lints run in parallel in the future, I think it would be best to avoid thread-locals and anything `!Send` or `!Sync`.
Each single lint pass being single-threaded wouldn't imply that the whole lot of lint passes ought to be like that.
Basically, I can imagine the "real" internal lint function being something like:

```rust
fn my_lint(codebase_to_lint…, compiler: CompilerHandle<'complex_lifetimes…>) {
    …
}
```
And while we could expose `CompilerHandle` as an opaque thing with more and more getters added to it, for lint authors, every time they wanted to factor logic into its own function, they'd need to pass along that `CompilerHandle<'…>` parameter, much like the compiler needs to do.
I personally believe (I may be wrong!) that the ergonomics of the API will be more important than minor micro-optimizations that we could feature. So, to that purpose, I was suggesting that we:

- hid the `'complex_lifetimes`,
- and featured either a free function to fetch a `CompilerHandle` out of global (thread_local) storage, and/or a bunch of free functions for the methods on the `CompilerHandle`.

That way we could achieve completely hiding the `CompilerHandle` from the public API in the latter case, or at least hiding it from function boundaries in the former case.
```rust
fn wrapped_lint(codebase_to_lint…, compiler: CompilerHandle) { // <- classic signature for "us" to drive the lints
    scoped_within_thread_local_storage(compiler, || {
        user_provided_lint_impl(codebase_to_lint…) // <- simple signature for the users to implement
    })
}
```
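For illustration, a runnable toy version of that wrapper, with a dummy `CompilerHandle` standing in for real compiler state:

```rust
// The handle is stashed in a `thread_local!` for the duration of one lint
// invocation, so lint code can fetch it on demand instead of threading it
// through every helper function.

use std::cell::RefCell;

#[derive(Clone)]
struct CompilerHandle { crate_name: String }

thread_local! {
    static HANDLE: RefCell<Option<CompilerHandle>> = RefCell::new(None);
}

/// Make `compiler` available via `current_handle()` while `f` runs.
fn scoped_within_thread_local_storage<R>(compiler: CompilerHandle, f: impl FnOnce() -> R) -> R {
    HANDLE.with(|h| *h.borrow_mut() = Some(compiler));
    let result = f();
    HANDLE.with(|h| *h.borrow_mut() = None); // clear after the invocation
    result
}

/// What a free function in the `Span::call_site()` style would do internally.
fn current_handle() -> CompilerHandle {
    HANDLE.with(|h| h.borrow().clone().expect("called outside a lint invocation"))
}

fn user_provided_lint_impl() -> String {
    // No `cx` parameter needed; the handle is fetched on demand.
    current_handle().crate_name
}

fn main() {
    let handle = CompilerHandle { crate_name: "my_crate".into() };
    let name = scoped_within_thread_local_storage(handle, user_provided_lint_impl);
    assert_eq!(name, "my_crate");
}
```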
Much like proc macro implementations aren't passing the call site span across all of their helper functions: they just query it on demand through `Span::call_site()`.
Fetching/accessing the `CompilerHandle` out of that thread-local storage would be the only thread-tied thing, but once that's done, people could potentially parallelize stuff on their own (although the current compiler internals are mostly single-threaded as well, so for a first release we may want to conservatively remain single-threaded), and even if a single lint implementation weren't parallelized, the whole batch execution could very well be.
But I digress, sorry 😅: these kinds of differences are part of the "polishing the public API" step that will come last, so we can go back to this discussion once that's relevant 🙂
I like the idea of using `syn` (less work for us, yay!) but isn't it a bit too `EarlyPass` (it is only an AST)? I agree with using `syn` as inspiration, but aren't we really only taking `non_exhaustive` enums (which syn doesn't actually use) and a very strict guarantee not to break semver (syn is at `1.0.74`)? Don't we want types more like HIR or MIR, where we have `DefId`s and `HirId`s so we can query the compiler?
I have a hard time seeing how to get around having types that are convertible to/from rustc types (so similar enough) and converting them in the driver to avoid depending on rustc internals.
```rust
#[non_exhaustive]
pub struct Ty {
    hir_id: HirId,
    kind: Box<TyKind>,
    span: Span,
}

impl Ty {
    pub fn hir_id(&self) -> HirId { self.hir_id }
    pub fn kind(&self) -> &TyKind { self.kind.as_ref() }
    pub fn span(&self) -> Span { self.span }
}

pub struct TyBuilder {
    pub hir_id: HirId,
    pub kind: Box<TyKind>,
    pub span: Span,
}

impl TyBuilder {
    // `&dyn Context` is used to make sure that if fields change we have access to enough
    // info to fill the field.
    pub fn into_with_ctx(self, _: &dyn Context) -> Ty {
        Ty { hir_id: self.hir_id, kind: self.kind, span: self.span }
    }
}
```
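As a self-contained illustration of how a driver might use such a builder, here is a stripped-down, runnable version (toy `Span` and `Context`; no real `HirId`/`TyKind`):

```rust
// The rustc-side conversion fills a builder and finishes it with a
// `Context`, so if fields are added later, only the conversion code and
// `into_with_ctx` need to change, not the lint-facing `Ty` API.

trait Context { /* driver-side query surface, empty in this sketch */ }
struct DriverCtx;
impl Context for DriverCtx {}

#[derive(Debug, PartialEq, Clone, Copy)]
struct Span { lo: u32, hi: u32 }

struct Ty { span: Span }

impl Ty {
    fn span(&self) -> Span { self.span }
}

struct TyBuilder { pub span: Span }

impl TyBuilder {
    fn into_with_ctx(self, _: &dyn Context) -> Ty {
        Ty { span: self.span }
    }
}

fn main() {
    let cx = DriverCtx;
    // In the real driver this data would come from a rustc `hir::Ty`.
    let ty = TyBuilder { span: Span { lo: 0, hi: 4 } }.into_with_ctx(&cx);
    assert_eq!(ty.span(), Span { lo: 0, hi: 4 });
}
```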
To be clear I'm not suggesting we use syn per se, just something like syn. Multiple extensions and modifications will naturally be necessary to accomplish what we're trying to do.
Summarizing a bit, I was thinking of making an issue for each bullet point of the API so we can discuss the separate aspects:
- Types modeled after `syn`, using `non_exhaustive` (and no public fields)
- A known entry point that `LibLoading` can access, so the driver can pull the lints out of the dylib
- We have our types be convertible from rustc internal types, so we don't have to depend on nightly and can keep a stable (as in compiler and semver) compilable API
- A `Context` type/trait providing:
  - a `DiagnosticBuilder`-like thing (the current way)
  - `Span` info à la `LintContext::sess().source_map()` (the `span_*` and file querying methods)
  - `DefId` helpers:
    - a `DefId`-like thing from a path (`&str` or `Symbol`-like)
    - `HirId` to `DefId` and `DefId` to `HirId` conversions
I have a hard time imagining what something is going to look like, so I have been messing around by writing a test driver/API. I realize it's super early and maybe not that helpful, but it's how I think, so on the off chance it helps out, here's what I've been thinking: https://github.com/DevinR528/pluggy (check out the readme for a quick overview).
Pros of this or a similar implementation:

- Since we pass a `Context` to the conversion functions and accessors (maybe that's overkill), this will be fairly sturdy semver-wise

Cons: