rust-marker / design

Apache License 2.0

What we need from the API #9

Open DevinR528 opened 3 years ago

DevinR528 commented 3 years ago

We should make our types convertible from rustc's internal types, so that we don't have to depend on nightly and can keep a stable (both in terms of the compiler and semver) compilable API.

  1. A Context type/trait
    • access a DiagnosticBuilder-like thing (the current way of emitting diagnostics)
    • provides snippet access from something like Span
    • a few methods from LintContext::sess().source_map() (the span_* and file-querying methods)
    • provides type access from something like DefId
    • get a DefId-like thing from a path (&str or Symbol-like)
    • access a Map-like thing
    • query attributes of items/expr/stmt and whatever else is valid
    • convert HirId to DefId and DefId to HirId
    • and more...
  2. Wrap the pre-expansion AST (rustc ast)
  3. Wrap the post-expansion HIR (rustc hir)
  4. Other types will be needed as return types for things like Map and typeck
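To make item 1 a bit more concrete, here is a rough sketch of what such a Context trait could look like. Every name, type, and signature below is hypothetical (the Id and Span types are stand-ins for the real wrappers), and the toy implementation exists only to illustrate the shape, not how a real driver would back it:

```rust
// Hypothetical sketch of the Context trait described above; none of these
// names are final, and DefId/HirId/Span are stand-ins for the real wrappers.

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct DefId(pub u64);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct HirId(pub u64);

#[derive(Clone, Copy, Debug)]
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

pub trait Context {
    /// Snippet access from something like `Span`.
    fn snippet(&self, span: Span) -> Option<String>;
    /// Resolve a path like `core::mem::transmute` to a `DefId`-like thing.
    fn def_id_for_path(&self, path: &str) -> Option<DefId>;
    /// `DefId` to `HirId` and back.
    fn hir_id_of(&self, def_id: DefId) -> Option<HirId>;
    fn def_id_of(&self, hir_id: HirId) -> Option<DefId>;
}

/// A toy in-memory implementation, used only to show how a lint would
/// consume the trait; a real driver would delegate to the compiler.
pub struct DummyContext {
    pub source: String,
}

impl Context for DummyContext {
    fn snippet(&self, span: Span) -> Option<String> {
        self.source.get(span.lo..span.hi).map(|s| s.to_owned())
    }
    fn def_id_for_path(&self, path: &str) -> Option<DefId> {
        (path == "core::mem::transmute").then(|| DefId(1))
    }
    fn hir_id_of(&self, def_id: DefId) -> Option<HirId> {
        Some(HirId(def_id.0))
    }
    fn def_id_of(&self, hir_id: HirId) -> Option<DefId> {
        Some(DefId(hir_id.0))
    }
}
```

Lints would then only ever see `&dyn Context`, which is what keeps the rustc internals out of the public API surface.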

I have a hard time imagining what this is going to look like, so I have been messing around by writing a test driver/API. I realize it's super early and maybe not that helpful, but it's how I think, so on the off chance it helps, here's what I've been thinking: https://github.com/DevinR528/pluggy (check out the README for a quick overview).

Pros of this or similar implementation

Cons

flip1995 commented 3 years ago

We shouldn't design the API this close to the Rust internals. For example, I don't think we should do an Early/Late (aka AST/HIR) split in the API.

Also, really quickly skimming over your prototype: there's no need to rebuild the lint struct like this. We just have to provide a function/macro to define the lint in the simplest way possible.

You're thinking from the rustc direction towards how the API should look. What we should rather do is think about how the API should be used, and then design it based on that.

xFrednet commented 3 years ago
  2. Wrap the pre-expansion AST (rustc ast)

I think we should be careful about providing the pre-expansion lint pass, as it's not clear, to my knowledge, whether rustc will continue to support it (source). We could obviously still support it, but it could lead to technical debt. @flip1995, do you know if there have been any updates on pre-expansion passes?

In general, I'm not sure we want to map directly to the post-expansion pass or the rustc internals. @flip1995 already raised these concerns in the comment above.

flip1995 commented 3 years ago

Do you know if there has been any updates on pre-expansion passes?

No. But as I said, this doesn't really matter if we design the API from a usability perspective rather than a rustc perspective. API-supported pre-expansion lints are definitely not something that goes into the MVP.

DevinR528 commented 3 years ago

When I said pre-expansion, I meant something more like the early pass/AST; sorry for being unclear. Anyway, I'm definitely OK with leaving out the early pass/AST. It would be nice to keep some of the Session functionality if possible, though.

We shouldn't design the API this close to the Rust internals.

We do need to hook into the LateLintPass methods somewhere, though, right? Basically, our (rustc) driver has to register something in rustc_driver::Callbacks, or is there another way to hook in? I guess we could do our own walk in Callbacks::after_analysis? And do we want to reuse rustc's error reporting? It's a surprising amount of code, and complicated.

We just have to provide a function/macro to define the lint in the simplest way possible.

What I had, before the lint struct impl'ing the LintPass trait, was:

// lint-plugin-crate.rs
#[no_mangle]
pub fn lint(cx: &dyn Context, node: Node) { ... }

// in our API crate
//
// where Node is an enum with variants similar to the types LateLintPass methods use
enum Node {
    Item(Item),
    Ty(Ty),
    //...
}


So, starting from the user: I would want to define a lint by writing a free function. I think there is a lot of value in having a macro like declare_lint_tool, even if just for documentation (and if we want to reuse the error reporting, we kind of need it).

I'm excited to see other ways of imagining how this will work!

jhpratt commented 3 years ago

One thing that might be nice to have, even if not at first, is the ability to find (and iterate over) all instances of a type/method call given its fully-qualified path. So you could conceivably do something like instances_of!(::std::mem::transmute) and get back an iterator of all transmutes. No idea how this would work, but it's a thought.
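The idea can be sketched over a toy IR. Everything below is invented for illustration: the `Call` struct, the string-based `resolved_path` field, and the `instances_of` function are hypothetical stand-ins, since a real API would resolve paths through the compiler rather than comparing strings:

```rust
// Toy sketch of the "find all instances of a fully-qualified path" idea.
// `Call` and `resolved_path` are invented names for illustration only.

#[derive(Debug, Clone)]
pub struct Call {
    /// The fully-resolved path of the callee, e.g. "std::mem::transmute".
    pub resolved_path: String,
    /// Where the call occurred.
    pub line: u32,
}

/// Return an iterator over every call whose resolved path matches `path`,
/// roughly what `instances_of!(::std::mem::transmute)` might expand to.
pub fn instances_of<'a>(calls: &'a [Call], path: &str) -> impl Iterator<Item = &'a Call> {
    let path = path.to_owned();
    calls.iter().filter(move |c| c.resolved_path == path)
}
```

The hard part, as noted, is not the iteration but populating something like `resolved_path` correctly, which is why pushing the resolution into rustc itself (as suggested below in the thread) is attractive.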

flip1995 commented 3 years ago

One thing that might be nice to have, even if not at first, is the ability to find (and iterate over) all instances of a type/method call given its fully-qualified path.

I think this is an example of something we should rather implement in rustc and then provide the interface for it. I can imagine that rust-analyzer would also benefit from something like this.

Generally, I think we should be careful with implementing analyses in the API rather than in rustc. I'm not saying we shouldn't do this at all, just that we should always ask the question if we should directly implement the analysis in rustc.

jhpratt commented 3 years ago

That's fair, as rustc/clippy could likely use that sort of helper as well. I wasn't suggesting that for the MVP regardless. I suppose it would always be possible for a third-party library to create their own functions that do this sort of thing (using our API internally).

DevinR528 commented 3 years ago

All lints in rustc and Clippy are run, and only when emitting warnings/errors do they check whether they are on/off. It seems like checking which lints are on/off first, and only running the enabled ones, would be more efficient? On the other hand, Clippy seems fairly snappy most of the time, so maybe it's not worth it.

jhpratt commented 3 years ago

Given that lints will be able to have their severity (allow, warn, deny, forbid) changed in the code itself à la clippy/rustc lints, is that even possible?

DevinR528 commented 3 years ago

You would still have to walk all the HIR "nodes" and check, but I was more proposing going from

fn current_lint() {
    run_lint_logic();
    add_lint_to_emitter();
    emitter_checks_if_lint_is_enabled();
}

to something like this

fn new_linter() {
    check_if_lint_is_enabled();
    run_lint_logic();
    emit_lint();
}

Checking if the lint is enabled means walking ancestors of the "node" that is being checked looking for the various attributes and levels.
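That ancestor walk can be sketched with a toy node arena. The `Node` struct, the `lint_levels` field, and the `Warn` default are all invented for illustration; real level resolution also has to handle forbid, command-line flags, and lint groups:

```rust
// Sketch of "walk ancestors of the node looking for level attributes".
// The arena layout and attribute representation are invented for illustration.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Level {
    Allow,
    Warn,
    Deny,
}

pub struct Node {
    /// Index of the parent in the node arena; `None` for the crate root.
    pub parent: Option<usize>,
    /// Explicit levels set on this node, e.g. from `#[allow(my_lint)]`.
    pub lint_levels: Vec<(String, Level)>,
}

/// Find the effective level for `lint` at `node`, walking towards the root.
/// The nearest node with an explicit attribute wins; the default is Warn.
pub fn effective_level(nodes: &[Node], mut node: usize, lint: &str) -> Level {
    loop {
        if let Some((_, lvl)) = nodes[node].lint_levels.iter().find(|(n, _)| n.as_str() == lint) {
            return *lvl;
        }
        match nodes[node].parent {
            Some(parent) => node = parent,
            None => return Level::Warn,
        }
    }
}
```

A driver that resolves levels up front like this could then skip calling the lint's checking code entirely for allowed nodes.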

jhpratt commented 3 years ago

Oh, sure. That seems reasonable.

xFrednet commented 3 years ago

All lints in rustc and Clippy are run and only when emitting warnings/errors do they check if they are on/off.

That's true. Note that rustc has designed the linting interface to save a bit of performance when it comes to building the suggestion: the function to emit a lint, struct_span_lint, takes a closure that constructs the suggestion. This closure only gets executed when the lint will actually be emitted at that node.

Most rustc lints that I've seen use this. Clippy rarely does, though; I guess that's mostly for simplicity and readability.
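The closure-based pattern described above can be shown with a simplified stand-in. This is not rustc's real struct_span_lint signature; `Emitter` and `emit_lint` are invented names, and the point is only that the suggestion-building work is deferred until the lint is known to fire:

```rust
// Simplified stand-in for the closure-based emission pattern: the
// (potentially expensive) diagnostic text is only built when the lint
// is actually enabled. `Emitter`/`emit_lint` are invented names.

pub struct Emitter {
    pub enabled: bool,
    pub messages: Vec<String>,
}

impl Emitter {
    /// `build` runs only if the lint is enabled, so an expensive suggestion
    /// costs nothing when the lint is allowed.
    pub fn emit_lint(&mut self, build: impl FnOnce() -> String) {
        if self.enabled {
            let msg = build();
            self.messages.push(msg);
        }
    }
}
```

Whether the wrapped API keeps this closure shape or hides it behind a macro is exactly the ergonomics-vs-performance trade-off discussed in this thread.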

Seems like checking what lints are on/off first and only running the ones that are on would be more efficient?

This would almost definitely be the case. For example, when running the metadata_collection_lint, we don't run any other lints, for faster execution (see rust-lang/rust-clippy#7253).

However, note that only calling the lint-checking code when the lint is enabled could prevent the user from emitting the lint at a different node. Rustc currently allows emitting a lint at a different node than the one being checked, via struct_span_lint_hir.

I personally would like to have an automated mechanism that only calls linting code if the lint is actually enabled. However, only as long as it doesn't influence usability and flexibility for the developer. I personally use linters mainly in CI pipelines or manually shortly before creating a PR. In both cases, I'm fine if the execution takes a bit longer, if writing the actual lints is easier as a result.

DevinR528 commented 3 years ago

I personally would like to have an automated mechanism that only calls linting code if the lint is actually enabled. However, only as long as it doesn't influence usability and flexibility for the developer. I personally use linters mainly in CI pipelines or manually shortly before creating a PR. In both cases, I'm fine if the execution takes a bit longer, if writing the actual lints is easier as a result.

Completely agree!! I feel like this will completely depend on how much of rustc's lint emitter we use.

jhpratt commented 3 years ago

On the contrary, I have VS Code set up to run Clippy on save with rust-analyzer, and the time it takes is more than reasonable.

xFrednet commented 3 years ago

I guess that this is something that has to be balanced, and we should probably have some documentation about "best practices" for performance etc.

danielhenrymantilla commented 3 years ago

Taking the "lint against transmutes" example, I believe that API, and many others, would be covered by a visitor. To be as rustc-internals-agnostic as possible, I suggest taking syn as a reference: it features Visit{,Mut} (and Fold) traits. So, in our case, we could envision a syn-looking AST with a Visit trait, but with the difference, w.r.t. syn, that spanned Exprs, spanned Patterns, and spanned Types would be nice entry points for querying more related info.

Example (I'm keeping a lot of stuff elided on purpose; I believe we can at least agree the following is a desirable, intuitive, and not rustc-constrained API):

impl<…> Visit<…> for MyLint<…> {
    fn visit_expr_path(…) … {
        if path.resolves_to(path![BuiltinCrate::Core, "mem", "transmute"]) {
            diag.emit…
        }
    }
}
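To make the elided example above concrete, here is a self-contained sketch of the visitor idea over a toy expression tree. It deliberately does not use syn itself; the `Expr` enum, `Visit` trait, and `TransmuteLint` are all invented, and path resolution is faked with a string comparison:

```rust
// Self-contained sketch of the visitor pattern proposed above, using a toy
// expression tree instead of syn's real AST; all types here are invented.

#[derive(Debug)]
pub enum Expr {
    /// A path expression, e.g. "core::mem::transmute".
    Path(String),
    /// A call: callee plus arguments.
    Call(Box<Expr>, Vec<Expr>),
}

/// Default methods walk the whole tree; a lint overrides only the node
/// kinds it cares about, exactly as with syn's `Visit` trait.
pub trait Visit {
    fn visit_expr_path(&mut self, _path: &str) {}

    fn visit_expr(&mut self, expr: &Expr) {
        match expr {
            Expr::Path(p) => self.visit_expr_path(p),
            Expr::Call(callee, args) => {
                self.visit_expr(callee);
                for arg in args {
                    self.visit_expr(arg);
                }
            }
        }
    }
}

/// A toy "lint" that counts every reference to `core::mem::transmute`.
pub struct TransmuteLint {
    pub hits: usize,
}

impl Visit for TransmuteLint {
    fn visit_expr_path(&mut self, path: &str) {
        // A real API would resolve the path through the compiler instead
        // of comparing strings.
        if path == "core::mem::transmute" {
            self.hits += 1;
        }
    }
}
```

The missing piece, compared to syn, is the querying: in the proposed API, the visited nodes would carry ids/spans that let the lint ask the compiler for more information.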

jhpratt commented 3 years ago

I believe I've said this before, but using syn as a base AST to start with seems to be the easiest, given that it was designed to be compiler-agnostic. So I agree on that 100%.

I agree that having a visitor pattern is a reasonable way forward, with a fair number of helper methods similar to what you've suggested. std/alloc/core certainly shouldn't get special treatment, as they are "just" another crate. Having a macro that takes a fully-qualified path (like path!(::core::mem::transmute)) would allow for interoperability with any crate. To remain sensible, we should probably take renames from Cargo.toml into account, though I'm not certain how this would work. We realistically need to handle different crate versions too, which will prove…interesting.

Given the possibility of having lints run in parallel in the future, I think it would be best to avoid thread-locals and anything !Send or !Sync.

xFrednet commented 3 years ago

I highly agree; using syn for the start also seems like a nice solution.

I believe that API and many others would be covered by a visitor.

True, that should cover almost all use cases; Clippy is a great example of this. In the future, I would also like an API to get the entire AST at once, to maybe create queries that work on, and can search in, the entire AST. This is, however, only a basic idea, and I think we should only provide a visitor API for now and do such experiments in additional crates :upside_down_face:.

danielhenrymantilla commented 3 years ago

Given the possibility of having lints run in parallel in the future, I think it would be best to avoid thread-locals and anything !Send or !Sync.

Each single lint pass being single-threaded wouldn't imply that the whole lot of lint passes ought to be like that.

Basically I can imagine the "real" internal lint function being something like:

fn my_lint(codebase_to_lint…, compiler: CompilerHandle<'complex_lifetimes…>) {
    …
}

And while we could expose CompilerHandle as an opaque thing with more and more getters added to it, every time lint authors wanted to factor logic into its own function, they'd need to pass that CompilerHandle<'…> parameter along, much like the compiler itself has to.

I personally believe (I may be wrong!) that the ergonomics of the API will be more important than minor micro-optimizations that we could feature. So, to that purpose, I was suggesting that we:

  1. use runtime-managed pointers to remove the 'complex_lifetimes,
  2. and feature either a free function to fetch a CompilerHandle out of global (thread_local) storage, and/or a bunch of free functions for the methods on the CompilerHandle. That way, we could completely hide the CompilerHandle from the public API in the latter case, or at least hide it from function boundaries in the former case.

    • fn wrapped_lint(codebase_to_lint…, compiler: CompilerHandle) { // <- classic signature for "us" to drive the lints
      scoped_within_thread_local_storage(compiler, || {
          user_provided_lint_impl(codebase_to_lint…) // <- simple signature for the users to implement
      })
      }
    • Much like proc macro implementations aren't passing the call site span across all of their helper functions: they just query it on demand through Span::call_site().

Fetching/accessing the CompilerHandle out of that thread-local storage would be the only thread-tied thing, but once that's done, people could potentially parallelize stuff on their own (although the current compiler internals are mostly single-threaded as well, so for a first release we may want to conservatively remain single-threaded). And even if a single lint implementation weren't parallelized, the whole batch execution could very well be.
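The scoped thread-local idea sketched above can be shown with std only. `CompilerHandle`, `with_handle`, and `current_handle` are illustrative stand-ins for the proposed `scoped_within_thread_local_storage` pattern, not a proposed final API:

```rust
// Sketch of the "scoped thread-local handle" idea, std only.
// `CompilerHandle` here is just an illustrative stand-in.

use std::cell::RefCell;

#[derive(Clone, Debug, PartialEq)]
pub struct CompilerHandle {
    pub crate_name: String,
}

thread_local! {
    static HANDLE: RefCell<Option<CompilerHandle>> = RefCell::new(None);
}

/// Install `handle` for the duration of `f`, like the proposed
/// `scoped_within_thread_local_storage`. (A robust version would restore
/// the previous value via a drop guard, so a panic in `f` can't leak it.)
pub fn with_handle<R>(handle: CompilerHandle, f: impl FnOnce() -> R) -> R {
    HANDLE.with(|h| *h.borrow_mut() = Some(handle));
    let result = f();
    HANDLE.with(|h| *h.borrow_mut() = None);
    result
}

/// Free function the lint body calls instead of threading a parameter
/// through every helper, much like `Span::call_site()` in proc macros.
pub fn current_handle() -> Option<CompilerHandle> {
    HANDLE.with(|h| h.borrow().clone())
}
```

With this shape, the user-facing lint signature stays free of the handle and its lifetimes, while the driver wraps each invocation in `with_handle`.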

But I digress, sorry 😅: these kinds of differences are part of the "polishing the public API" steps that will come last, so we can go back to this discussion once that's relevant 🙂

DevinR528 commented 3 years ago

I like the idea of using syn (less work for us, yay!), but isn't it a bit too EarlyPass (it is only an AST)? I agree with using syn as inspiration, but aren't we really only taking non_exhaustive enums (which syn doesn't actually use) and a very strict guarantee not to break semver (syn is at 1.0.74)? Don't we want types more like hir or mir, where we have DefIds and HirIds, so we can query the compiler?

I have a hard time seeing how to get around having types that are convertible to/from rustc types (so similar enough) and converting them in the driver to avoid depending on rustc internals.

#[non_exhaustive]
pub struct Ty {
    hir_id: HirId,
    kind: Box<TyKind>,
    span: Span,
}

impl Ty {
    pub fn hir_id(&self) -> HirId { self.hir_id }
    pub fn kind(&self) -> &TyKind { self.kind.as_ref() }
    pub fn span(&self) -> Span { self.span }
}

pub struct TyBuilder {
    pub hir_id: HirId,
    pub kind: Box<TyKind>,
    pub span: Span,
}

impl TyBuilder {
    // `&dyn Context` is used to make sure that if fields change we have access to enough
    // info to fill the field.
    pub fn into_with_ctx(self, _: &dyn Context) -> Ty {
        Ty { hir_id: self.hir_id, kind: self.kind, span: self.span }
    }
}

jhpratt commented 3 years ago

To be clear, I'm not suggesting we use syn per se, just something like syn. Multiple extensions and modifications will naturally be necessary to accomplish what we're trying to do.

DevinR528 commented 3 years ago

To summarize: I was thinking of making an issue for each bullet point of the API, so we can discuss the separate aspects.

API

Driver (this will be what makes alternative compilers possible-ish)