rust-marker / design

Apache License 2.0

How will lints be provided/discovered? #6

Open jhpratt opened 3 years ago

jhpratt commented 3 years ago

I think there are disagreements/misunderstandings over how the lints will be provided. I'm opening this to try to resolve this basic issue, as I believe it will implicitly resolve a number of the discussions already opened.

What I originally pictured was a fully standalone tool that had no explicit integration with crates — anyone could use any namespace for the lint. This would permit users to supply their own lints without uploading them to a registry, although a registry would still likely exist.

What I believe others are picturing is requiring a registry to exist and restricting publishing to that registry to those who have publish permissions on crates.io for the crate of the same name, implicitly limiting lints to those who are willing to publish them. However, quick inspection on my end shows that it would still be possible to have private lints, as crates.io doesn't permit crate names with a leading underscore. As such, we could just require private lints to be prefixed with one. How private lints are discovered is a separate issue that doesn't need to (and shouldn't) be discussed here. Published lints would be automatically downloaded (presumably into a global cache) for all direct dependencies (nb: we probably need something better, because indirect dependencies can be part of the public API).
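The crates.io naming constraint mentioned above would make the private/published split purely syntactic; a trivial sketch (the helper name is hypothetical):

```rust
// crates.io rejects crate names with a leading underscore, so such a
// name can only ever refer to an unpublished, private lint package.
fn is_private_lint_package(name: &str) -> bool {
    name.starts_with('_')
}

fn main() {
    assert!(is_private_lint_package("_internal_lints"));
    assert!(!is_private_lint_package("serde_lints"));
    println!("naming checks pass");
}
```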

If I am understanding what others are thinking, I'm on board. It seems reasonable to restrict "official" lints, and it should be doable with information from the auto-published crates.io registry. Does this sound reasonable? If so, we can resolve issues such as forbid-by-default, as my concern over being able to "just" not load the lint wouldn't apply.

flip1995 commented 3 years ago

IMO, we shouldn't reinvent the wheel here. I would make the custom crate lints just another crate, that may or may not be published on crates.io. No need for a special registry or anything like that. With that we could also load the lints directly from git reusing the infrastructure that already exists in cargo.

jhpratt commented 3 years ago

How would the lint tool load and execute the various custom lints in that situation?

How would this work if a lint author wanted to use nightly — would the downstream user be forced to use nightly as well, given that they'd be compiling the lint code? I also think that would eliminate any possibility of a cache (unless they're already using sccache), which doesn't seem ideal.

flip1995 commented 3 years ago

How would the lint tool load and execute the various custom lints in that situation?

That I don't really know yet. Compile the lints into dylibs or wasm and then load them and execute them?
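The dylib route could look roughly like this, sketched with stdlib-only stand-ins (every type and symbol name here is invented, not an agreed design): the host linter would load each compiled lint package and resolve a well-known registration symbol.

```rust
// Sketch only: a stand-in for the interface a compiled lint package
// might export to the host linter. All names are hypothetical.

/// A registered lint: just a name and a check callback in this toy model.
pub struct Lint {
    pub name: &'static str,
    pub check: fn(source: &str) -> Vec<String>,
}

/// Registry the host linter would pass into each loaded lint package.
#[derive(Default)]
pub struct LintRegistry {
    pub lints: Vec<Lint>,
}

/// Entry point the host would resolve by symbol name after loading the
/// compiled dylib (e.g. with the `libloading` crate); in a real dylib it
/// would be marked `#[no_mangle] extern "C"` so the symbol is findable.
pub fn register_lints(registry: &mut LintRegistry) {
    registry.lints.push(Lint {
        name: "example_todo_comment",
        check: |source| {
            source
                .lines()
                .enumerate()
                .filter(|(_, line)| line.contains("TODO"))
                .map(|(i, _)| format!("line {}: TODO comment found", i + 1))
                .collect()
        },
    });
}

fn main() {
    // Host side: in reality the registry would be filled by every dylib
    // found in the cache; here we call the entry point directly.
    let mut registry = LintRegistry::default();
    register_lints(&mut registry);
    for lint in &registry.lints {
        for diag in (lint.check)("fn main() {} // TODO: clean up") {
            println!("[{}] {}", lint.name, diag);
        }
    }
}
```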

How would this work if a lint author wanted to use nightly — would the downstream user be forced to use nightly as well

I would say yes, as it is the case with dependencies. We should recommend that the linter for the dependency uses the same Rust compiler as the dependency itself. That way no extra compiler would have to be used to run the lints.

I also think that would eliminate any possibility of a cache

The end users would probably have to compile the linters once. This would be in line with other dependencies. I see value in the possibility to run pre-compiled linters though.

DevinR528 commented 3 years ago

Unless our API is released at 1.0 and never bumped to 2.0 (no semver breaking changes), aren't we just sort of inverting the dylint problem? Dependency problems are hard, and with so many different crates (the lint plugins) using one crate (our API), I can potentially see a problem. If a user wants to use our tool and they have a few dependencies, I can imagine they would have to leave off some lints (lint plugins, or whatever we mean when we are talking about regex_lint = 1.0). The other issue I see with this is adding dependency complexity for the user; a really nice thing about clippy is it just works, whereas our tool would (possibly) need a fair amount of set-up. I guess this is a trade-off for flexibility that I personally would make; I just thought it worth mentioning.

Semver would be a problem, right I'm not just imagining it? We couldn't just keep old API versions intact and use them when dynamically loading, ugh that's dylint basically.

All that said I'm not sure a registry with pre-compiled wasm would fix the problem.

flip1995 commented 3 years ago

a really nice thing about clippy is it just works

Back in the day you had to compile Clippy yourself in order to use it. The reason it Just Works now is that it ships pre-compiled with rustup. It was a process getting there, and it will be for this tool/API as well.

Semver would be a problem, right I'm not just imagining it? We couldn't just keep old API versions intact and use them when dynamically loading, ugh that's dylint basically.

If we keep backwards compatibility, we could compile all linters (thingy that uses this API to implement lints) that should be run with the newest version of the stable API, and would only need to compile each once. The problem with dylint is that with every custom linter you have to compile/provide the rustc(driver) libraries the custom lint depends on, so in the worst case you have N instances of pre-compiled `rustc*` libraries for N linters.

The point of this API is that when we compile the linters just before they are run, we can just compile them against the latest version as per semver and only have to keep one version of the API around.
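Assuming such a backwards-compatible API, the load-time check could be as simple as a semver handshake; a toy sketch (version numbers and function names are illustrative, not part of any agreed design):

```rust
// Toy semver handshake between the host linter and a lint package.

/// (major, minor) of the API the host linter was built with.
pub const HOST_API_VERSION: (u32, u32) = (1, 4);

/// Accept a lint package iff the major versions match and the host's
/// minor version is at least the one the package was compiled against.
pub fn is_compatible(host: (u32, u32), built_against: (u32, u32)) -> bool {
    host.0 == built_against.0 && host.1 >= built_against.1
}

fn main() {
    assert!(is_compatible(HOST_API_VERSION, (1, 2)));  // older minor: fine
    assert!(!is_compatible(HOST_API_VERSION, (2, 0))); // major bump: reject
    assert!(!is_compatible(HOST_API_VERSION, (1, 9))); // needs newer host: reject
    println!("compatibility checks pass");
}
```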

DevinR528 commented 3 years ago

If we keep backwards compatibility, we could compile all linters (thingy that uses this API to implement lints) that should be run, with the newest version of the stable API and would only need to compile it once

So one rather large requirement to our API is that it has to be completely future proof so we never have breaking changes.

Back in the day you had to compile Clippy yourself in order to use it.

More what I meant by Clippy just working is that you don't have to set it up to work in a sane default way, it does that all by itself since it owns all the lints (the stuff in clippy_lints/).

jhpratt commented 3 years ago

That I don't really know yet. Compile the lints into dylibs or wasm and then load them and execute them?

Not to be rude but that's a pretty big detail 😆 It's certainly possible to compile and load them, though that seems less than ideal.

The end users would probably have to compile the linters once. This would be in line with other dependencies. I see value in the possibility to run pre-compiled linters though.

Once per project, despite the fact that (imo; this is under discussion) lints won't have features that can be enabled and disabled. As a result, the compiled dylib would be identical (sans compiler versioning for non-wasm dylibs), so there isn't any reason it couldn't be shared.

Unless our API is released at 1.0 and never bumped to 2.0 (no semver breaking changes)

So one rather large requirement to our API is that it has to be completely future proof so we never have breaking changes.

That is in fact the goal. I should explicitly mention that in the "must haves". A 1.160.0 version would be acceptable, though.

All that said I'm not sure a registry with pre-compiled wasm would fix the problem.

Correct. Even if shipped as pre-compiled wasm, the interface would have to remain stable to ensure the wasm dylibs could be executed.


With regard to having a registry, I think having one has a number of advantages. Namely, we can

  * ship pre-built dylibs (at least for common platforms if wasm is not used)
  * restrict lint namespaces to the crate's owners. This means that serde can use the serde::blah namespace for linting and users know it's official.
  * do all of this without the user needing to know or care how it works — as far as they're concerned the library was shipped with the lints built-in.

The last bullet is a huge one imo.

DevinR528 commented 3 years ago

I second the last bullet point as being important, along with being able to turn them off as well.

jhpratt commented 3 years ago

Cargo.toml has the ability to provide information that is specifically for third parties, so we could definitely take advantage of that to disable certain lint packages.
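This is presumably referring to the `[package.metadata]` table, which cargo itself ignores and explicitly reserves for external tools. A hypothetical lint tool could define keys like the following (the table and key names below are invented for illustration):

```toml
# Cargo.toml — cargo ignores everything under [package.metadata],
# so an external lint tool is free to define its own keys here.
[package.metadata.lint]
# hypothetical key: lint packages from dependencies to skip
disabled-packages = ["regex_lint"]
```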

flip1995 commented 3 years ago

(Now using the terminology @jhpratt suggested in their last comment in #5)

Not to be rude but that's a pretty big detail

That is true. So the linter uses the existing cargo infrastructure to download, compile (and possibly install) the lint packages from crates.io. Now you have the compiled dylibs for example in $HOME/.cargo. And then the linter loads those dylibs and executes the lints in them.

Once per project

Not necessarily. We could compile them into $HOME/.cargo instead of the target/ dir in the project. (cargo install instead of cargo build)

So one rather large requirement to our API is that it has to be completely future proof

Yes, that's the point of this effort.

More what I meant by Clippy just working is that you don't have to set it up to work in a sane default way, it does that all by itself since it owns all the lints (the stuff in clippy_lints/).

Yes, but that would also be the case if cargo lint Just Works(tm). Meaning, that it will set up everything automatically and then run the lints.

ship pre-built dylibs (at least for common platforms if wasm is not used)

We would have to build a complete registry for this. I really don't want to reinvent the wheel here. There are also security concerns: some people already complain, on security grounds, that cargo just automatically downloads source code from crates.io and builds it. Downloading prebuilt binaries is a really hard sell IMO. Even if the source code is public, I couldn't be sure what's in the binary.

Because of this we would have to provide an easy way to automatically compile and install those lint packages from source anyway.

This means that serde can use the serde::blah namespace for linting and users know it's official.

This is independent of the crate name that it is published under. We can't prevent people from namespacing their lints under whatever name they want.

All without the user needing to know or care how it works — as far as they're concerned the library was shipped with the lints built-in.

This would also be the case with compiling the lints JIT (iff no prebuilt lint packages are found in $HOME/.cargo/ or wherever).


I have two major points here:

  1. KISS: If we start to write a registry, then search some place where we can host it, then have to efficiently store pre-built binaries there for multiple targets per linter (and possibly multiple versions), we would be busy for the next half year just writing this registry before we could even start with the actual project of making it easy to write lints. So we should keep it simple and reuse as much of the infrastructure that already exists. And this is crates.io and cargo, where we get the infrastructure, versioning etc. for free.
  2. KISS for users: Since 2015, Rust users have run cargo build, and this downloads crates and compiles them. We shouldn't introduce new things to users if not absolutely necessary. Shipping prebuilt binaries from a new registry would definitely be a new thing, where I personally don't see the necessity.

Currently there are 4 people actively participating in the discussions in this org/repo. So we don't have the resources to build every part of the infra from scratch. We must reuse as much as possible in order to make any progress here.

chorman0773 commented 3 years ago

My question is: why not just treat them as (potentially a special class of) dev-dependencies? This means that while they would be compiled each time locally, cargo (or our custom command, before it's merged) would handle those details, and the details of invoking the linter. It would require integration on both ends of the compiler, but that wouldn't be unreasonable. It would also allow lints to live in the same place crates already do: crates.io. Additionally, it would eliminate the (target, version, compiler) duplication problem, because whatever registry we use could just store source code.

Further, because the lint gets built by the compiler used to build the rest of the project (or a different compilation thereof; for example, with gcc-rs when cross-compiling), the lint has the option of reaching into the compiler and asking it about things that would be impossible to ask if it was compiled once by some compiler: for example, the layout of certain types. This would allow compatibility lints to not guess these details.

It would also be simple in all respects: a lint writer would just write a special kind of crate and publish it like a proc-macro; a user would write the lints they want into [dev-dependencies] or another section (bikesheddable, but could be [linters] or [lint-dependencies]); and the lint driver would just invoke the compiler with fancy flags to build the lint (bikesheddable: --crate-type lint) and fancy flags to run the linter (possibly just --extern would do it, as this is what is used for proc-macros, but a dedicated option could be used). The majority of the burden would be on implementors that want to support custom lints, but it would also give them great freedom in deciding how to do so (a wasm dylib that it loads in an interpreter, a standalone binary that it talks to over IPC or otherwise, a shared object that it dlopens/LoadLibraryExs into its own address space and runs as-is).

DevinR528 commented 3 years ago

Well, @flip1995's two-item list, along with a few other things, has moved me more towards having them be dependencies like "(Bikesheddable, but could be [linters] or [lint-dependencies])". I feel like putting the lints in $HOME/.cargo might lead to some large .cargo folders :shrug:; sticking with target/, maybe in their own special target/lints/, seems less surprising?

Yes, but that would also be the case if cargo lint Just Works(tm). Meaning, that it will set up everything automatically and then run the lints.

If we add a field to Cargo.toml listing the lint crates that should be used when this crate is depended on:

# crate abc
# normal Cargo.toml stuff ...
[lints-used-downstream] # better name
lint_abc = "0.1"
third_party_lint_abc = "0.2"

chorman0773 commented 3 years ago

It may be a good idea to leave lints to the control of the top-level crate(s). After all, a user may not want to run some downstream lint on their code, or an upstream crate could break a downstream crate by adding a linter that has a deny-by-default or forbid-by-default lint. Opting downstream users into lints is a stability footgun.

As a related note, having [build-lints] (Bikesheddable) for linters to run on build scripts would also be useful, and allows projects to have slightly different rules between the build script and the main project.

jhpratt commented 3 years ago

Not necessarily. We could compile them into $HOME/.cargo instead of the target/ dir in the project.

Certainly possible. I'd have to check, but this might cause some minor issues (for lack of a better phrase) when different versions are compiled, as in they'd have to be recompiled since one would overwrite the other. Again, I'm not sure on this, though.

Yes, but that would also be the case if cargo lint Just Works(tm). Meaning, that it will set up everything automatically and then run the lints.

This would also be the case with compiling the lints JIT (iff no prebuilt lint packages are found in $HOME/.cargo/ or wherever).

Is it truly automatic and transparent if you still have to add lint crates to a pseudo-dependency list?

This is independent of the crate name that it is published under. We can't prevent people from namespacing their lints under whatever name they want.

Well, we could by having a registry. I think it's important to know that if the namespace is the same as a crate name, it's from the same people.

We would have to build a complete registry for this. I really don't want to reinvent the wheel here. There are also security concerns: some people already complain, on security grounds, that cargo just automatically downloads source code from crates.io and builds it. Downloading prebuilt binaries is a really hard sell IMO. Even if the source code is public, I couldn't be sure what's in the binary.

  1. KISS: If we start to write a registry, then search some place where we can host it, then have to efficiently store pre-built binaries there for multiple targets per linter (and possibly multiple versions), we would be busy for the next half year just writing this registry before we could even start with the actual project of making it easy to write lints. So we should keep it simple and reuse as much of the infrastructure that already exists. And this is crates.io and cargo, where we get the infrastructure, versioning etc. for free.

It should be possible to fork crates.io and keep the relevant parts, no? I'm not saying it would be trivial but it would certainly be easier than building it from scratch.

With regard to security concerns, that's why I suggested sandboxing — the only thing that would have to be trusted is the cargo-lint tool, as the lint implementations couldn't do anything nefarious even if they tried.

  2. KISS for users: Since 2015, Rust users have run cargo build, and this downloads crates and compiles them. We shouldn't introduce new things to users if not absolutely necessary. Shipping prebuilt binaries from a new registry would definitely be a new thing, where I personally don't see the necessity.

While I understand what you're saying, I think having it be fully transparent would eliminate any sense of "new" to the end user. Imo it should be just as transparent as clippy is.

Currently there are 4 people actively participating in the discussions in this org/repo. So we don't have the resources to build every part of the infra from scratch. We must reuse as much as possible in order to make any progress here.

Certainly, and I intend to. Like I said above it's probably possible to fork crates.io, which would eliminate much of the time spent on a registry.

It may be a good idea to leave lints to the control of the top-level crate(s). After all, a user may not want to run some downstream lint on their code, or an upstream crate could break a downstream crate by adding a linter that has a deny-by-default or forbid-by-default lint. Opting downstream users into lints is a stability footgun.

Good point. Only direct dependencies makes the most sense.


Does what I'm proposing at least conceptually make sense to everyone? If not I can write things up a bit clearer and all in one place.

chorman0773 commented 3 years ago

Is it truly automatic and transparent if you still have to add lint crates to a pseudo-dependency list?

Well, it's the same as having to install each lint via cargo install/rustup/similar. Plus it gives project-level control over the lints. If I've got 50 different lint packages installed, chances are I don't want to use all 50 for every rust project I have. I don't even use clippy for half of them rn.

With regard to security concerns, that's why I suggested sandboxing — the only thing that would have to be trusted is the cargo-lint tool, as the lint implementations couldn't do anything nefarious even if they tried.

Well, that would require the (Rust) implementation to support the sandboxing, which, as I mentioned, may be an unreasonable burden. Linters should probably, though, be held to the same rules as proc-macros, but that can't really be enforced unless the implementation chooses to support the machinery to sandbox lints, which can easily be valid from the beginning.

jhpratt commented 3 years ago

What I was picturing was not having all lints executed, only those of direct dependencies. So if you had time as a direct dependency, the lints associated with time in the global cache would be executed. Additional lints could still be executed as requested by a field in Cargo.toml.

So far the primary thing I've seen mentioned as to why sandboxing might be an issue is shipping pre-compiled dylibs. I've mentioned elsewhere that it would still be possible to query the feature flags enabled via a provided API, which covers the behavior I believe you desired (changing a lint based on whether a feature on the library is enabled). The only other issue I've seen brought up was that this would require an interpreter shipped, but as this would be a standalone binary (which seems most logical), that's not a huge deal imo.
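A minimal sketch of what such a feature-query API might look like, assuming the host gathers feature data from cargo metadata (all names here are hypothetical):

```rust
// Hypothetical feature-query API: the host collects enabled features and
// hands lints a context, so a pre-compiled (or sandboxed) lint never has
// to bake feature flags in via cfg! at its own compile time.
use std::collections::HashSet;

pub struct LintContext {
    enabled_features: HashSet<String>,
}

impl LintContext {
    pub fn new<I: IntoIterator<Item = String>>(features: I) -> Self {
        Self { enabled_features: features.into_iter().collect() }
    }

    /// Lints branch on this at lint time instead of at their own build time.
    pub fn feature_enabled(&self, name: &str) -> bool {
        self.enabled_features.contains(name)
    }
}

fn main() {
    let cx = LintContext::new(["serde".to_string(), "std".to_string()]);
    assert!(cx.feature_enabled("serde"));
    assert!(!cx.feature_enabled("alloc"));
    println!("feature queries ok");
}
```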

chorman0773 commented 3 years ago

What I was picturing was not having all lints executed, only those of direct dependencies. So if you had time as a direct dependency, the lints associated with time in the global cache would be executed.

What if I don't want all of those lints to be executed? I could easily have a bunch of lints associated with time that I'm using in some other project, but not for the current one.

The only other issue I've seen brought up was that this would require an interpreter shipped, but as this would be a standalone binary (which seems most logical), that's not a huge deal imo.

My note was that if you compile it once, then you wouldn't be able to access information that's implementation-specific. Additionally, it wouldn't work if the implementation had (unstable) extra syntax that the compiler the linter was built against can't parse (potentially within macros that get expanded). There is also the question of how it would query the standard library, or any third party library for that matter. Would it use the standard library of the compiler it was built against? The standard library in use with the package? If it's using libraries built by a different compiler than the linter was built against, it likely won't know how to open it. I mentioned this in a distinct IRLO thread (related to -Z build-std), but anything linked against rustc_private will be very surprised when it cracks open an .rlib produced by lccc and tries to read !/.rmeta (lccc produces a different file in a very different format, called .rmanifest). So no matter what, the linter will need to interact with the compiler in use in some meaningful way. The easiest way to do this is to let the compiler decide how to build and execute user-defined lints, which would require shipping an interpreter iff wasm was mandatory for such.

jhpratt commented 3 years ago

What if I don't want all of those lints to be executed? I could easily have a bunch of lints assoicated with time that I'm using in some other project, but not for the current one.

That's where I want to guarantee that a lint in the time:: namespace is provided by the author of the time crate (in this case me). If you have your own set of lints, they would be included via Cargo.toml in a custom section.

then you wouldn't be able to access information that's implementation-specific

Why not? The linter is able to export functions that can be imported and called by the lint implementation.

it wouldn't work if the implementation had (unstable) extra syntax that the compiler the linter was built against can't parse (potentially within macros that get expanded)

Not sure what you mean here. If you're referring to the raw TokenStream inside the macro call, that would presumably be provided verbatim (along with the expansion) to the lint implementation.

There is also the question of how it would query the standard library, or any third party library for that matter. Would it use the standard library of the compiler it was built against? The standard library in use with the package? If it's using libraries built by a different compiler than the linter was built against, it likely won't know how to open it. … So no matter what, the linter will need to interact with the compiler in use in some meaningful way.

That's the purpose of the stable API. The linter would be parsing the code itself and converting it to what is effectively an IR.
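A toy illustration of the stable-IR idea (the actual IR design is an open question; the types below are invented for the sketch): the driver lowers compiler-specific ASTs into a small, semver-stable mirror, and lints only ever see the mirror.

```rust
// Toy "stable IR": lints consume only this mirror, never the compiler's
// own (unstable) AST/HIR types; the driver does the lowering.

/// Semver-stable mirror of an item, deliberately minimal.
pub enum Item {
    Fn { name: String, is_unsafe: bool },
    Struct { name: String, fields: Vec<String> },
}

/// An example lint written purely against the stable mirror.
pub fn lint_unsafe_fns(items: &[Item]) -> Vec<String> {
    items
        .iter()
        .filter_map(|item| match item {
            Item::Fn { name, is_unsafe: true } => {
                Some(format!("function `{name}` is unsafe"))
            }
            _ => None,
        })
        .collect()
}

fn main() {
    let items = vec![
        Item::Fn { name: "read_ptr".into(), is_unsafe: true },
        Item::Fn { name: "main".into(), is_unsafe: false },
        Item::Struct { name: "Foo".into(), fields: vec!["f".into()] },
    ];
    assert_eq!(lint_unsafe_fns(&items), vec!["function `read_ptr` is unsafe"]);
    println!("ok");
}
```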

chorman0773 commented 3 years ago

Why not? The linter is able to export functions that can be imported and called by the lint implementation.

Implementation here means rust implementation. I cannot, for example, from a generic implementation of a linter, query the layout of a repr(Rust) structure.

Not sure what you mean here. If you're referring to the raw TokenStream inside the macro call, that would presumably be provided verbatim (along with the expansion) to the lint implementation.

In order to provide type information, the linter would have to expand and parse the macro. However, if, on a particular library implementation it expands to something like lccc's k#__asm, a lint compiled against the rustc libraries will have zero clue what to do with it. This also is applicable when the syntax occurs directly - in lccc's stdlib, I use k#__xir frequently to generate particular code sequences when it would be overkill to define an intrinsic.

That's the purpose of the stable API. The linter would be parsing the code itself and converting it to what is effectively an IR.

This doesn't solve the problem, especially wrt. the stdlib. The linter does not know how to find the standard library source code, or what it needs to do in order to build all of it, unless it asks the compiler (See: https://internals.rust-lang.org/t/how-stdlib-is-found-by-cargo-z-build-std/14955/16). And even if it does know or determine that information, the same issue above arises, it may not be capable of parsing everything the standard library contains, if said standard library is using (or overusing) implementation-specific unstable features.

jhpratt commented 3 years ago

Implementation here means rust implementation. I cannot, for example, from a generic implementation of a linter, query the layout of a repr(Rust) structure.

What I'm saying is that the linter can provide a method that does exactly that.

However, if, on a particular library implementation it expands to something like lccc's k#__asm, a lint compiled against the rustc libraries will have zero clue what to do with it. This also is applicable when the syntax occurs directly - in lccc's stdlib, I use k#__xir frequently to generate particular code sequences when it would be overkill to define an intrinsic.

It sounds like your macro would expand to use a compiler-internal function? That function still has to have a header that provides type information, no? If it's expanding to syntax that has meaning to only your compiler, then you really shouldn't be doing that in the first place imo.

And even if it does know or determine that information, the same issue above arises, it may not be capable of parsing everything the standard library contains, if said standard library is using (or overusing) implementation-specific unstable features.

So long as it's valid Rust syntax, it would be able to be handled. The linter would have its internal parser updated on a regular basis to match that used for rustc.

DevinR528 commented 3 years ago

The linter would have its internal parser updated on a regular basis to match that used for rustc.

We aren't writing a parser/type DB/whatever else would be needed, right? We'll just wrap rustc_lint in a future proof super stable API? :worried:

chorman0773 commented 3 years ago

It sounds like your macro would expand to use a compiler-internal function? That function still has to have a header that provides type information, no? If it's expanding to syntax that has meaning to only your compiler, then you really shouldn't be doing that in the first place imo.

It's not a function. k#__asm is the implementation of the asm! macro in lccc, so I don't need to attach the macro engine to the xlang-ir visitor api. It's an unstable feature (lccc_asm_keyword) that the macro enables internally. k#__xir is similar (it produces the compiler's mid-level IR), though I forwent the macro and just use it directly inside the stdlib. If I wrote an intrinsic to replace every single use of direct xir generation, I'd have upwards of 100 intrinsics that each expand to one xir expression, which is far more complicated, implementation-wise, than just adding an unstable syntax extension available to the standard library. This is similar to something I already do, because actually implementing asm! as anything else would require giving considerable amounts of context to the macro engine.

So long as it's valid Rust syntax, it would be able to be handled. The linter would have its internal parser updated on a regular basis to match that used for rustc.

Which is the point I'm making => it's not. Plus, again, there is the issue of finding the source code.

chorman0773 commented 3 years ago

What I'm saying is that the linter can provide a method that does exactly that.

It can, sure, but without knowledge of the precise compiler in use, it would easily be wrong. There is no implementation-independent query that can determine the layout of a repr(Rust) type, because it's unspecified. In order to actually determine the proper layout, it necessarily needs to know the compiler in use.

#[repr(Rust)]
pub struct Foo {
    f: u32,
    g: u64,
}

Could have size 12, 16, 32, or 42 (assuming alignment of 2) depending on the compiler and applicable flags.
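To make the point concrete: only `repr(C)` pins this struct's layout; for the default representation, a linter genuinely cannot know the answer without asking the compiler in use.

```rust
// repr(C) layout is specified by the language; the default repr(Rust)
// layout is deliberately unspecified, so a cross-compiler lint can only
// learn it from the compiler actually building the code.
use std::mem::{align_of, size_of};

#[repr(C)]
pub struct FooC {
    f: u32,
    g: u64,
}

// Same fields, default representation: the compiler may reorder fields
// and insert padding however it likes.
pub struct FooRust {
    f: u32,
    g: u64,
}

fn main() {
    // On a typical 64-bit target: 4 bytes + 4 padding + 8 bytes = 16.
    println!("repr(C):    size {}, align {}", size_of::<FooC>(), align_of::<FooC>());
    // Whatever this prints is an implementation detail, not a guarantee.
    println!("repr(Rust): size {}, align {}", size_of::<FooRust>(), align_of::<FooRust>());
}
```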

jhpratt commented 3 years ago

We aren't writing a parser/type DB/whatever else would be needed, right? We'll just wrap rustc_lint in a future proof super stable API? worried

Correct. But rustc_parse (or whatever it's called) would still be updated. It's an implementation detail of the linter.

Which is the point I'm making => it's not.

Why should the linter accommodate something that uses nonstandard extensions? I'm not aware of a single instance of rustc doing this, though I could just be ignorant to that fact. Should the linter also accommodate a syntax that doesn't require parentheses and braces to be balanced? I think we can agree that it shouldn't need to.

there is the issue of finding the source code.

Finding the code to analyze stdlib is a solvable issue. I have previously read through your post on IRLO (I follow IRLO closely).

It can, sure, but without knowledge of the precise compiler in use, it would easily be wrong.

Valid point, and it's a decent argument towards a stable ABI (even if it's not the default ABI).

For what purpose would you need to know the actual layout of a type? Wouldn't just knowing that the end user is transmuting (or whatever else) be sufficient to catch their relying on an unstable detail? I'm not disputing a potential use case exists; I sincerely can't think of one off the top of my head.

chorman0773 commented 3 years ago

Why should the linter accommodate something that uses nonstandard extensions? I'm not aware of a single instance of rustc doing this, though I could just be ignorant to that fact. Should the linter also accommodate a syntax that doesn't require parentheses and braces to be balanced? I think we can agree that it shouldn't need to.

Arguably, it should not, except for a lint specifically designed for that. However, as mentioned, k#__asm is an implementation-detail of the asm! macro, which, while not stable, is standard. Assuming it has access to the correct standard library (see everything else following), it would expand the asm macro using the definition provided, encounter the non-standard construct, and have zero idea what to do. Thus it would need to know how to handle the extension if it's using that standard library.

I'm not aware of a single instance of rustc doing this, though I could just be ignorant to that fact

I don't know what rustc does, but the way my system works, the macro needs to expand to something. The reserved syntax proposal gives me the tool I needed, so I used it. Sure, the stdlib is using non-standard constructs, but it's entitled to do so, after all, it has direct knowledge of the compiler in use, and can flip whatever feature gates it wants (which is also potentially something a linter would need to consider).

Finding the code to analyze stdlib is a solvable issue. I have previously read through your post on IRLO (I follow IRLO closely).

It is, but then knowing how to build it is another story. On lccc in particular, you'd have to execute one of the three build scripts (configure, CMakeLists.txt, or build.rs; they are interchangeable, though) for anything other than core and alloc, and resolve its build dependencies, etc. So this would require basically running cargo. If the compiler itself was the lint driver, however, it would make the code much simpler, because it can probably just run cargo rustc.

For what purpose would you need to know the actual layout of a type? Wouldn't just knowing that the end user is transmuting (or whatever else) be sufficient to catch their relying on an unstable detail?

Transmute is only one way to rely on the layout. Actually having the layout of types could be useful, for example, in determining when users may be relying on it for abi purposes.

jhpratt commented 3 years ago

encounter the non-standard construct, and have zero idea what to do

That's why non-standard constructs are a bad idea. While a different compiler implementation is free to do whatever they want, they shouldn't expect the same level of support if they stray from what is in rustc.

the macro needs to expand to something

Yes, and it could expand to intrinsics that are typed. Just because it's verbose doesn't mean it couldn't do it.

The reserved syntax proposal gives me the tool I needed, so I used it.

That's actually not what the proposal is for in any way. The proposal was to effectively allow the introduction of new keywords without requiring an edition, nothing more. Now, I haven't bothered to dig through the relevant threads, but I'd be willing to bet no one brought up the possibility of a third-party compiler (ab)using their own keywords.

Transmute is only one way to rely on the layout. Actually having the layout of types could be useful, for example, in determining when users may be relying on it for abi purposes.

You don't need to know the final layout for this, just that it's being used in an extern context or dylib. Keep in mind that the final layout could vary even if it were integrated into the compiler because of things like PGO.


Personally, I don't think we should make it an explicit goal to support compilers other than rustc. If it happens to work, great! If they decide to use non-standard constructs, we shouldn't be required to support that in any way. rustc is the compiler that 99+% of people use — focusing our efforts on supporting that is what we should be doing imo. I understand that you have personally worked on an alternate compiler, but please don't let that clout your judgement.

rdrpenguin04 commented 3 years ago

The official implementation shouldn't dictate what every other implementation must do; otherwise, you don't have other implementations, you have copies of rustc. One major point of having multiple implementations is having an implementation available that does something in a sane manner (e.g. if rustc has a bug, a programmer should be able to switch which implementation they are using while rustc fixes the bug; this has given me personally an out to not support MSVC when it is non-compliant and GCC and Clang are compliant).

Additionally, forcing alternate compilers to implement things in a certain way may require reframing the infrastructure of the compiler to match what rustc has. That is, again, counterproductive; if every compiler were rustc, there may as well only be one compiler.

Saying that someone's judgement is being clouded by experience with an idea is silly imo.

chorman0773 commented 3 years ago

In any case, the benefits of just treating lints similar to proc macros are:

  1. The implementation that is building the code is also the thing running the lint driver, thus it will have access to layout information (the flags will also match assuming people are matching RUSTFLAGS), and can parse the constructs available, negating the problem in the first place. It will also have direct access to the compiled standard library, and won't have to build it (which may involve doing cargo things), and if -Z build-std is in use, it will be able to use whatever protocol is decided upon to proceed with that.
  2. The interface will be able to peek inside macros, including proc-macros, which is basically required for complete type information. Absent this, the linter would be required to either rebuild all proc-macro dependencies or forgo macro expansion, losing access to any type defined by a proc-macro at least.
  3. Distribution can be either over crates.io, or a similar registry.
  4. Operation of lints can be done simply, by acting as a thin wrapper around cargo rustc

Downsides:

  1. Lints are not "single driver", and require being built, though this process can be automated by cargo.
  2. Lints are not guaranteed to be sandboxed, though implementations can elect to sandbox lints, for example, by building them for wasm.
  3. Using lints with other build systems would require extra effort, similar to what is necessary for proc-macros.
  4. Lints will not work with compilers that don't have the necessary features. However, it is possible we could support nightly rustc with a wrapper (converting lint crates to rustc plugins).
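To make the proc-macro-style model above concrete, here is a minimal, purely hypothetical sketch of what a lint interface in that spirit could look like. Every name in it (`AstNode`, `Diagnostic`, `LintPass`, `run_lints`, `NoTransmute`) is invented for illustration; no real compiler or project API is being described.

```rust
// Hypothetical sketch: lints as compiled crates implementing a trait, with
// the driver (the compiler, or a thin wrapper around `cargo rustc`) walking
// the AST and invoking each pass. All names here are invented.
#[derive(Debug)]
enum AstNode {
    Transmute { from: String, to: String },
    Other,
}

#[derive(Debug)]
struct Diagnostic {
    message: String,
}

// A lint crate would implement this trait and be loaded by the driver.
trait LintPass {
    fn check_node(&mut self, node: &AstNode, diags: &mut Vec<Diagnostic>);
}

struct NoTransmute;

impl LintPass for NoTransmute {
    fn check_node(&mut self, node: &AstNode, diags: &mut Vec<Diagnostic>) {
        if let AstNode::Transmute { from, to } = node {
            diags.push(Diagnostic {
                message: format!("transmuting `{from}` to `{to}` relies on unstable layout"),
            });
        }
    }
}

// The driver side: run every registered pass over every node.
fn run_lints(nodes: &[AstNode], passes: &mut [Box<dyn LintPass>]) -> Vec<Diagnostic> {
    let mut diags = Vec::new();
    for node in nodes {
        for pass in passes.iter_mut() {
            pass.check_node(node, &mut diags);
        }
    }
    diags
}

fn main() {
    let nodes = vec![
        AstNode::Other,
        AstNode::Transmute { from: "Foo".into(), to: "[u8; 8]".into() },
    ];
    let mut passes: Vec<Box<dyn LintPass>> = vec![Box::new(NoTransmute)];
    let diags = run_lints(&nodes, &mut passes);
    assert_eq!(diags.len(), 1);
    println!("{}", diags[0].message);
}
```

Since the trait is all the lint crate depends on, the driver is free to build and sandbox the lint however it likes, which is the upside (and downside) the lists above describe.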

jhpratt commented 3 years ago

The official implementation shouldn't dictate what every other implementation must do

When there is otherwise no standard, yeah it pretty much does. rustc is the reference implementation of Rust. Surely you're not disputing this?

Saying that someone's judgement is being clouded by experience with an idea is silly imo.

cloud != clout. I believe that undue influence is being given to something that is purely theoretical and will almost certainly remain so for years; there is no complete implementation of Rust other than rustc. The closest is mrustc, and while impressive it remains a long way from rustc.


thus it will have access to layout information

I've still yet to see why this is necessary to detect reliance on an unstable ABI. All you need to know is that the ABI is being assumed without a #[repr(foo)] other than Rust. This could be provided without much issue.
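As a hedged illustration of that claim (the type and function names below are made up for the example), the lint only needs to notice that the type being transmuted lacks an explicit `repr`; it never needs the concrete layout the compiler chose:

```rust
use std::mem;

// `Pair` has the default (unstable) Rust representation; no `#[repr(C)]`.
struct Pair {
    a: u32,
    b: u32,
}

// A layout-reliance lint only needs to see that `Pair` lacks an explicit
// `repr` at this transmute site. The actual field order is unspecified,
// which is exactly why the code deserves a lint.
fn pair_to_words(p: Pair) -> [u32; 2] {
    unsafe { mem::transmute(p) }
}

fn main() {
    let words = pair_to_words(Pair { a: 1, b: 2 });
    // Only the sum is layout-independent; the order of `words` is not.
    assert_eq!(words[0] + words[1], 3);
}
```

The assertion deliberately avoids assuming field order, underscoring that no party, lint included, can rely on the default layout.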

The interface will be able to peek inside macros

Why do you think compiler integration is required to be able to expand macros?

Operation of lints can be done simply, by acting as a thin wrapper around cargo rustc

I'm not sure what you mean by "simply" — the end result is a binary, regardless of whether it's integrated into the compiler or not. Clippy is a standalone tool and doesn't have any issue along the lines of what you seem to be hinting at.

Lints are not guaranteed to be sandboxed, though implementations can elect to sandbox lints, for example, by building them for wasm.

By permitting the implementation of the lint driver to choose if it's sandboxed or not, we'd be fragmenting the ecosystem, as an unsandboxed lint may not work in a sandbox (the reverse is not true).

Using lints with other build systems

Meaning?


Compiler integration is not something to be done lightly. There are tools that are universally used such as cargo-edit yet they are not integrated into Cargo. serde is not integrated into the standard library. Why should this tool be integrated into the compiler immediately? It is a large project that can easily go wrong; we should not burden the compiler with the upkeep of a project that may well fail, let alone the technical debt that goes along with it.

I have an open question regarding the potential for alternate compilers that use nonstandard syntax:

Should the linter also accommodate a syntax that doesn't require parentheses and braces to be balanced? I think we can agree that it shouldn't need to.

This is fundamentally the same question as what you're saying is done by your (early stage) compiler — using nonstandard syntax but expecting full support from all tools.

As I have stated multiple times now, I believe unnecessary weight is being given to the potential for alternative compilers being developed in the future. As far as I'm aware, not one person uses anything other than rustc as their daily compiler. This will almost certainly remain the case for years to come. I see no reason to support alternate implementations that are not only theoretical but explicitly use nonstandard constructs, going against the manner in which the reference implementation handles syntax and types.

By all means, please address my points one-by-one and explain why and how I'm wrong. With all respect, I sincerely doubt you can do so.

flip1995 commented 3 years ago

Regarding how lints should be provided: The last point made was that they should automatically be shipped with the crate they're implemented for. In addition, there is still the question of whether we should build a registry for it.

Regarding the registry: I'm open to thinking about this, but I wouldn't start with it. We must provide a way to install and use the lint packages from source anyway (because many people and companies will prefer this method). So I would just start with that for now and push the implementation of a registry into the future. Also, I don't see how we would handle storing multiple binaries for different feature/target-host/version/... sets per crate. This will just explode.

Regarding implicitly shipping lints: I don't like this, also for the reasons others mentioned. I may not want to run lints for some crates. For example, if I depend on a crate for only one or two utility functions, I don't necessarily need to run the full lint package on my crate. There are good reasons not to want to run lint packages. So we would need an exclude list of lint packages. This is why I would just make it explicit, dependency-like.

In the far future, if this gets integrated closely with cargo/rustc, we could add a flag to dependency definitions in the Cargo.toml where you can just write something like:

[package]
...
lints = true  # Automatically install and run lints from the crates I depend on

[dependencies]
crate_1 = { version = "1.0", lints = false } # But don't use lints from this crate.
crate_2 = { version = "2.0", lints = true } # Or enable lints one by one if the global `lints` flag is not set.

Personally, I don't think we should make it an explicit goal to support compilers other than rustc.

+1. KISS. Otherwise we won't get anything done.

I wouldn't make this a requirement for this project. There are no other fully functional compilers besides rustc now and won't be in the next few years. In addition to that, a stable API towards the user is already hard enough. Making this API also compiler independent is practically impossible.

e.g. if rustc has a bug, a programmer should be able to switch which implementation they are using while rustc fixes the bug

This only really matters for compilation, not so much for static analysis. If you want to use a different compiler because that compiler supports a target that rustc doesn't, then you can do that for compiling, but for the static analysis you can use rustc. Static analysis doesn't depend on the target and no binary is produced. So all possible bugs are compile-time bugs, which – while annoying – aren't as catastrophic as runtime bugs can be.

chorman0773 commented 3 years ago

When there is otherwise no standard, yeah it pretty much does. rustc is the reference implementation of Rust. Surely you're not disputing this?

There is a difference between the public behaviour of the implementation and the private behaviour.

Why do you think compiler integration is required to be able to expand macros?

proc-macros need to be compiled to be expanded. Something precompiled cannot rely on the abi of a compiled proc-macro crate, even if we only target rustc (especially so, since rustc's abi is deliberately unstable, but other implementations may provide a stable or semi-stable abi).

If you want to use a different compiler, because that compiler supports a target that rustc doesn't, than you can do this for compiling but for the static analysis you can use rustc.

Someone may want to use a different compiler simply because they want to. However, even accepting this as true, it still doesn't follow: static analysis is target-specific, because compiling Rust is target-specific. rustc cannot expand cfg directives for a target it cannot compile to. Further, rustc cannot be used if you want support for a host target rustc doesn't support, because at that point you cannot run rustc at all.

The only way I see we would be able to do this would be to dump information about the compilation to a specified file format and then deserialize this in the lint driver.

Or to compile it with the compiler, linking against that compiler's version of the lint api, and executing it according to the compiler.

flip1995 commented 3 years ago

Someone may want to use a different compiler because they want to

Then I would say that our goal is not to support every use case out there. The decision to use another compiler comes with not being able to dip into the Rust ecosystem. Since there is only one established ecosystem, I want to cater to that and not to theoretically new rust compilers that may or may not be feature complete in the future. (I'm saying this as one of the people that are closely involved with the GCC-Rust efforts).

If you want to use an alternate compiler for compilation and this API + rustc for static analysis, there are ways to do this with cfg_attr and the like.

Or to compile it with the compiler, linking against that compiler's version of the lint api, and executing it according to the compiler.

I don't see how we could design for something like this. We have to call directly into the compiler to get information. With that, we would have to design 2 stable APIs. One (semi-)stable API directly tied to the compiler, which the compiler has to maintain. This will slow down compiler development, since it introduces stability guarantees to the compiler. And one that interfaces between the user and the compiler API. One goal here is that rustc doesn't have to care that this stable API exists, since every breakage is resolved internally by the API, without changing the public interface. Your suggestion would probably require a compiler (not language!) specification that specified what functions a full Rust compiler would have to provide.

chorman0773 commented 3 years ago

I don't see how we could design for something like this. We have to call directly into the compiler to get information. With that, we would have to design 2 stable APIs. ... Your suggestion would probably require a compiler (not language!) specification that specified what functions a full Rust compiler would have to provide.

Actually, it would simply require specifying the api presented to consumers. It then places the implementation on the hook for providing that api if it wants to support lints, and this project can implement that api for rustc. This is similar to what proc-macros do, and would have to be done anyways if lints are being built against the rustc libraries (also, similar to the following point, what is done if rustc's libraries aren't available). This is necessary anyways, if you want to support expanding proc-macros, which, as I have mentioned, are necessary to obtain type information, as they will require asking the compiler how to expand (lest you run into "What the heck is this abi" problems, even on rustc). The api will be built off of the unstable api of rustc, and have to be updated with it like dylint and clippy already are, but the lints using the api would not.

If you want to use an alternate compiler for compilation and this API + rustc for static analysis, there are ways to do this with cfg_attr and the like.

In the first case of "just wanting to for some reason", I can't see why this is necessary, except to handle the standard library. In the cases where it is necessary, you'd need a way to detect the compiler in use. For unsupported targets, I'm not sure how you'd do this with the linter. It would have to do one of the following:

For rustc-unsupported hosts, or where using rustc is avoided for practicality reasons (for example, where cross-compilation is not possible due to incorrectly made assumptions), using lints the way you propose would definitely either be impossible, or necessarily use the api provided by the implementation that is available.

The decision to use another compiler comes with not being able to dip into the Rust ecosystem. Since there is only one established ecosystem,

One thing I've seen is that people are scared that multiple implementations will fracture the ecosystem. I have the same fear and am trying to work now so that when the implementations are available in 1 year, 3 years, or 10 years, it won't. The way you do that, however, is not by adapting other implementations so that their internals match rustc, it's by reducing the reliance on those internals in the first place.

chorman0773 commented 3 years ago

Another point: by using the rustc_private feature directly, a 3rd-party library would have to be nightly only (and don't say RUSTC_BOOTSTRAP; that's not a solution, that's an internal, very unstable hack). However, if the library is tied to the compiler, then the compiler's build process can also build that library according to its rules for the rest of the stdlib, just like how libproc_macro is.

flip1995 commented 3 years ago

Actually, it would simply require specifying the api presented to consumers. It then places the implementation on the hook for providing that api if it wants to support lints, and this project can implement that api for rustc.

I think we agree here but just have different formulations for the same thing. Since I think we will heavily use the queries that are available in rustc to build this API upon, the interface for other compiler implementations will probably just be the functions we use from rustc. At least at first. The backend of this API can be changed anytime to improve it for other compilers.

I see your points about other compiler implementations. But I think we shouldn't put much thought into that right away, but still keep it in the back of our mind.

Another point, by a 3rd party library using the rustc_private feature directly, it would have to be nightly only

Yes, but since we have to develop a first version of this API out-of-tree, we probably have to use rustc_private. And yes, RUSTC_BOOTSTRAP is not an option. So our goal should be that, some time in the future, we can build and ship this API with the Rust compiler, so you can write your lints with a stable compiler.

chorman0773 commented 3 years ago

Since I think we will heavily use the queries that are available in rustc to build this API upon, the interface for other compiler implementations will probably just be the functions we use from rustc. At least at first. The backend of this API can be changed anytime to improve it for other compilers.

The latter is my point, though more from a perspective of "if an implementation other than rustc wants to support the api, it must provide it in some way" (which may be to provide the interface rustc does, or one that is sufficiently similar to function, and use the out-of-tree version). I do agree that this project should not be designed around the potential unknown design details of alternative compilers, but I argue it should not be designed in a way that excludes them without (potentially significant) internal design changes to fit the interface rustc has (which, notably, is written in Rust and is unstable). This is why I would propose a specification along with the MVP, just like the proc-macro api. Allowing the compiler to choose how to build and execute lints would let each implementation provide the feature in the way most tailored to its own internal architecture. I would be willing to write the prose for that specification.

jhpratt commented 3 years ago

We must provide a way to install and use the lint packages from source anyway (because many people and companies will prefer this method)

Having path dependencies for lints is more than reasonable, and imo expected if I haven't stated that before.

So I would just start with that for now and push the implementation of a registry into the future. Also I don't see how we should handle storing multiple binaries for different feature/target-host/version/... sets per crate. This will just explode.

So my suggestion is to start with using crates.io and compiled lints for now, and maybe build our own registry for lints in the future. But not to start with a new registry.

That's where I think letting the lint implementation query the enabled features, target, etc. is the sensible way forward.

At least at first, we really don't even need to use crates.io — manual installation of lints would be sufficient during the design and implementation phase of this. A registry and/or crates.io integration would be among the last things to be worked on either way, no? So I almost feel like we can just punt this decision down the road.

Regarding implicitly shipping lints: I don't like this. Also for the reason others mentioned. I may not want to run lints for some crates. For example, if I depend on a crate for only one or two utility functions, I don't necessarily need to run the full lint package on my crate. There are good reasons why I don't want to run lint packages. So we would need a exclude list of lint packages. This is why I would just make it explicit, dependency-like.

I would consider it highly unidiomatic if a crate produced a lint not related to its own code. smallvec isn't going to check if your transmutes do something weird — why would it? It could lint something like `let mut foo = SmallVec::new(); foo.push(1); foo.push(2);`, suggesting `::with_capacity(2)` or something else. As such, if you import itertools for its extension methods, you shouldn't expect arbitrary things to be linted. If that happens, you could disable the itertools crate's lints as a whole, as I suggested.
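To make the shape of that hypothetical capacity lint concrete, here is a self-contained sketch using `Vec` in place of `SmallVec` (so the example needs no external crate); both the lint and its suggestion are assumptions for illustration:

```rust
// The pattern a hypothetical capacity lint might flag, and the rewrite it
// might suggest. `Vec` stands in for `SmallVec`; the idea is identical.
fn flagged() -> Vec<u32> {
    let mut foo = Vec::new(); // lint: two `push`es directly follow `new()`
    foo.push(1);
    foo.push(2);
    foo
}

fn suggested() -> Vec<u32> {
    let mut foo = Vec::with_capacity(2); // suggested fix: pre-size the buffer
    foo.push(1);
    foo.push(2);
    foo
}

fn main() {
    // Behavior is unchanged; only the allocation pattern differs.
    assert_eq!(flagged(), suggested());
}
```

The point of the example is scope: everything the lint inspects is usage of the crate's own type, not arbitrary code in the dependent crate.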

In the far future, if this gets integrated closely with cargo/rustc, we could add a flag to dependency definitions in the Cargo.toml where you can just write something like:

[package]
...
lints = true  # Automatically install and run lints from the crates I depend on

[dependencies]
crate_1 = { version = "1.0", lints = false } # But don't use lints from this crate.
crate_2 = { version = "2.0", lints = true } # Or enable lints one by one if the global `lints` flag is not set.
  • But for now, I would like to make this explicit. Implicitly installing binaries without the user knowing which crates are doing this, seems like a thing people will dislike.

That would be nice in the long term. For the initial, non-rustc-integrated implementation, perhaps something like the following.

[package.metadata.lints]
auto-detect = false # true would eliminate the need for `foo` and `bar` below if they are already dependencies

[package.metadata.lints.crates]
foo = "0.2.0"
bar = "0.16.0"
_internal = { path = "..." }

Given that I'd like to maintain parity between crates.io crate names and lint crate names, I'd prefer to have the names listed here be the same as the crate name. I understand that it can be quite counterintuitive if a user is looking for the foo lint and instead has to look for foo__lints or whatever on crates.io, but I think this is something that can be pushed down the line a bit; path dependencies for lints should be what's implemented at first.

Internal names should, in my opinion, start with an underscore, but that could cause issues with local development of a published lint crate. I suppose some environment variable could be read to override that, but that seems a bit hacky. Basically, I know what I'd like to have for names, but I understand that there are significant drawbacks to what I'm picturing (that can't really be resolved without either a separate registry or cargo integration). As I've said, these decisions don't really need to be made up front though.

There is a difference between the public behaviour of the implementation and the private behaviour.

You're suggesting different syntax. Please address the question regarding unbalanced parentheses that I have posed. It's fundamentally the same question. Until that time I will not be responding to any points that are similar. I have thoroughly explained my objection to supporting nonstandard constructs in previous comments.

proc-macros need to be compiled to be expanded. Something precompiled cannot rely on the abi of a compiled proc-macro crate, even if we only target rustc (especially so, since rustc's abi is deliberately unstable, but other implementations may provide a stable or semi-stable abi).

That just means the lint tool needs to include the ability to expand macros rather than relying on a separately-provided library, nothing more.

Then I would say that our goal is not to support every use case out there. The decision to use another compiler comes with not being able to dip into the Rust ecosystem. Since there is only one established ecosystem, I want to cater to that and not to theoretically new rust compilers that may or may not be feature complete in the future.

So our goal should be that we some time in the future can build and ship this API with the Rust compiler, so you can write your lints with a stable compiler.

Agreed 👍🏽

For unsupported targets, I'm not sure how you'd do this with the linter.

If they're not supported, that's all you need to know. Tough luck.

chorman0773 commented 3 years ago

You're suggesting different syntax. Please address the question regarding unbalanced parentheses that I have posed

Different syntax used internally to satisfy a publicly specified requirement of the standard library. rustc adds new syntax under feature gates all the time; this isn't particularly different (except that no intent to stabilize exists). In fact, I would argue that, given its likelihood of being stabilized, box_syntax from rustc is not any different from this (though that isn't even required to implement Box::new, which is where it's used).

As for unbalanced parentheses, this is already excluded by the requirements of proc-macros, and that is reasonable. In the case I provided, nothing should currently be broken by the expansion, as the lccc macro engine will be capable of handling it as a token tree (and the result of expanding the asm macro is unspecified, so matching it with a decl-macro or proc-macro with anything other than $($tt:tt) is not guaranteed to function anyways).

As mentioned, I have to expand the macro to something, and that *something* must be capable of supporting the vast array of requirements of the asm! macro defined in RFC 2873, which cannot be described in any standard construct of the Rust language, beyond the macro itself. Thus, given that the internals of the compiler require the macro to expand to something, and that something cannot be a standard construct, the only alternative is to define and use a non-standard construct. RFC 2873 defines the existence of the asm! macro in core (though currently unstable). Thus, I must provide it. Enjoining the only reasonable avenue I have, within the system as it is designed, of providing it is unreasonable, and a prohibition against it cannot stand in the face of that requirement.

If they're not supported, that's all you need to know. Tough luck.

Targets may not be supported in upstream rustc for a variety of reasons, including that support hasn't yet been added, or that there may not be someone able to maintain the target in rustc; a larger potential reason is that llvm doesn't support it (mitigated by rustc_codegen_gcc). It seems unreasonable that the only valid targets are the ones defined by rustc, as there are vastly more, and those will be supported by other implementations, especially gcc-rs. I'm working on Rust for the SNES, and I'd really prefer if my diagnostics worked properly with that, especially since I'd want to use this api to write a few lints myself.

jhpratt commented 3 years ago

box is still planned to be stabilized eventually™ so far as I can tell. It's a matter of figuring out placement new. Is it possible that the syntax eventually gets removed instead of stabilization? Certainly, but that's happened with other unstable features as well.

I quickly read through parts of the PR that implemented the new asm! macro. I presume this is similar to the current implementation. I'm not a compiler expert by any means, but it did not appear that the macro expanded to anything, but rather that it is parsed directly into HIR. Again, this was only after quickly reading through parts, so I'd be happy to be shown that I'm wrong.

For targets, I understand that there are targets not supported by rustc. As @flip1995 said, supporting non-rustc compilers isn't the primary goal here, just that we should keep the door open for down the road. I don't see how limiting the targets for now would break that promise.

flip1995 commented 3 years ago

A registry and/or crates.io integration would be among the last things to be worked on either way, no? So I almost feel like we can just punt this decision down the road.

Totally agree.

That would be nice in the long term. For the initial, non-rustc-integrated implementation, perhaps something like the following.

[package.metadata.lints]
auto-detect = false # true would eliminate the need for `foo` and `bar` below if they are already dependencies

[package.metadata.lints.crates]
foo = "0.2.0"
bar = "0.16.0"
_internal = { path = "..." }

Now make it so that the default is auto-detect = false and we're completely on the same page. :smile:

Given that I'd like to maintain the parity between crates.io crate names and lint crate names, I'd prefer to have the names listed here be the same as the crate name. [...] path dependencies for lints should be what's implemented at first.

This would be really nice but would require our own registry and/or cargo integration. So I agree to push this further down the road and start with the path deps. Once we get to a point where this is RFC-ready and we are allowed to work on integrating it into cargo/crates.io, we can revisit this issue.

I don't see how limiting the targets for now would break that promise.

rustc has the concept of differentiating between host and non-host targets. Our goal should be to support all Tier 1 host targets and most of the Tier 2 host targets, meaning that those targets should be able to run the linter. But yeah, we should probably not care as much about multi-target support at first. As long as macOS, Linux, and Windows work, we should be good.


@jhpratt I think we're pretty much on the same page now, except for a few details, where I think we'll need community input on those anyway. :+1:

DevinR528 commented 3 years ago

I think our Cargo.toml looks pretty good also!

[package.metadata.lints]
auto-detect = false # true would eliminate the need for `foo` and `bar` below if they are already dependencies

[package.metadata.lints.crates]
foo = "0.2.0"
bar = "0.16.0"
_internal = { path = "..." }

jhpratt commented 3 years ago

Now make it so that the default is auto-detect = false and we're completely on the same page.

Stylistic preference 🤷🏽 Realistically it's a whopping one line, so it's not like I'm hard-set on this.

This would be really nice but would require our own registry and/or cargo integration.

Technically no, because the name visible in Cargo.toml doesn't necessarily have to match the name on crates.io…it's hacky at best admittedly. But let's defer this to later.

I think we're pretty much on the same page now, except for a few details, where I think we'll need community input on those anyway.

Agreed. I think the main issue was my looking farther down the road without thinking about what has to happen to get there.

jhpratt commented 3 years ago

Anyone mind summarizing what was agreed upon here, both short and long term?

xFrednet commented 3 years ago

See #10