rust-lang / rust-analyzer

A Rust compiler front-end for IDEs
https://rust-analyzer.github.io/
Apache License 2.0
14.28k stars 1.61k forks source link

dev-dependencies on workspace member cause cyclic dependency issues #14167

Open Veykril opened 1 year ago

Veykril commented 1 year ago

Note: We (rust-analyzer) consider this issue resolved in terms of what we can do here.

This issue is spread across several others, so let's make one big one out of it to better track it.

https://github.com/rust-lang/rust-analyzer/issues/2414#issuecomment-561213070

Yeah, the problem is that a crate can have a dev-dep onto itself:

[package]
name = "foo"
version = "0.1.0"
edition = "2018"

[dev-dependencies]
foo = { path = "." }

The solution here is to start "duplicating" crates when lowering cargo-metada output to CrateGraph. In the above example, we should construct two foo crates, one with cfg(test) and one without, and make the required depednency edge.

Issues in question: https://github.com/rust-lang/rust-analyzer/issues/8330 https://github.com/rust-lang/rust-analyzer/issues/2414 https://github.com/rust-lang/rust-analyzer/issues/12407 https://github.com/rust-lang/rust-analyzer/issues/9574 https://github.com/rust-lang/rust-analyzer/issues/11410

GilShoshan94 commented 1 year ago

Is it the same issue for crates that export proc_macro with other crates ?

I have a main crate foo, and another one foo-derive with the proc_macro. I wanted to test foo-derive inside itself and to not have the tests moved to foo for the derive macro only. But I need foo in the tests since the generate code used structs and traits from foo public API.

In foo-derive Cargo.toml:

[package]
name = "foo-derive"
version = "0.1.0"
edition = "2021"

[lib]
proc-macro = true

[dependencies]
proc-macro2 = "1"
quote = "1"
syn = { version = "1", features = ["full"] }

[dev-dependencies]
foo = { path = "../foo" }
trybuild = "1"

Is it inherently wrong to do that ?

Beside an error (appears twice in a row) in the output of Rust Analyzer Language Server in vscode

[ERROR project_model::workspace] cyclic deps: foo_derive(CrateId(76)) -> foo(CrateId(69)), alternative path: foo(CrateId(69)) -> foo_derive(CrateId(76))

Everything seems to work well.

Veykril commented 1 year ago

It is not a wrong thing to do that, it's just r-a struggling with it (so yes, you are hitting this here). If everything works fine you can safely ignore the error.

Veykril commented 1 year ago

So duplicating the crates causes a bunch of issues, especially for traits and their implementations, as we can possibly have types in scope that implement the original trait, but have the duplicated on in scope. And a lot of other issues similar to that.

So now I'm wondering, the reason for why we want to prevent cycles is obviously graph traversal which makes me wonder if wen can lower dev dependencies to some kind of "weak edge" that is skipped when traversing the graph except for name resolution.

Veykril commented 1 year ago

Hmm, so that approach runs into the problem that we'll have a cycle in the defmap creation ...

flodiebold commented 1 year ago

Yeah. :thinking: How exactly do cyclic dev dependencies work with Cargo? I'm not sure why it doesn't run into the same problem?

Veykril commented 1 year ago

Right, that should occur there as well then 🤔. Also would be interesting to hear how the jetbrains plugin deals with this. @Undin @vlad20012 I hope you don't mind the ping, but if you could shed some light on how intellij-rust deals with workspace dev dependency cycles that would be appreciated :) (assuming it deals with them at all)

@matklad maybe you have some more ideas here as well

matklad commented 1 year ago

How exactly do cyclic dev dependencies work with Cargo? I'm not sure why it doesn't run into the same problem?

So the core thing here is that there are packages and crates. We physically can not have cyclic dependencies between creates, cause they need to be compiled in topological order.

I believe cargo has some logic to prevent cyclic dependencies between packages (eg, if you literally add a cyclic dep, cargo will bail out), but that can be circumvented via a dev dependency. Adding a dev-dep on oneslef is forbidden (an explicit check in cargo), but adding a self-dev-dep via a third package works.

I wonder if we can just, like, ban this in cargo in an edition boundary? Like, this 100% works today, and people are relying on that, but semantics are confusing (you basically have two copies of every trait/type in scope), and compilation-time hit is big.

My gut feeling is that we should add an off-by-default flag, which models cargo semantics precisely, by essentially duplicating the crate graph (every crate gets variants with and without —-test). This we can use as a testbed to iron out any issues with the single file belonging to more than one crate. Yes, that’s going to be hugely confusing, because there’s going to be two copies of every crate and what not, but that’s what is actually physically happening. With that flag in pace, we can have a better error message here saying “hey, we noticed you have cyclic dependencies, would you like to fix your code, or to check this checkbox?”

matklad commented 1 year ago

you basically have two copies of every trait/type in scope

Perhaps more importantly, you also literally have two independent copies of the crate in the final binary

vlad20012 commented 1 year ago

Also would be interesting to hear how the jetbrains plugin deals with this

We just carefully remove such dev-dependencies graph edges that lead to a cycle (during the lowering from a cargo project model to a crate graph). Then we suppress errors in cfg(test) items and #[test] functions inside a crate with a removed dev dependency

bragov4ik commented 1 year ago

Related to https://github.com/rust-lang/rust/issues/79381 Just thought it makes sense to link it here :)

Veykril commented 1 year ago

After a discussion with @robjtede regarding actix (a crate that suffers from this issue), I figured we should probably try to replace dev-dependency edges on cycle paths instead of keeping whatever edge / dep-path that was first added. That should reduce the problems to just test code which is the lesser evil here. And if am not mistaken, that is exactly what jetbrains seems to be doing after re-reading their reply on this issue:

Also would be interesting to hear how the jetbrains plugin deals with this

We just carefully remove such dev-dependencies graph edges that lead to a cycle (during the lowering from a cargo project model to a crate graph). Then we suppress errors in cfg(test) items and #[test] functions inside a crate with a removed dev dependency

I would consider this issue effectively fixed (or moved to wontfix status afterwards rather) with that change applied. As was discussed before, the real issue here is that cargo even allows this kind of cycle.

ukitta555 commented 1 year ago

is there an approximate timeline for when this is planned to be fixed? Coding in large codebases that have such a circular dependency becomes unbearable (seems like I really got used to IntelliSense). Just in case you wanted to suggest JetBrains RustRover, I've just switched to VSCode since JetBrains SSH Gateway freezes every five minutes once I start running CPU-intensive tasks :(

Veykril commented 1 year ago

No timeline here, just needs someone to implement the heuristic that I've noted down in my previous comment. I personally don't have the time to fix this in the near future. The relevant code part should be here https://github.com/rust-lang/rust-analyzer/blob/11a87c917943dac5a568579f799c2d7458324103/crates/base-db/src/input.rs#L439-L450

(Also to reiterate from an IDE point of view, this kind of cycle should've never been allowed in the first place by cargo so I can only recommend removing such cycles from your projects if feasible)

Manishearth commented 10 months ago

Highly recommend reading Gankra's docs on Cargo cycles here for anyone attempting to try and understand them.

Manishearth commented 10 months ago

I wonder if we can just, like, ban this in cargo in an edition boundary? Like, this 100% works today, and people are relying on that, but semantics are confusing (you basically have two copies of every trait/type in scope), and compilation-time hit is big.

@matklad If you're talking specifically about the ability for a crate to dev-dep onto itself, yeah that's strange.

However a crate should be able to dev-dep onto something that depends on, that's a very normal and common pattern in the community, and is intentionally designed. It does involve compiling things twice (see the docs above for how that works)

matklad commented 10 months ago

that's a very normal and common pattern in the community, and is intentionally designed

So, I am talking about a situation where a single binary ends up linking two copies of a crate that differ in —-test flag passed to rustc.

I am not sure whether the design is intentional: when I looked at the code several years ago, I think it looked more like this case was missed by the code that tries to forbid cycles in Cargo, but I haven’t dug into history specifically. Would appreciate links to historical context!

I agree that this is pattern is de-facto common, but I think that’s for a bad reason:

In larger workspaces, you inevitably stumble into this pattern eventually, especially if you are a novice. However, to understand what actually happens (that you have to copies of every type) you need to be a Rust expert. This is one imo big reason why this is not good.

The second big reason is compilation time hit. The smaller problem here is that the crate is compiled twice. The more significant problem is that all crates participating in the cycle effectively become one “unit of compilation” —- changing one crate requires recompiling the whole cycle. This significantly reduces iteration time for tests, as you need to compile the crate you are working on, and the the stuff that depends on it.

Now, there are probably cases where having this feature is beneficial (though I actually can’t name any of the top of my head — in particular, I don’t understand why sede derive does what it does, I’d say doctest = false would be a better solution for that doctest, if I am reading the code correctly that there’s a single, trivial docrest that uses this feature). But I’d wager 8 out of 10 times this feature is used, it is used by accident, without understanding the underlying mechanism of crate duplication.

Manishearth commented 10 months ago

The implications are different for doctests/integration tests and unit tests. It is quite intentional (and useful!) for the former and in theory does not have those downsides.

For the latter ... yeah it's less useful there and has confusing behavior. Might be worth getting rid of it, but there's something to be said for consistency.

However, as far as I can tell both these situations trigger this rust-analyzer bug. Edition-killing the latter won't solve the problem.

I'll write up a more detailed response later.

matklad commented 10 months ago

The implications are different for doctests/integration tests

Integration tests depend on non—-test version of a crate, so they don’t actually create a cycle by themselves in a sense of «two slightly different copies of a crate linked in”.

Though, if you have “package cycle” in integration tests, you’d have “crate cycle” in unit test.

As a straw-man proposal, cargo can exclude problematic dev deps from a lib crate for unit tests, but keep them for integrations (as they are not actually problematic there)

matklad commented 10 months ago

Highly recommend reading Gankra's docs on Cargo cycles here for anyone attempting to try and understand them.

Ah, I missed an important detail: that doc talks about “package cycles”. This is actually mostly irrelevant for the discussion, as rust-analyzer intentionally mostly avoids modeling Cargo packages, because not everything is cargo.

What rust-analyzer models is the unit-graph, the graph of compilation units.

It of course doesn’t have “true” cycles, but it has quasi cycles through crate duplication: it might be the case that a same crate is present in the unit graph twice, once with and once without —-test flag.

rust-analyzer could model that faithfully internally, but chooses not to, because presenting this internal modeling to the user in a reasonable way is a hard problem: the same set of source files gives raise to two distinct, and potentially arbitrary different, entities in the semantic model. Moreover, not only this problem is hard, it is self-inflicted: while having two copies of a crate in the unit graph technically works, it isn’t actually a useful thing to have.

Manishearth commented 10 months ago

As a straw-man proposal, cargo can exclude problematic dev deps from a lib crate for unit tests, but keep them for integrations (as they are not actually problematic there)

Yeah that's what I was suggesting: "don't allow package cycles at all" is very much a non starter and comes from a position of conflating the two categories of tests.

Manishearth commented 10 months ago

I think @matklad is up to speed already but I'm going to write this for the benefit of everyone else.

Alright, so, cyclic dev deps. How do they work? Why do we need them? How much bathwater is there amongst these babies?

So firstly it's important to highlight that dev dep cycles are package graph cycles, not build graph cycles. A build graph cycle is impossible, but a package can contain multiple build targets: the crate, examples, integration tests, benches, doctests, and the crate unit test binary (this one is weird).

I'll get to that last one in a bit. In the case of examples, unit tests, benches, and doctests, it's quite simple: each of these is at a build level a new binary crate, which depends on the main crate. It's totally fine for it to depend on things that depend on the main crate.

In fact, it's quite useful: You want to be able to write docs and examples and tests that show integration with downstream crates. This isn't just a niche issue for serde/serde_derive: those are crates maintained in the same repo, shuffling tests around seems less of a burden. But not all such cases are between crates that are maintained as such.

Even in the same repo it's not a great restriction to have that crates should only be allowed to have examples/doctests that can't reference the entire crates.io universe. examples and doctests are a matter of how the crate is presented; which really ought to be up to the crate author. It's not just about testing.

The unit test binary, on the other hand, is rather strange. Unit tests -- #[test] annotations in src (rather than tests/) are allowed to access private functionality of the crate. This is a separate build target that is produced by running rustc --test src/lib.rs.

If you don't have dev deps, this is pretty straightforward, not very different from the different build targets involved when building your crate in debug vs release mode.

If you do have dev deps, however, this does mean you have the potential for cycles. They're still not cycles in the build graph, but they are cycles in the sense that src/lib.rs will get compiled in both --test and regular mode in the same build graph.

This can lead to strange behavior; the same kind of error you get when you have two versions of the same crate in the same build graph and attempt to mix their types. The ability for Cargo to do this is less useful, too: if you really need private internals for a test with such dependencies, it is probably better to do some cfg(internal_testing) shenanigans and expose stuff to an integration test. Integration tests are not a part of the crate's presentation, it's far more okay to ask crate authors to rewrite these.

In general I would be surprised if there was much in the ecosystem relying on this, and especially surprised if there was much of this that was intentional.

However a core issue is that Cargo's view of dependencies is rather simple: dev-deps are global, and used for every testing target. This is a persistent problem: the request for marking dev deps as optional is related.

So unit tests are forced to be built with cycles, even if the unit test doesn't need it, because the integration tests/examples/doctests might.


All in all:

The Babies: Being able to have cycles in dev deps is an important bit of functionality for integration tests, doctests, examples, and benches. It does not have strange behavior there.

The Bathwater: Cycles in dev deps is not that useful for unit tests, which is also where the problem lies. This can be safely thrown out, I think.

The fact that you cannot specify dev deps for only one set of targets is what makes it hard to throw out the bathwater without including some babies in it.


I think a holistic proposal would:

The more narrow proposal is to just make this an edition breakage. But I think it's worth looking at this problem holistically too.

cc @ehuss @epage @muscraft

matklad commented 10 months ago

:100: excellent summary, thanks Manish!

I am also wondering, could we apply some tactical fix on the rust-analyzer's side here? Basically, what if rust-analyzer just goes and does the starwman proposal from https://github.com/rust-lang/rust-analyzer/issues/14167#issuecomment-1864145772 ?

The relevant bit of code looks like this in my local checkout:

    for pkg in cargo.packages() {
        for dep in &cargo[pkg].dependencies {
            let Some(&to) = pkg_to_lib_crate.get(&dep.pkg) else { continue };
            let Some(targets) = pkg_crates.get(&pkg) else { continue };

            let name = CrateName::new(&dep.name).unwrap();
            for &(from, kind) in targets {
                // Build scripts may only depend on build dependencies.
                if (dep.kind == DepKind::Build) != (kind == TargetKind::BuildScript) {
                    continue;
                }

                add_dep(crate_graph, from, name.clone(), to)
            }
        }
    }

It adds dependencies in arbitrary order, and so the first one wins. I think if it is changed so something like

    for pkg in cargo.packages() {
        for dep in &cargo[pkg].dependencies {
            let Some(&to) = pkg_to_lib_crate.get(&dep.pkg) else { continue };
            let Some(targets) = pkg_crates.get(&pkg) else { continue };

            let name = CrateName::new(&dep.name).unwrap();
            for &(from, kind) in targets {
                // Build scripts may only depend on build dependencies.
                if (dep.kind == DepKind::Build) != (kind == TargetKind::BuildScript) {
                    continue;
                }
                if to is a lib target with --test &&
                    from transitively depends on to without --test {
                    continue; // Throw the bathwater out, but keep the babies
                }

                add_dep(crate_graph, from, name.clone(), to)
            }
        }
    }

It maybe feels like that's what https://github.com/rust-lang/rust-analyzer/pull/15754 is doing? From the cursory look (pressed on time here, sorry!), I don't think that that implementation is quiet right though: https://github.com/rust-lang/rust-analyzer/pull/15754#discussion_r1433058039.

Manishearth commented 10 months ago

Oh, that's a great idea. This is broken anyway, we can make it less broken by deliberately keeping it broken for --test and fixing it for everyone else.

No comment on implementation, I'm not familiar with r-a code.

In the long run I would recommend using guppy to query cargo things, it's got a really good API for this kind of thing.

Veykril commented 10 months ago

It maybe feels like that's what https://github.com/rust-lang/rust-analyzer/pull/15754 is doing?

No that PR did not try to do that iirc, it was about a different issue with two projects where one has a dev-dep on some dependency while the other has a normal dependency on it. But I believe I wrote the same thing here as well roughly https://github.com/rust-lang/rust-analyzer/issues/14167#issuecomment-1763462435:

I figured we should probably try to replace dev-dependency edges on cycle paths instead of keeping whatever edge / dep-path that was first added. That should reduce the problems to just test code which is the lesser evil here.

So yes, we should change things so that dev-deps get kicked out if we find a cycle with a non-dev dep. It's also what RustRover is doing I believe

epage commented 10 months ago

I can understand the appeal to be more specific in what dependencies apply to what targets. However, I feel a lot of the value in cargo is its simple interface (Cargo.toml) and I worry we'd lose that.

Manishearth commented 10 months ago

@epage I think we can still retain the simplicity whilst

I'm not proposing we force everyone to specify this stuff

Veykril commented 7 months ago

Fyi, https://github.com/rust-lang/rust-analyzer/pull/16871 followed by https://github.com/rust-lang/rust-analyzer/pull/16886 implement the approach of preferring non-dev depedency edges if possible, minimizing the problems we encounter with this

ian-h-chamberlain commented 6 months ago

Hi, just found this issue as I was trying to figure out why I was seeing an unresolved-extern-crate error in a project. I think this is what it comes down to:

The Bathwater: Cycles in dev deps is not that useful for unit tests, which is also where the problem lies. This can be safely thrown out, I think.

In our case, we actually do rely on this kind of cycle in https://github.com/rust3ds/ctru-rs, but maybe there's a way we could work around it? Here's how our (trimmed for clarity) dependency tree looks:

ctru-sys v0.5.0
└── libc v0.2.154
[dev-dependencies]
├── shim-3ds v0.1.0 (https://github.com/rust3ds/shim-3ds.git#98015084)
│   ├── ctru-sys v0.5.0 (*)
│   └── libc v0.2.154
└── test-runner v0.1.0
    ├── ctru-rs v0.7.1
    │   ├── ctru-sys v0.5.0 (*)
    └── ctru-sys v0.5.0 (*)

There are two cases here:

test-runner, which is a helper crate for use with custom_test_frameworks, is required for our unit tests to work properly — but it in turn relies on the ctru-sys crate (directly and indirectly) to implement its functionality. This works fine as r-a doesn't seem to know or care about the custom test framework.

shim-3ds is the crate which triggers unresolved-extern-crate. We use it to fill in some libc definitions at link time, so the only usage is (still needed for unit tests) just this:

#[cfg(test)]
extern crate shim_3ds;

In both cases, we use [patch] in Cargo.toml to ensure the local versions of ctru-rs/ctru-sys are used in the graph, meaning we end up with a #[cfg(test)] and a #[cfg(not(test))] version of ctru-sys, but it works for our testing!


From what I can tell, this is the kind of use case that is being considered as "not really useful", is that right? Is there maybe some other way we can achieve the same thing that would play nicer with r-a, or something? I guess something like #[cfg(all(test, not(rust_analyzer)))] might work just to get rid of the squiggly, but otherwise I'm not sure if there's a simple alternative to our current setup.

Manishearth commented 6 months ago

From what I can tell, this is the kind of use case that is being considered as "not really useful", is that right?

No, this was out of scope for my analysis, mostly because custom test frameworks are even weirder and already not supported by rust-analyzer. This may be a legitimate use case but it's legitimate because it's via custom test frameworks.

shim-3ds is the crate which triggers unresolved-extern-crate. We use it to fill in some libc definitions at link time, so the only usage is (still needed for unit tests) just this

This seems less good, since you end up with multiple pairs of types. This is probably better served by slicing the deptree a bit more.

ian-h-chamberlain commented 6 months ago

This seems less good, since you end up with multiple pairs of types. This is probably better served by slicing the deptree a bit more.

Hmm, do you mean like if we tried to re-export the types or something, we'd end up with two different versions of the crate's types/functions? In our case that's fine, because we don't actually call/reference any of those transitively via shim-3ds.

As a small workaround without requiring full support from r-a, I wonder if our particular situation could be helped out by allowing dependencies up to the point that a cycle is created, rather than eliminating the dependency branch entirely? Without knowing too much about the implementation, it seems like r-a is disabling this edge (marked X) when it detects the cycle:

ctru-sys v0.5.0
[dev-dependencies]
└─X shim-3ds v0.1.0 (https://github.com/rust3ds/shim-3ds.git#98015084)
    ├── ctru-sys v0.5.0 (*)
    └── libc v0.2.154

Would it be possible instead to ignore this one instead? It would allow for more symbols to be resolved, and in our crate just silence the one error we have for extern crate shim_3ds:

ctru-sys v0.5.0
[dev-dependencies]
└── shim-3ds v0.1.0 (https://github.com/rust3ds/shim-3ds.git#98015084)
    ├─X ctru-sys v0.5.0 (*)
    └── libc v0.2.154

It probably is possible for us to restructure our crates to avoid this cycle, but just wondering if that would be a more general "partial solution" for cases similar to ours where Cargo does allow a cycle like that.

Manishearth commented 5 months ago

It probably is possible for us to restructure our crates to avoid this cycle

I think there are a lot of benefits of doing that anyway.

OTOH I don't see a huge benefit of making this rather niche case work.