Open pdh11 opened 1 month ago
Hi @pdh11 thanks for the bug report!
Without the --no-traits --no-types --no-fns it's clear what happened: the trait in
mod bar
has a function which uses typebar::Fnord
-- and that function is depicted as "owned by" foo just because it's implemented on a struct which is owned by foo
You are absolutely right in that this is unexpected and feels wrong. This is caused by cargo-modules
being somewhat impl
-blind. Which is to say that while it does scan impl
s it does not represent them in the generated graph. As such like you already outlined methods end up getting incorrectly attributed to whatever module the self
-type is declared in, instead of the actual module where the method's containing impl
is found.
This is definitely a bug!
That said adding impl
nodes to the graph isn't quite as simple as it might seem and for somewhat surprising reasons: testing.
The project heavily relies on snapshot testing as rust-analyzer
's APIs are basically a black box and really hard to write any other kinds of tests against, while still keeping things maintainable.
Both, the snapshot tests, as well as GraphViz require nodes to have unique ids assigned to them. While effectively every object from rust-analyzer
provides an identifier they tend to change between executions, which goes against out snapshotting needs. As such we currently use the node's fully qualified path Rust as its identifier, which already comes with a couple of weaknesses (e.g. not being able to distinguish between crate versions). But impl
s (and several other potentially interesting objects from rust-analyzer
) don't even have an unambiguous path. There can even be multiple impl Foo {}
within the same module, making it rather tricky to distinguish between them (especially when trying to avoid frequently changing snapshot files, such as if one was to include the entity's actual source code range in its id). As such I've so far avoided dealing with impl
nodes.
For this to work properly —afaict— one would have to first add impl
nodes (and find a stable way to identify impls in snapshot tests) and would then probably have to either generate a layered graph (where each layer would describe different semantics) and allow for selecting specific layers for output, or introduce more edge types, such as implements
. Either way one would have to then introduce further or extend existing filtering capabilities to avoid not being able to see the ~forest~ graph for all the newly added ~trees~ edges. It's thorny business.
Sheer spitballing here -- I confess I haven't looked at how either cargo-modules or rust-analyzer actually work -- but wouldn't, say, the identifier full-path-of-trait +"-for-"
+ full-path-of-struct be unique and stable? AIUI it's not legit to implement the same trait for the same struct more than once in the same program -- there could be multiple ones gated by different cfg feature
s, but cargo-modules already works on just one feature-set at a time.
Trait impls aren't the real culprit here, impl Foo {}
is, of which there can be arbitrarily many. You basically have to use the impl
's source code location into account when disambiguating. Which now means you need to deal with file-paths (and if the impl
was generated by a macro, then there actually isn't one, other than say the macro's call site, which in turn could have produced multiple impl
s, re-introducing ambiguity). And even if you're lucky and there is a code source with a corresponding file-path, then you need to obtain the relative path, as otherwise the test suite would generate different snapshots for each person running the tests. And in case of impls
from extern crates the file will most-likely actually be in the global ~/.cargo/…
cache.
I'm not saying it's impossible, but getting stable identifiers for nodes and edges has been one of the thorniest bits of cargo-modules
. And that's without impl
.
All the HIR types provided by rust-analyzer
(and wrapped by Node
) are Eq + Hash
, so one could simply create ids for them by adding them to a id_map: HashMap<Node, NodeId>
and deriving the corresponding NodeId
from the current NodeId(id_map.len())
. But that would mean that every time you run cargo modules dependencies …
you get a completely different textual output (while it would —fingers crossed— remain visually stable), since each value of NodeId
would be non-deterministic. This would make it impossible to use end-to-end snapshot testing. But without a thorough and extensive test suite a project like cargo-modules
becomes unmaintainable very quickly. This became apparent very quickly early on.
Since the main problem is with the identifiers one simply use the id-map approach outlined above for the .dot
output and could try adding a (probably undocumented) --test-snapshot
CLI flag for testing, which instead of producing .dot
(which requires unique IDs) returns a snapshot representation of certain graph properties (e.g. list of node labels, list of node in-degrees, list of node out-degrees, list of edge labels) without identifiers. This should result in stable snapshots, at the cost of losing some information in the snapshots, which might be acceptable though. There will be certain transformations however that simply won't be able to get caught with this compromise though.
The one main drawback from a user's perspective with this workaround is that diffing output of cargo modules dependencies …
becomes a lost cause due to the non-determinism of ids and thus node/edge ordering.
In this example:
it seems pretty clear that the reasonable assessment is that module "bar" depends on module "foo", but not vice versa. Yet
cargo modules dependencies --no-traits --no-types --no-fns --layout circo
shows a cyclic "uses"-dependency:Without the --no-traits --no-types --no-fns it's clear what happened: the trait in
mod bar
has a function which uses typebar::Fnord
-- and that function is depicted as "owned by" foo just because it's implemented on a struct which is owned by foo:This is unhelpful, IMO: the example (distilled from a much larger one, of course) has in fact been written with all due principles of levelisability and non-circularity in mind, but cargo-modules (v0.16.6) produces a "false positive" for circularity.