mozilla / cargo-vet

supply-chain security for Rust
Apache License 2.0
649 stars 43 forks source link

Built-in support for multiple Rust workspaces in same repo #487

Open repi opened 1 year ago

repi commented 1 year ago

Some of our more complex repositories have multiple Rust workspaces to cover different isolated aspects of the project and we would like to audit the crates in all of the workspaces.

We can do this by doing cargo vet init for each sub-directory & workspace but this is tedious and error prone to manage and update and as they are all in the same repository we would like to have a single record and ./supplychain/ folder for the audits of all of the crates in the workspaces.

In the longer term I'm hoping for native support in Cargo of nested workspaces (https://github.com/rust-lang/cargo/issues/5042), but don't think there is much active development towards that.

Could we consider in ./supplychain/config.toml to have list of paths to additional Cargo.toml workspaces to include in the processing? Or any other way to configure and include multiple workspaces in a single auditing database to manage?

bholley commented 1 year ago

@mystor WDYT?

mystor commented 1 year ago

This wouldn't be trivial to implement, but it wouldn't be too bad. We currently always invoke cargo metadata once, within the specified workspace. In order to get something like that to work, we'd need to also invoke it in the other specified workspaces, and update logic inspecting the metadata to support multiple metadata outputs.

In terms of where we'd want to add the config, perhaps it would make more sense to have this information in Cargo.toml rather than in the config.toml, as we generally haven't put paths in config.toml yet (though we could). It'd be similar to how we allow customizing the supply-chain path: https://mozilla.github.io/cargo-vet/config.html#location. Perhaps something like:

[package.metadata.vet]
additional-manifests = ["path/to/Cargo.toml", "path/to/other/Cargo.toml"]

We'd probably end up doing the check by looking for the configuration in the metacfg after parsing it. After that, we'd need to run an additional metadata command for each manifest path provided.

We can't just naively merge the metadata together, as e.g. PackageIds would no longer be unique (as a package could exist in in multiple namespaces with different dependencies/features/etc. for example). Fortunately other than DepGraph building (which would need to keep track of which manifest each package comes from for dependency tracking) it seems like most code using the metadata just wants to iterate over every package.

I think a new type like MetadataSet could be used instead of Metadata in the functions which currently accept it, as a wrapper around a Vec<Metadata>. We'd then expose a .packages() iterator which chains together all of the package arrays for the existing callers which need it, and would manually fix up DepGraph building to support multiple metadata instances.

repi commented 1 year ago

nice, sounds like a plan but yes not super trivial.

having it as metadata in the workspace Cargo.toml could work well, and feels closer to where the future built-in Cargo support for nested workspaces would be