Open sagiegurari opened 6 years ago
Sounds like a good idea to me!
cc @joshtriplett, is this the toml crate you were talking about in the cargo meeting today or is that different?
@carols10cents This is different, but also very useful. I'd love to have this available for all sorts of metabuild scripts that want to parse Cargo.toml
.
bump... just wondering if this is on any todo list...
We've now closed #12801 by providing cargo_util_schemas::manifest::TomlManifest.
This does not do any high level processing, like inheriting from a workspace, reading defaults, or complex validation. This Issue is remaining open for that work. The challenge is a lot of that logic is more tightly coupled to the internals of cargo (see #12801 for how much work it was just to do serde types). Personally, I don't see this Issue being resolved but I'm open to being surprised.
We have cargo-util-schemas
for low-level type definitions in Cargo, and cargo_util_schemas::manifest::TomlManifest
is the core one for serialization and deserialzation for Cargo.toml
. However, since it is only a type definition, there is no high-level Cargo business logic in the library, which people still need to copy from or reimplement Cargo.
There are multi-layer needs people like to see in a cargo-util-manifest
package. Here is a list of these layer:
.cargo/config.toml
, speaking for [profile]
, [patch]
, and maybe package.{include,exclude}
)cargo metadata
and --unit-graph
unstable feature, to some extent)For the latter two needs, they are too coupled to the entire Cargo internals, not something easy to expose. For the first two, it seems like a fairly limited scope "normalized" version of Cargo.toml
. So probably we can provide a library meetings what cargo-toml
/cargo-manifest
needs? I do have concerns around it.
Here are necessary works for breaking out a crate to provide the "normalized" manifest.
Layering contexts: Inside Cargo the project, a type Config
is used pervasively, which has two meaning, representing the value from Cargo configuration system, and storing the global state of the invocation of cargo
. Pulling this in a function means pulling the entire Cargo in scope. We are looking for a way to separate the concern of this, and layering with different contexts is one of the possible approaches.
For example, to_real_manifest
requres a Config
object. But what it really needs is the ability to figure out where workspace roots and CARGO_HOME
is, plus the ability to print warnings. It doesn't need to know other parts like Config::jobserver
.
One solution is abstracting away Config
by abusing using trait
. So a WorkspaceContext
would clearly express that this method could probe the filesystem for looking for workspaces.
The downside is that people need to bring the relevant context into scope when doing things not provided by the default global state object. That adds indirections to the codebase, and is hard to figure out how many methods should be put in a single trait. This also slows down the compilation, because only impl Trait
can be composed but not dyn Trait
.
Decoupling from cargo::core
types: A bunch of types are shared between manifest logic and the other parts of Cargo, for the sake of code simplicity and reusability. For example Summary
is a type representing a subset of Cargo.toml
. It is used by the dependency resolver, registry index handling, and manifest deserialzation itself. To decouple them, you'll bring more glue code or new types to encapsulate these types.
Another simpler example is CompileKind
. To be fair, manifest itself only needs the type definition, but when building a package, Cargo need to consult Config
objectto get the correct build.target
to determine which target platform to work on. This can be resolved by simply move it to a free function, but that would put thing away from where it is defined.
The downside is obvious. More glue code means more maintenance. The cognitive overhead may also increase.
A function returning a normalized cargo_util_schemas::manifest::TomlManifest
: This seems pretty much aligned to the goal of normalization of Cargo.toml
. However, I am not sure if there is any usefulness for Cargo itself. Cargo is already going to create a resolved_toml
during the construction of Manifest
, so presumably there will be not too much performance loss when we refactor this part into a TomlManifest -> TomlManifest
function. In doing so, we may avoid some type leaking from cargo::core::*
and instead stick to types from cargo-util-schemas`. I might be wrong, though.
I don't really have a concrete plan. Here is just a rough ideas in my mind.
Config
from cargo::core::workspace
, and keep workspace probing logic in a new Context
.to_real_manifest
to dedicated functions (except the complex process_dependencies
).process_dependencies
into a InheritableDependency -> InheritableDependency::Value
function.to_real_manifest
to three parts. One is normalization. Another is validation. The last part is "to Manifest
".For most people, they just need a normalized form and cargo metadata
works great for that, especially for dealing with old/new cargo (see "edition as number" topic).
In less common cases, they need the low level details and cargo-util-schema
works well for that.
So what use cases are left that we are aiming for? This is important for us to know to evaluate how worthwhile this is in the first place, how much to compromise cargo development for this, etc.
I know crates.io is in a weird middle ground. They need a normalized form but aren't in an environment to run cargo metadata
. Granted, cargo publish
normalizes Cargo.toml
files sufficiently that they likely can still get away with cargo-util-schema
.
The manifest.rs has the structs and enums that define most of the Cargo.toml That could be really helpful for other build/dev tools like cargo-make For now, I also have internally mapped the relevant items in an internal data structs and enums, but would be happier if I would have just had a standard crate to use to pull those definitions from (naturally with serde support). The manifest.rs seems like exactly the thing I was looking for but I would prefer a standalone lightweight crate only holding definitions instead of depending on entire cargo crate. It would also help making your codebase more modular and reusable.