Expose manifest.rs as a standalone crate

sagiegurari commented 6 years ago

The manifest.rs has the structs and enums that define most of the Cargo.toml That could be really helpful for other build/dev tools like cargo-make For now, I also have internally mapped the relevant items in an internal data structs and enums, but would be happier if I would have just had a standard crate to use to pull those definitions from (naturally with serde support). The manifest.rs seems like exactly the thing I was looking for but I would prefer a standalone lightweight crate only holding definitions instead of depending on entire cargo crate. It would also help making your codebase more modular and reusable.

alexcrichton commented 6 years ago

Sounds like a good idea to me!

carols10cents commented 6 years ago

cc @joshtriplett, is this the toml crate you were talking about in the cargo meeting today or is that different?

joshtriplett commented 6 years ago

@carols10cents This is different, but also very useful. I'd love to have this available for all sorts of metabuild scripts that want to parse Cargo.toml.

sagiegurari commented 4 years ago

bump... just wondering if this is on any todo list...

epage commented 9 months ago

We've now closed #12801 by providing cargo_util_schemas::manifest::TomlManifest.

This does not do any high level processing, like inheriting from a workspace, reading defaults, or complex validation. This Issue is remaining open for that work. The challenge is a lot of that logic is more tightly coupled to the internals of cargo (see #12801 for how much work it was just to do serde types). Personally, I don't see this Issue being resolved but I'm open to being surprised.

weihanglo commented 7 months ago

Problem

We have cargo-util-schemas for low-level type definitions in Cargo, and cargo_util_schemas::manifest::TomlManifest is the core one for serialization and deserialzation for Cargo.toml. However, since it is only a type definition, there is no high-level Cargo business logic in the library, which people still need to copy from or reimplement Cargo.

There are multi-layer needs people like to see in a cargo-util-manifest package. Here is a list of these layer:

Populates the default value for each field (requiring workspace-awareness)
Populates the values of fields that depend on filesystem probing.
Populates the values of fields in a given context (on a specific path with some .cargo/config.toml, speaking for [profile], [patch], and maybe package.{include,exclude})
Populates the value of fields that needs coordinate with remote sources and dependency resolution (this is currently provided by cargo metadata and --unit-graph unstable feature, to some extent)

For the latter two needs, they are too coupled to the entire Cargo internals, not something easy to expose. For the first two, it seems like a fairly limited scope "normalized" version of Cargo.toml. So probably we can provide a library meetings what cargo-toml/cargo-manifest needs? I do have concerns around it.

Concerns

Here are necessary works for breaking out a crate to provide the "normalized" manifest.

Layering contexts: Inside Cargo the project, a type Config is used pervasively, which has two meaning, representing the value from Cargo configuration system, and storing the global state of the invocation of cargo. Pulling this in a function means pulling the entire Cargo in scope. We are looking for a way to separate the concern of this, and layering with different contexts is one of the possible approaches.

For example, to_real_manifest requres a Config object. But what it really needs is the ability to figure out where workspace roots and CARGO_HOME is, plus the ability to print warnings. It doesn't need to know other parts like Config::jobserver.

One solution is abstracting away Config by ~~abusing~~ using trait. So a WorkspaceContext would clearly express that this method could probe the filesystem for looking for workspaces.

The downside is that people need to bring the relevant context into scope when doing things not provided by the default global state object. That adds indirections to the codebase, and is hard to figure out how many methods should be put in a single trait. This also slows down the compilation, because only impl Trait can be composed but not dyn Trait.
Decoupling from cargo::core types: A bunch of types are shared between manifest logic and the other parts of Cargo, for the sake of code simplicity and reusability. For example Summary is a type representing a subset of Cargo.toml. It is used by the dependency resolver, registry index handling, and manifest deserialzation itself. To decouple them, you'll bring more glue code or new types to encapsulate these types.

Another simpler example is CompileKind. To be fair, manifest itself only needs the type definition, but when building a package, Cargo need to consult Config objectto get the correct build.target to determine which target platform to work on. This can be resolved by simply move it to a free function, but that would put thing away from where it is defined.

The downside is obvious. More glue code means more maintenance. The cognitive overhead may also increase.
A function returning a normalized cargo_util_schemas::manifest::TomlManifest: This seems pretty much aligned to the goal of normalization of Cargo.toml. However, I am not sure if there is any usefulness for Cargo itself. Cargo is already going to create a resolved_toml during the construction of Manifest, so presumably there will be not too much performance loss when we refactor this part into a TomlManifest -> TomlManifest function. In doing so, we may avoid some type leaking from cargo::core::* and instead stick to types from cargo-util-schemas`. I might be wrong, though.

Plan

I don't really have a concrete plan. Here is just a rough ideas in my mind.

[ ] Factor out Config from cargo::core::workspace, and keep workspace probing logic in a new Context.
[ ] Extract validations out from to_real_manifest to dedicated functions (except the complex process_dependencies).
[ ] Refactor process_dependencies into a InheritableDependency -> InheritableDependency::Value function.
[ ] Split to_real_manifest to three parts. One is normalization. Another is validation. The last part is "to Manifest".
[ ] Determine where we should expose this logic to users.

epage commented 7 months ago

For most people, they just need a normalized form and cargo metadata works great for that, especially for dealing with old/new cargo (see "edition as number" topic).

In less common cases, they need the low level details and cargo-util-schema works well for that.

So what use cases are left that we are aiming for? This is important for us to know to evaluate how worthwhile this is in the first place, how much to compromise cargo development for this, etc.

I know crates.io is in a weird middle ground. They need a normalized form but aren't in an environment to run cargo metadata. Granted, cargo publish normalizes Cargo.toml files sufficiently that they likely can still get away with cargo-util-schema.

rust-lang / cargo