Expand manifest validation for all settings

behnam commented 7 years ago

There are some practical difficulties arising from the fact that manifest files are not always validated. I want to collect such cases, so we can improve the user experience, specially for newbies, by showing warnings about bad manifest entries, as soon as possible.

One important fact here is that we almost never want to throw an error on invalid entries, because it will affect forward-compatibility: would make many future changes non-backward-compatible.

Some Examples

I think almost every config item can have some sort of validation. This list only presents possible validation methods for different types of values.

package.version: The version is not already present (#3662).
package.keywords: Having invalid keywords strings only surface during packaging/publishing. See https://github.com/rust-lang/cargo/issues/4300 for details.
package.categories: This is a bit harder, because the list is maintained in crates.io repo. But, like any other index-related data, it can be fetched, cached, and checked against.
package.include, package.exclude, workspace.exclude, workspace.members: These are configs with pattern-matching and warning on invalid patterns can help users track down packaging issues easier and faster, and make changes in those areas easier. (See https://github.com/rust-lang/cargo/issues/4268)
package.readme and other file paths: ~~Need to check the existence of the file, if linked~~ (see #13921). In addition, we can also warn on missing the config, if a best-guess file is present (similar to #13073 since we already infer the file if a "well known" name is used).
package.homepage and other URLs: Perform URL validation and check URLs against the newly-implemented blacklist. (blacklist is maintain in crates.io, so this would be another index-dependent check.)
features.<name> #5554
Banning of wildcard dependencies #5941

behnam commented 7 years ago

cc @alexcrichton, @carols10cents

alexcrichton commented 7 years ago

Seems plausible to me! All this validation happens on crates.io though which I'd prefer to not duplicate, although we can perhaps do simple things like validate homepage is a URL locally for sure. (also agreed with warnings-for-now)

behnam commented 7 years ago

Some more background and my rational for this proposal: The other day, for a new release for UNIC, I was updating a dozen of manifests, mostly their metadata. Since it's not vise to package/publish with dirty repository, I had to commit after each publish error, and try publishing again. This resulted in a handful of commits just catching all the small nits from the publish command, like "don't use space in the keywords list".

Agreed that we better have one source of truth. However, I also believe we can benefit a lot by moving this one source of truth to a library that can be used in client side (cargo CLI, clippy, etc), as well as server side (crates.io, etc). I call it the cargo-manifest crate, which I think should spin out of cargo proper (which hosts all the CLIs, etc). This would allow cargo, crates.io and third-party CLIs to share the same manifest read/validation logic, without the need to have cargo execution details imported as dependency.

What do you think?

Boddlnagg commented 7 years ago

https://github.com/rust-lang/crates.io/issues/7250 is also related to this (see point 2.)

Yatekii commented 1 year ago

This is a painpoint that happens often to me.

Would it be possible to implement a preflight like endpoint in crates.io to do the validation before the publish?

I would love to work on this.

weihanglo commented 1 year ago

Would it be possible to implement a preflight like endpoint in crates.io to do the validation before the publish?

The solution should come from a discussion with crates.io team. I don't have the answer to it, but Zulip t-crates.io is a better place to have such a chat with crates.io team I believe.

Rustin170506 commented 1 year ago

The solution should come from a discussion with crates.io team. I don't have the answer to it, but Zulip t-crates.io is a better place to have such a chat with crates.io team I believe.

Would it be possible to implement a preflight like endpoint in crates.io to do the validation before the publish?

In our previous crates.io team meeting, we expressed concerns regarding the potential performance issues that could arise from introducing this validation to the publish API. The API is resource-intensive as it requires unzipping the tarball to retrieve metadata, which could also lead to the consumption of your rate limit.

Thus, we're inclined first to unify the validation rules of Cargo and crates.io. This would be a good initial step, and I am currently working on this.

See: https://github.com/rust-lang/crates.io/pull/7379

Yatekii commented 1 year ago

Even if the rules are unified, a divergence between crates.io and cargo can occur if the cargo version is outdated.

I fail to see how uploading a bit of metadata via some simple API request is a problem. Admittedly I do not know how the API looks like at all.

epage commented 1 year ago

In going through cargo's triage (or talking to people about it), I remember seeing somewhere that some differences were intentional.

An alternative route to go, once we have #12235, is to have the crates-io specific validation rules in a crate and a cargo linter would expose lints for crates-io and "all registries" published packages that don't conform to those validation rules. During the times where crates-io is out of sync with someone's MSRV, they can allow certain lints.

epage commented 1 year ago

I fail to see how uploading a bit of metadata via some simple API request is a problem. Admittedly I do not know how the API looks like at all.

For the publish workflow, crates.io has been shifting away from "trust the metadata" to "parse Cargo.toml". That is what is expensive. Yes, there are a lot of design decisions that can be made, with various trade offs, that might get us to having server-side validation.

For example,

we could have a dry-run today and they just validate the metadata but (1) they might now be checking more than we upload and (2) this can require them to duplicate a lot of logic
We could POST the Cargo.toml but that file is incomplete without the associated project (I do want to auto-fill in most fields that are implied from the file system)
If we post an entire .crate then we run in the server-side performance issues

Yatekii commented 1 year ago

Yeah I see that sending just an extract might be error prone.

Also, upon further reflection, I think just syncing the two checkers or at least having a crates.io-oriented validation crate would be enough. For people that do not want to keep up with the release cadence of cargo it will fail, but for everyone else this is perfect :)

epage commented 10 months ago

One thought on how to solve this

crates.io split their limits into a package for cargo to pull in
Cargo check the limits in a crates.io-specific set of lints (#12235). These could even be deny by default as the user can workaround limit changes (global or per user) by making the lint an allow lint.

One question is when should these lints fire. With publish = true (which is default though we've talked about changing that in #6153), all registries are matched. At least with #12786, we have truly private packages, more easily. So do we emit them on all commands, even if they won't apply? Do we reserve this for package and publish but then people won't find out until its too late? Do we do this in cargo's linter that we've talked about? imo no real solid "this is the way" answer.

In the lint, we should

Mention making the package private as a way to disable
Narrow down the list of registries if they aren't using crates.io (only if other registries are configured)
Disable the lint if it doesn't apply
Reach out to crates.io about limit issues

rust-lang / cargo

Expand manifest validation for all settings #4377

Some Examples