Open behnam opened 7 years ago
cc @alexcrichton, @carols10cents
Seems plausible to me! All this validation happens on crates.io though which I'd prefer to not duplicate, although we can perhaps do simple things like validate homepage
is a URL locally for sure. (also agreed with warnings-for-now)
Some more background and my rational for this proposal: The other day, for a new release for UNIC, I was updating a dozen of manifests, mostly their metadata. Since it's not vise to package/publish with dirty repository, I had to commit after each publish error, and try publishing again. This resulted in a handful of commits just catching all the small nits from the publish command, like "don't use space in the keywords list".
Agreed that we better have one source of truth. However, I also believe we can benefit a lot by moving this one source of truth to a library that can be used in client side (cargo CLI, clippy, etc), as well as server side (crates.io, etc). I call it the cargo-manifest
crate, which I think should spin out of cargo
proper (which hosts all the CLIs, etc). This would allow cargo, crates.io and third-party CLIs to share the same manifest read/validation logic, without the need to have cargo execution details imported as dependency.
What do you think?
https://github.com/rust-lang/crates.io/issues/7250 is also related to this (see point 2.)
This is a painpoint that happens often to me.
Would it be possible to implement a preflight like endpoint in crates.io to do the validation before the publish?
I would love to work on this.
Would it be possible to implement a preflight like endpoint in crates.io to do the validation before the publish?
The solution should come from a discussion with crates.io team. I don't have the answer to it, but Zulip t-crates.io
is a better place to have such a chat with crates.io team I believe.
The solution should come from a discussion with crates.io team. I don't have the answer to it, but Zulip
t-crates.io
is a better place to have such a chat with crates.io team I believe.Would it be possible to implement a preflight like endpoint in crates.io to do the validation before the publish?
In our previous crates.io team meeting, we expressed concerns regarding the potential performance issues that could arise from introducing this validation to the publish
API. The API is resource-intensive as it requires unzipping the tarball to retrieve metadata, which could also lead to the consumption of your rate limit.
Thus, we're inclined first to unify the validation rules of Cargo and crates.io. This would be a good initial step, and I am currently working on this.
Even if the rules are unified, a divergence between crates.io and cargo can occur if the cargo version is outdated.
I fail to see how uploading a bit of metadata via some simple API request is a problem. Admittedly I do not know how the API looks like at all.
In going through cargo's triage (or talking to people about it), I remember seeing somewhere that some differences were intentional.
An alternative route to go, once we have #12235, is to have the crates-io specific validation rules in a crate and a cargo linter would expose lints for crates-io and "all registries" published packages that don't conform to those validation rules. During the times where crates-io is out of sync with someone's MSRV, they can allow certain lints.
I fail to see how uploading a bit of metadata via some simple API request is a problem. Admittedly I do not know how the API looks like at all.
For the publish workflow, crates.io has been shifting away from "trust the metadata" to "parse Cargo.toml
". That is what is expensive. Yes, there are a lot of design decisions that can be made, with various trade offs, that might get us to having server-side validation.
For example,
Cargo.toml
but that file is incomplete without the associated project (I do want to auto-fill in most fields that are implied from the file system).crate
then we run in the server-side performance issuesYeah I see that sending just an extract might be error prone.
Also, upon further reflection, I think just syncing the two checkers or at least having a crates.io-oriented validation crate would be enough. For people that do not want to keep up with the release cadence of cargo it will fail, but for everyone else this is perfect :)
One thought on how to solve this
One question is when should these lints fire. With publish = true
(which is default though we've talked about changing that in #6153), all registries are matched. At least with #12786, we have truly private packages, more easily. So do we emit them on all commands, even if they won't apply? Do we reserve this for package
and publish
but then people won't find out until its too late? Do we do this in cargo's linter that we've talked about? imo no real solid "this is the way" answer.
In the lint, we should
There are some practical difficulties arising from the fact that manifest files are not always validated. I want to collect such cases, so we can improve the user experience, specially for newbies, by showing warnings about bad manifest entries, as soon as possible.
One important fact here is that we almost never want to throw an error on invalid entries, because it will affect forward-compatibility: would make many future changes non-backward-compatible.
Some Examples
I think almost every config item can have some sort of validation. This list only presents possible validation methods for different types of values.
package.version
: The version is not already present (#3662).package.keywords
: Having invalid keywords strings only surface during packaging/publishing. See https://github.com/rust-lang/cargo/issues/4300 for details.package.categories
: This is a bit harder, because the list is maintained incrates.io
repo. But, like any other index-related data, it can be fetched, cached, and checked against.package.include
,package.exclude
,workspace.exclude
,workspace.members
: These are configs with pattern-matching and warning on invalid patterns can help users track down packaging issues easier and faster, and make changes in those areas easier. (See https://github.com/rust-lang/cargo/issues/4268)package.readme
and other file paths:Need to check the existence of the file, if linked(see #13921). In addition, we can also warn on missing the config, if a best-guess file is present (similar to #13073 since we already infer the file if a "well known" name is used).package.homepage
and other URLs: Perform URL validation and check URLs against the newly-implemented blacklist. (blacklist is maintain incrates.io
, so this would be another index-dependent check.)features.<name>
#5554Banning of wildcard dependencies #5941