`cargo vet certify` unnecessarily reformats `config.toml`

anforowicz commented 9 months ago

In my project the supply-chain/config.toml file is auto-generated from project-specific metadata by a tool (*). The file has a copyright header and some other headers like # @generated by tools/crates/gnrt vendor. Do not edit.. These headers are clobbered by some cargo vet commands.

I understand that cargo vet fmt will reformat all the files, including config.toml. OTOH, IMO cargo vet certify ... should only write to audits.toml and should leave config.toml unchanged. Is that a reasonable expectation?

(*) We use the gnrt tool to mapping project-specific categorization of crates into cargo vet requirements like:

[policy."autocfg:1.1.0"]                                               
criteria = ["does-not-implement-crypto", "safe-to-deploy", "ub-risk-2"]

bholley commented 9 months ago

This seems reasonable at first glance. Certain certifications can necessitate changes in config.toml, but I think activating them entails prune? @mystor WDYT?

mystor commented 9 months ago

cargo vet certify can cause changes to config.toml. For example, a new certify call will remove now-unnecessary exemptions for the crate which was just certified. This happens automatically for the crate which was certified in order to keep the tree relatively clean even without a manual prune call.

In general, all cargo vet commands which load the store also reformat it, even if there are no other changes to the store. When running with --locked, the cargo vet subcommand will actually fail if the formatting of the store doesn't match the formatting which would be generated.

anforowicz commented 9 months ago

Would it be possible to somehow split config.toml into 1) cargo vet-maintained/reformattable part and 2) human-authored part (in our case - with crate-specific, project-specific, separately-generated criteria = ... entries - an example can be found in the original issue report/comment above)? (Just trying to brainstorm a way forward here.)

anforowicz commented 9 months ago

Hmmm... I guess one other alternative may be to modify our tooling (i.e. gnrt) so that it won't generate config.toml from scratch (using handlebars + a .hbs template) but instead will read/parse the existing config.toml, make its modifications, and then save config.toml again.

anforowicz commented 9 months ago

/cc @danakj

anforowicz commented 8 months ago

When running with --locked, the cargo vet subcommand will actually fail if the formatting of the store doesn't match the formatting which would be generated.

This seems rather undesirable. I see no harm in having a "@generated by tools/crates/gnrt vendor. Do not edit. " comment at the top. As-is, I have to take out this comment (and a few other comments) to enable running cargo vet in --locked mode (for presubmit/CI which probably should be somewhat hermetic and therefore needs --locked).

cargo vet certify can cause changes to config.toml.

I agree that rewriting config.toml in such cases is reasonable (dropping comments, sorting, and changing formatting in other ways). OTOH I would claim that config.toml shouldn't be modified unless its actual contents change (e.g. most check invocations won't have to change config.toml to remove exemptions or do other changes).

anforowicz commented 8 months ago

In terms of proposed code changes, I am thinking about something along the following lines:

Remove the if check_file_formatting { ... } block. I understand that this block was introduced in https://github.com/mozilla/cargo-vet/pull/348 for a good reason, but I think an alternative fix would be to avoid writing the audits.toml / config.toml / imports.lock files unless their actual content changes (see the next item)
Tweak fn commit to avoid writing the files unless their contents have actually changed (e.g. if deserializing the old file contents results in the same objects (Eq-wise) then keep the old file (with its comments, whitespace, and other potential minor formatting differences)

Please comment if you have any concerns with the proposal above. In the meantime, let me open a pull request along those lines so that we have more concrete code changes to discuss.

mystor commented 8 months ago

I don't think that this is the approach that we want to take.

The automatic formatting, and especially the validation when running under --locked from #348, provides additional valuable benefits for the maintenance of a repository's supply-chain, especially around manual store edits. While the intention is generally that new audits are created using the cargo vet certify tool, it is not uncommon for folks to manually modify the relevant .toml files to add new audits or exemptions. When doing this, it is easy to make some mistakes which the formatting validation code would help fix:

Ordering of members, fields, etc. - When formatting we always generate fields and members in a specific order, to keep things consistent. While it may seem inconsequential to allow adding new code which uses a different format or member order, this then leads to more churn down the line, as CI will not prevent landing the incorrectly formatted change, and the next developer using the cargo vet tool to modify the store will end up causing unrelated formatting changes to the document. This churn is undesirable within commit history, and keeping things formatted before landing is a way to keep it down.
Unknown attributes - when parsing TOML using serde, unknown fields in things like structs are discarded. Without the formatting validation, this would mean that it may be possible to land an audit with e.g. a ntoes = "..." member, which would later be discarded when a command is used to update the contents of the file. Again this leads to unnecessary churn, and could also lead to deceptive audit entries due to missing or unrecognized fields.

Overall, I would prefer to keep the existing behaviour of maintaining strict formatting requirements on files in the store to keep down this kind of churn, and keep supply chain file deltas easier to review.

To my understanding, the main reason for this request is due to wanting to preserve the comment at the start of the config.toml file. I think adding special handling to allow a leading sequence of comments and whitespace to be preserved & replace the default # cargo vet config file. line is something we could do, even across modifications to the rest of the file (e.g. due to formatting).

Would that be sufficient for your use-case? This way we'd still be validating and ensuring that the TOML is correctly formatted and won't generate unnecessary deltas in the future due to manual edits when running with --locked, while also allowing for some comments and other documentation to be included at the top of the file.

I noticed that in your linked chromium revision, there were also some comments inline in the TOML. Preserving something like that would unfortunately be much more difficult (which is one of the reasons why we have an explicit notes = field for audits). However, we could easily add an optional notes: Option<String> member to the RemoteImport struct which would allow notes to be added and preserved around where you added comments.

anforowicz commented 8 months ago

To my understanding, the main reason for this request is due to wanting to preserve the comment at the start of the config.toml file. I think adding special handling to allow a leading sequence of comments and whitespace to be preserved & replace the default # cargo vet config file. line is something we could do, even across modifications to the rest of the file (e.g. due to formatting).

Hmmm... I guess this is indeed correct. Of course this assumes that we are able to use a handlebars-based template expansion to generate a config.toml file that is compatible with cargo-vet's formatting expectations (modulo the initial, top comment and/or whitespace). This assumption is definitely true today and it seems okay to assume that things will stay this way in the future (or that we will be able to modify how we generate config.toml to stay compliant with cargo-vet's expectations if anything changes - e.g. if there are other sorting or formatting requirements).

I noticed that in your linked chromium revision, there were also some comments inline in the TOML. Preserving something like that would unfortunately be much more difficult.

Ack. I think it's okay to replace these comments with handlebars-only comments inside our vet_config.toml.hbs.

mystor commented 8 months ago

Hmmm... I guess this is indeed correct. Of course this assumes that we are able to use a handlebars-based template expansion to generate a config.toml file that is compatible with cargo-vet's formatting expectations (modulo the initial, top comment and/or whitespace). This assumption is definitely true today and it seems okay to assume that things will stay this way in the future (or that we will be able to modify how we generate config.toml to stay compliant with cargo-vet's expectations if anything changes - e.g. if there are other sorting or formatting requirements).

Another option here might be to run cargo vet fmt to reformat the file immediately after generating it. This should just re-format the file to match how cargo-vet would, without doing any other changes. Not sure if that would work well with your build system though.

mozilla / cargo-vet

`cargo vet certify` unnecessarily reformats `config.toml` #589