mozilla / cargo-vet

supply-chain security for Rust

"cargo vet certify" github bot workflow #330

Gankra opened this issue 2 years ago

Gankra commented 2 years ago

"cargo vet certify, but all with bots/webapps in github so devs don't have to know/use the cli"

An MVP for a bot-based UX should be based on github, either on top of dependabot or a different bot. The design should be extensible enough to port to other infra like gitlab or mozilla's stuff, but supporting those is not in scope for the first version. Of course we will keep that future in mind and not wildly hardcode github stuff. Use json messages, standard auth protocols, etc. where it makes sense.

Basic workflow sketch:

unclear details:

* do the certifies appear on the existing PR, open new PRs, or push to main immediately?
* are the certify commits expected to be done by the bot account or the auditor's account?
* should there be one commit per audit, or a single "all the audits" commit
  * if the latter, are audits batched up and the user says "all done" in the app, or do they get submitted to github as each one is performed (editing the mega commit)?
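
As a rough sketch of the CI-facing half (purely illustrative; the only piece taken from the actual CLI is `--output-format=json`, which already exists):

```
# hypothetical bot step; names and plumbing are illustrative
cargo vet --output-format=json > vet-report.json || true  # vet presumably exits nonzero on failure
# the bot would then parse vet-report.json, post a PR comment linking to the
# certify web UI seeded with that data, and later write the resulting audits
# back to the PR branch
```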

Gankra commented 2 years ago

#329 is an initial prototype of this, but without any authentication logic, so it just has the user copy base64 strings of json blobs around to shepherd data between the app and github.

Gankra commented 2 years ago

how to test the current impl's UI:

```
git fetch origin pull/329/head:nika
git checkout nika
cd certify-gui; npm install; npm run dev
```

then go to the localhost url that `npm run dev` prints.

Which should show this:

[screenshot of the certify web UI]

Gankra commented 2 years ago

To generate new errors for cargo-vet to report/handle, there is a mock project inside the tests.

First, install whatever cargo-vet code you currently have checked out:

```
> cargo install --path ./

...
   Compiling cargo-vet v0.3.0 (C:\Users\ninte\dev\cargo-vet)
    Finished release [optimized] target(s) in 1m 25s
   Replacing C:\Users\ninte\.cargo\bin\cargo-vet.exe
    Replaced package `cargo-vet v0.1.0 (C:\Users\ninte\dev\cargo-vet)` with `cargo-vet v0.3.0 (C:\Users\ninte\dev\cargo-vet)` (executable `cargo-vet.exe`)
```

Then go to the test-project and update a package:

```
> cd tests/test-project/
> cargo update -p clap
    Updating crates.io index
    Updating clap v3.1.8 -> v3.2.21
      Adding clap_lex v0.2.4
```

Now `cargo vet` should fail:

```
> cargo vet
Vetting Failed!

4 unvetted dependencies:
  clap:3.2.21 missing ["safe-to-deploy"]
  atty:0.2.14 likely missing ["safe-to-deploy"]
  clap_lex:0.2.4 likely missing ["safe-to-deploy"]
  hermit-abi:0.1.19 likely missing ["safe-to-deploy"]

recommended audits for safe-to-deploy:
    cargo vet diff clap 3.1.8 3.2.21     (used by test-project)  (176 files changed, 13437 insertions(+), 10196 deletions(-))
    cargo vet inspect atty 0.2.14        (used by clap)          (508 lines)
    cargo vet inspect clap_lex 0.2.4     (used by clap)          (860 lines)
    cargo vet inspect hermit-abi 0.1.19  (used by atty)          (938 lines)

estimated audit backlog: 25939 lines
```

You can get this report in json form with `--output-format=json`:

```
cargo vet --output-format=json
{
  "certify_web_gui_data": null,
  "conclusion": "fail (vetting)",
  "failures": [
    {
      "missing_criteria": [
        "safe-to-deploy"
      ],
      "name": "atty",
      "version": "0.2.14"
    },
    ...
  ],
  "suggest": {
    "suggest_by_criteria": {
      "safe-to-deploy": [
        {
          "name": "clap",
          "notable_parents": "test-project",
          "suggested_criteria": [
            "safe-to-deploy"
          ],
          "suggested_diff": {
            "diffstat": {
              "count": 23633,
              "raw": " 176 files changed, 13437 insertions(+), 10196 deletions(-)\n"
            },
            "from": "3.1.8",
            "to": "3.2.21"
          }
        },
        ...
```
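
For a bot consuming this, the failures list alone is enough to drive a status comment; a sketch, assuming only the fields shown above plus `jq`:

```
# print each failing package as name@version (fields from the json above)
cargo vet --output-format=json | jq -r '.failures[] | "\(.name)@\(.version)"'
```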

With the changes in nika's PR, you can now pass `--certify-web-gui` to make it generate the giant base64 blob that her UI takes in the url (I think it's just a compressed version of the rest of the json):

```
cargo vet --output-format=json --certify-web-gui
{
  "certify_web_gui_data": "Z3ppcDsfiwgAAAAAAAD/lVVNbxtHDP0rgz3VgbSokh5aHwIkAYrmkEvQW9doqB2uNPAsZzsfUjaB/3sfZyVZdg5JTrI15OMj+R71temjyxwdNbf/fG2ERm5um0QDr3NYxyLNqnHj5B3btzNCLk+WJx9mvFKxALDN3aqxnIA2ZRcEIH/vXTJ9pMymJzFbfIRxcp7tygB4ZUisyZyQbIIYMj705M0xxPuUSUFMiMZJJ32QHINHpqGSw7g8Hl3eh5JNKnGKLjnZoYAk/q+w9JxWeOj3htJtJy/MRyarEUA8omH901ImM8QwmsSS8N2B9blIZE9KaqKYkwmDyXs2A4inGWTHVvHeCzh6rzgpDPlIsSZHBoXB7UrUF82zfHA915x3QYT7WjsH1Mmx1OaFszZtWOwUnORUoz+4VBJr+aUssFMoEZ2ZX7jdtRjtPOXQlxjR72xGJ0C+QW7zsPpmk5d1PVnmj+zu6Lw3EjI2gSXY0jNWlaCYUBI+Ud/l2RyKF460dV7/Q3tTjVW4Ti4T4s9TSOj4SftOppJBu5M3SifEZDRWS0Ys08UlYeI4hDii+FDAyIed6xFwcHw874glI3rhDcCPhC/jSp8wHhQ7h7OEstsrqELN+JqSKnCrclKgLe/p4LBPAGPLIKtz7GQLid6nKtySaFe3M4UjmBVvMNkAwbTmTySSnGFp6/kxWtGXuaqwEeHX2L23FxWpLQzlTP09x4W1TgLuqUDgPJK4qahEKxqslN2o5K5I4wHaSVwXkFBLUTF9H1yuOAi78s2R5rqA9xaE/Ly6ahqesiCbTqOCBMUu1qVx6zBxmHta9u7O0uzk0S2Gekg23dSd7ilaFrXxjhwc9NiLGlpB1cEOecBVf2sjnZA9cMxOlfOsydOuzV9YwkF3ncIIzzlomSsmiEM4o8tQWgvX1qOAi0JJL4QiCOUSL7u5Sk24Ll4rVleDuEq1Bn3CRjh9wk1gBIShE/22mukUW6epOqj8kjnuHW7RjtUjWY+CupHtMtzFztviAIYXnAEYWoeJfNioZ1si+U5G6mNIN6vvCOvEG1qGr5YKS/mRGfesUt2GA+Z6Pv3Pbsb5Kjw9Ft/ciFQwWXMOvsofypcv308/RT3cnX5C0vUPUO9pajTDZ/wwNa/aTfu7Wb82r9qX7cuNQgv0Rv7d44/X81N3d91RzvMV3K9A2fz20yhK6l/Pn58h/TzQvipyDdc8gdq0mz9+EOvu4X8s59hquQcAAA==",
  "conclusion": "fail (vetting)",
  "failures": [
    {
      "missing_criteria": [
        "safe-to-deploy"
      ],
      "name": "atty",
      "version": "0.2.14"
    },
    ...
```
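
(Decoding aside: the blob's first bytes base64-decode to the ascii prefix `gzip;` followed by a gzip stream, so something like the following should unpack it; the only assumption is the 5-byte prefix strip:)

```
# BLOB holds the certify_web_gui_data string from the json above
echo "$BLOB" | base64 -d | tail -c +6 | gunzip | jq .
```
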
Gankra commented 2 years ago

Correction: the current base64 blob is actually a new json format designed for this use case. In particular, the current json output is missing some information the GUI wants, like a listing of every supported criteria you can audit for and their description strings / X-implies-Y relationships. If we move away from the base64 format, I suggest that we either unconditionally emit this information in json output or make it something you can toggle on with a flag.

Gankra commented 2 years ago

@mystor had you actually worked out what the bot scripting was going to look like, or were other people on the hook for that step?

I'm mentally sketching out what I want a cleaned-up version of this API to look like. I see two reasonable choices, but which is better depends on how a bot wants to interface with things (I'll consult with my coworkers on this, but I'm going to sketch out the options from the cli's perspective to get them out there). (Also, names here are kinda strawman.)

  1. Add `--output-format=json-full` which includes all the "extra" information about the repo like the defined criteria that a gui/bot would want to have available (probably some subset of the current stuff you're base64ing).

  2. Add `cargo vet metadata` which prints out just that info as its own json object.

I aesthetically like 2 because it feels very orthogonal and generalizable. My expectation is that 1 is a lot nicer for bots because one json object unconditionally dumped sounds a lot easier than 2 objects, with the second needing to be manually requested on error.
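
For concreteness, a sketch of what option 2 could emit (the shape and field names are strawman; the criteria names and the implies relationship are cargo-vet's built-in defaults):

```
> cargo vet metadata
{
  "criteria": {
    "safe-to-deploy": {
      "description": "...",
      "implies": ["safe-to-run"]
    },
    "safe-to-run": {
      "description": "...",
      "implies": []
    }
  }
}
```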

As for getting info back in, I'm not a huge fan of the `--from-gui` approach (take a json blob of commands from stdin). We already have plumbing commands on cargo-vet, and ideally any automation on top of cargo-vet would just use those. That said, I'm willing to believe that this format is somehow more desirable (or that you will inevitably just write a script that does `--from-gui`, so maybe it's best to just build it in and save everyone some time).
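
(For reference, the plumbing route would be roughly one `certify` call per audit. The subcommand is real, but treat the exact flags here as unverified assumptions:)

```
# hypothetical bot-side replay of one audit via the existing cli
# (--criteria/--who are assumed flag names, not checked against the current impl)
cargo vet certify clap 3.1.8 3.2.21 --criteria safe-to-deploy --who "auditor@example.com"
```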

Gankra commented 2 years ago

Actually, maybe https://docs.rs/twelf/0.7.0/twelf/ can let us have our cake and eat it too on the `--from-gui` side of things.

Gankra commented 2 years ago

(we already have some amount of hand-rolled twelf-like logic and it sucks shit and I want to destroy my awful child)

https://github.com/mozilla/cargo-vet/blob/ddae3abf56f3ccdc31ef8b2e6a82080b86c87f09/src/main.rs#L350-L388

mystor commented 2 years ago

> unclear details:
>
> * do the certifies appear on the existing PR, open new PRs, or push to main immediately?

I don't think we'd really want this workflow to assume that main ever doesn't pass vet, so I expect it'd only run on existing PRs, and either update the PR or open a new PR into that PR's branch (though that's probably a github thing and might have bad ergonomics?) so that vet can be passing in the PR before merging into main.
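
(A PR into a PR's branch is at least expressible with stock github tooling; branch names here are made up:)

```
# hypothetical: bot records audits on a side branch, then opens a PR whose
# base is the original PR's branch ("feature") rather than main
git checkout -b vet-audits origin/feature
# ... bot writes the new audits into supply-chain/audits.toml and commits ...
git push origin vet-audits
gh pr create --base feature --head vet-audits --title "Record cargo-vet audits"
```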

> * are the certify commits expected to be done by the bot account or the auditor's account?

I don't think it matters a ton either way. The auditor should definitely be credited in the who for the audit, but I think it'd be fine if the certify commits themselves came from the bot.

> * should there be one commit per audit, or a single "all the audits" commit

I think a single "all the audits" commit would be fine.

>   * if the latter, are audits batched up and the user says "all done" in the app, or do they get submitted to github as each one is performed (editing the mega commit)?

I would expect that they're batched up and applied in a single "all done", but I suppose doing something more live could be a reasonable approach? I worry a bit about wasting a ton of CI time with that though, as we'd be re-running CI for each new audit performed, and it wouldn't be passing.

> @mystor had you actually worked out what the bot scripting was going to look like, or were other people on the hook for that step?

I hadn't quite gotten to that step in figuring this out. I was originally working on getting a basic framework working with the UI and a mechanism for applying the changes, and was then going to figure out how that would integrate with a bot-based workflow.

> I'm mentally sketching out what I want a cleaned-up version of this API to look like. I see two reasonable choices, but which is better depends on how a bot wants to interface with things (I'll consult with my coworkers on this, but I'm going to sketch out the options from the cli's perspective to get them out there). (Also, names here are kinda strawman.)
>
> 1. Add `--output-format=json-full` which includes all the "extra" information about the repo like the defined criteria that a gui/bot would want to have available (probably some subset of the current stuff you're base64ing).
>
> 2. Add `cargo vet metadata` which prints out just that info as its own json object.
>
> I aesthetically like 2 because it feels very orthogonal and generalizable. My expectation is that 1 is a lot nicer for bots because one json object unconditionally dumped sounds a lot easier than 2 objects, with the second needing to be manually requested on error.

One of the reasons I added a new system was to simplify the amount of data being sent into the app as much as possible, as I was planning to communicate it on the client side using the URL. The original idea bholley proposed in #293 was to use a wasm blob of cargo-vet itself within the web client so that the blob could be kept simple and minimal.

> As for getting info back in, I'm not a huge fan of the `--from-gui` approach (take a json blob of commands from stdin). We already have plumbing commands on cargo-vet, and ideally any automation on top of cargo-vet would just use those. That said, I'm willing to believe that this format is somehow more desirable (or that you will inevitably just write a script that does `--from-gui`, so maybe it's best to just build it in and save everyone some time).

The decision to do it this way with `--from-gui`, rather than have the bot handle the communication with the web UI, was largely because of the discussion in #293, though I'm not super attached to it.