rust-secure-code / cargo-supply-chain

Gather author, contributor and publisher data on crates in your dependency graph.
Apache License 2.0

Consider transparently downloading the DB dump instead of fetching live results by default #78

Status: Open. Opened by Shnatsel 2 years ago.

Shnatsel commented 2 years ago

TL;DR: run cargo supply-chain update implicitly from other commands instead of querying the API by default.

If the cache is expired or missing, and --cache-max-age allows it, we could download the latest DB dump by default instead of fetching live results. In the typical case this would be much faster, since one bulk download replaces many individual API requests.

We would still need to fall back to querying live data from the API if the latest DB dump published by crates.io is older than --cache-max-age.
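A minimal sketch of the source-selection rule described above. The `Source` enum and `choose_source` function are hypothetical names, not cargo-supply-chain's actual code, and the sketch assumes the age of the latest published dump can be determined before deciding (how to obtain it is left open).

```rust
use std::time::Duration;

/// Where crate metadata comes from (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Source {
    /// The local cache is fresh enough: use it as-is.
    Cache,
    /// The cache is missing or stale, but the latest dump is fresh enough.
    DbDump,
    /// Both the cache and the latest dump are too old: query the live API.
    LiveApi,
}

/// `cache_age` is `None` when no cache exists; `dump_age` is the age of the
/// latest published dump; `max_age` plays the role of --cache-max-age.
fn choose_source(cache_age: Option<Duration>, dump_age: Duration, max_age: Duration) -> Source {
    match cache_age {
        Some(age) if age <= max_age => Source::Cache,
        _ if dump_age <= max_age => Source::DbDump,
        _ => Source::LiveApi,
    }
}

fn main() {
    let hour = Duration::from_secs(3600);
    let max_age = hour * 48; // mirrors the 48-hour default mentioned in this thread
    println!("{:?}", choose_source(Some(hour), hour * 20, max_age)); // Cache
    println!("{:?}", choose_source(None, hour * 20, max_age)); // DbDump
    println!("{:?}", choose_source(Some(hour * 72), hour * 60, max_age)); // LiveApi
}
```

The API is only consulted when neither the cache nor the newest dump satisfies the age limit, which matches the fallback described in the issue.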

Shnatsel commented 2 years ago

The part of the dump we need seems to require only about a 50 MB download, which is not too bad.

smoelius commented 3 months ago

@Shnatsel Are there any "gotchas" you would anticipate, were someone to try to implement this?

Shnatsel commented 3 months ago

The database dumps do not have an officially stable format, so the format could change in the future and break the tool. In practice, however, the parts we care about have not changed in years.
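One way to make a format change fail loudly rather than silently is to check that the CSV header still contains the columns the tool reads, and fall back to the live API if it does not. This is a hypothetical sketch: `check_header` is an illustrative helper, the column names are assumed for illustration, and the naive comma split assumes header names contain no quotes or commas.

```rust
/// Finds the index of each required column in a CSV header line.
/// Returns an error naming the first missing column, so the caller can
/// fall back to the live API instead of misreading the data.
fn check_header(header_line: &str, required: &[&str]) -> Result<Vec<usize>, String> {
    // Naive split: assumes header names contain no commas or quotes.
    let columns: Vec<&str> = header_line.trim_end().split(',').collect();
    required
        .iter()
        .map(|name| {
            columns
                .iter()
                .position(|col| col == name)
                .ok_or_else(|| format!("dump format changed: missing column `{name}`"))
        })
        .collect()
}

fn main() {
    // Hypothetical column names, used only for illustration.
    let required = ["crate_id", "owner_id", "owner_kind"];
    let header = "crate_id,created_at,created_by,owner_id,owner_kind\n";
    match check_header(header, &required) {
        Ok(indices) => println!("column indices: {indices:?}"), // [0, 3, 4]
        Err(e) => eprintln!("{e}; falling back to the crates.io API"),
    }
}
```

Looking columns up by name rather than by position also tolerates new columns being added, which is a common kind of harmless format change.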

The download also relies on a somewhat fragile ordering of files in the archive to reduce its size: the files we need come early, so we can stop before the rest is downloaded. That order could change without warning and increase the download size considerably.
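To illustrate why the file order matters, this hypothetical sketch computes how much of a sequentially read archive must be consumed before every needed file has been seen. The helper name, file names, and sizes are made up for illustration and do not describe the real layout of the crates.io dump.

```rust
use std::collections::HashSet;

/// Given archive entries in stream order as (name, size), returns the total
/// size that must be read before every needed file has been seen, or `None`
/// if some needed file never appears.
fn prefix_size_needed(entries: &[(&str, u64)], needed: &[&str]) -> Option<u64> {
    let mut remaining: HashSet<&str> = needed.iter().copied().collect();
    let mut total: u64 = 0;
    for &(name, size) in entries {
        if remaining.is_empty() {
            break; // everything we need has been read; the download could stop here
        }
        total += size;
        remaining.remove(name);
    }
    if remaining.is_empty() { Some(total) } else { None }
}

fn main() {
    // Made-up entries with sizes in arbitrary units (think MB).
    let needed = ["crates.csv", "users.csv", "crate_owners.csv"];
    let favorable: [(&str, u64); 4] =
        [("crates.csv", 30), ("users.csv", 10), ("crate_owners.csv", 5), ("versions.csv", 500)];
    let reordered: [(&str, u64); 4] =
        [("versions.csv", 500), ("crates.csv", 30), ("users.csv", 10), ("crate_owners.csv", 5)];
    println!("{:?}", prefix_size_needed(&favorable, &needed)); // Some(45)
    println!("{:?}", prefix_size_needed(&reordered, &needed)); // Some(545)
}
```

With the needed files first, reading stops after 45 units; if a large file moved ahead of them, getting the same data would cost 545.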

Finally, these aren't really live results (up to 48 hours out of date by default), but that is probably fine as long as we display a warning about it.

I don't foresee any issues within the code of cargo supply-chain itself.