panamax-rs / panamax

Mirror rustup and crates.io repositories, for offline Rust and cargo usage.
Apache License 2.0
443 stars 46 forks source link

Full sync logic is very subtle and possibly buggy #58

Closed aidanhs closed 2 years ago

aidanhs commented 2 years ago

https://github.com/panamax-rs/panamax/blob/0b169b07ab285fe52de47e490152798e5ff67fac/src/crates.rs#L110-L118

There are two scenarios here:

  1. an incremental sync based on last commit of crates.io index
  2. a full sync, invoked when crates.io index hasn't changed

After carefully reading the code, I realised scenario 2 (do_full_scan == true) should only happen on the initial clone, because the rewrite of the config.json adds a new commit every time which makes it always mismatch when compared to refs/remotes/origin/master. Except that, if someone comments out base_url (so no rewrite happens, so no commit is made) and happens to update when there haven't been any changes to the index, I think they will initiate a full sync.

I feel like a good solution to this is to turn the do_full_scan logic into an explicit flag that gets passed in. If a "first sync" is detected then this flag can be set and passed in.

Given that the download function does check if a file already exists on disk, this would also permit passing an argument to sync to 'recheck' the whole downloaded repository.

k3d3 commented 2 years ago

I've made some changes in Panamax 1.0.5 that should hopefully address this. The way it works now is, on initial clone, the "master" branch (or at least, the HEAD reference for it) gets removed, and only once a full scan has been completed, the master HEAD gets set to the origin/master HEAD, at which point incremental downloads are used.

Let me know if this fixes the issue for you.