panamax-rs / panamax

Mirror rustup and crates.io repositories, for offline Rust and cargo usage.
Apache License 2.0

Sync over multiple vendor directories (over time) #100

Open flavienbwk opened 1 year ago

flavienbwk commented 1 year ago

Hello,

As far as I understand this repo, it is possible to run panamax once to sync the whole crates.io registry:

# This takes a long time but gathers everything at a specific date
panamax sync /mirror

It is also possible to sync only the crates found in a vendor directory:

# Fast as a few crates are downloaded
panamax sync /mirror /my/path/to/vendor

I'm currently in the second case, and I now want to add another vendor directory to the same panamax mirror. So I run:

# Fast as a few crates are downloaded
panamax sync /mirror /other/path/to/vendor

I want to add multiple vendor directories because I have projects that use different versions of a crate, and the full sync is too heavy for my use case.

But panamax prints the following output and does not update anything:

Syncing Rustup repositories...
[1/5] Syncing rustup-init files... █████ 2/2 [00:00:00 / 00:00:00]
[2/5] Syncing latest stable...     █████ 25/25 [00:00:02 / 00:00:00]
[3/5] Skipping syncing beta.
[4/5] Skipping syncing nightly.
[5/5] Cleaning old files...    █████ 0/0 [00:00:00 / 00:00:00]
Syncing Rustup repositories complete!
Syncing Crates repositories...
[1/3] Fetching crates.io-index... █████ [00:00:00]
[2/3] Syncing crates files...   █████ 2/2 [00:00:01 / 00:00:00]
[3/3] Syncing config...           
Syncing Crates repositories complete!

Can I do what I intend to do (add another vendor directory) with Panamax? If not, why? What's the alternative?

Thank you,

k3d3 commented 1 year ago

I believe @wcampbell0x2a implemented the vendor feature originally, so they might have some insight. However, I think this might be due to the diffing that's done on the crates.io-index git repo when a sync occurs.

Basically, it uses the local master branch as a marker to keep track of what's been synchronized. On the first sync, it grabs everything, then sets master equal to origin/master (that is, the master branch on GitHub). Then on the next sync, it only checks the files that have changed between master and origin/master, and updates those.

What I think might be happening is a double-filter issue. On the first sync, the git diff filter contains everything, so the vendor filter works perfectly fine. But on the second sync, the git diff filter only contains the few crates that changed upstream, and combining that with the vendor filter leaves effectively nothing.
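To make the suspected double filter concrete, here is a minimal sketch that treats both filters as plain lists of crate names and keeps only the names present in both. It is an illustration, not panamax's code: the function crates_to_sync and every crate list below are made up for the example.

```shell
#!/bin/sh
# Hypothetical illustration of the suspected double filter; not panamax's code.
# crates_to_sync and all crate lists below are invented for this example.
set -eu

# Print every crate that appears in both space-separated lists:
#   $1 = crates touched by the index diff, $2 = crates in the vendor directory
crates_to_sync() {
    for crate in $1; do
        case " $2 " in
            *" $crate "*) printf '%s\n' "$crate" ;;
        esac
    done
}

first_diff="anyhow libc rand serde tokio"  # first sync: the diff covers the whole index
second_diff="libc tokio"                   # later sync: only crates changed upstream
vendor_a="serde anyhow"                    # crates in the first vendor directory
vendor_b="rand libc"                       # crates in the second vendor directory

echo "first sync, vendor A: $(crates_to_sync "$first_diff" "$vendor_a" | xargs)"
echo "second sync, vendor B: $(crates_to_sync "$second_diff" "$vendor_b" | xargs)"
```

In this toy run, rand is in the second vendor directory but not in the second diff, so it is never selected, which matches the "does not update anything" symptom if the explanation above is right.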


With that being said, one way to work around the issue would be to sync with the first vendor directory, delete crates.io-index in the mirror directory, then run sync again with the second vendor directory. Unfortunately you'd have to do this song and dance every time you wanted to update.
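The workaround described above could be wrapped in a small script along these lines. This is only a sketch under assumptions: sync_vendor_dirs and the PANAMAX variable are names invented here, crates.io-index is assumed to sit directly under the mirror root, and the only panamax invocation used is the `panamax sync <mirror> <vendor>` form from the original post.

```shell
#!/bin/sh
# Sketch of the manual workaround; sync_vendor_dirs and PANAMAX are
# hypothetical names, not part of panamax itself.
PANAMAX="${PANAMAX:-panamax}"

# Usage: sync_vendor_dirs <mirror> <vendor-dir>...
sync_vendor_dirs() {
    mirror="$1"
    shift
    for vendor in "$@"; do
        # Assumption: the index checkout lives at <mirror>/crates.io-index.
        # Deleting it makes the next sync look at the whole index again.
        rm -rf "$mirror/crates.io-index"
        "$PANAMAX" sync "$mirror" "$vendor" || return 1
    done
}
```

With the paths from this thread, the call would be: sync_vendor_dirs /mirror /my/path/to/vendor /other/path/to/vendor. Each run re-fetches the index, so it still needs network access and has to be repeated for every update.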

As for a solution, the code at https://github.com/panamax-rs/panamax/blob/master/src/main.rs#L34 currently takes in one vendor directory, but I don't think it'd be too difficult to make it take multiple vendor directories.

wcampbell0x2a commented 1 year ago

I updated my test program https://github.com/wcampbell0x2a/zerus to support multiple vendor directories; I'll have to add that to panamax.

There I also use a much saner approach: parsing the cargo metadata JSON output instead of the vendor directory.

flavienbwk commented 1 year ago

Thank you for this crisp explanation @k3d3. The workaround of removing crates.io-index works well. Another problem is that importing a vendor directory can't currently be done offline, as the sync steps try to reach the crates.io domain name.

Panamax command failed! Download error: HTTP download error: error sending request for url (https://static.rust-lang.org/dist/channel-rust-nightly.toml): error trying to connect: dns error: failed to lookup address information: Temporary failure in name resolution

I'll check what's possible to do. Would you have any suggestions about that?

@wcampbell0x2a thanks for the news. About "importing vendor directories offline": wouldn't the cargo metadata approach make this totally unfeasible? (I guess panamax would then download the packages itself instead of using the user-downloaded vendor directory?)