johnnychen94 / StorageMirrorServer.jl

As I want it be available, fast, complete and persistent
MIT License

cache Artifacts.toml #5

Closed: skyzh closed this issue 4 years ago

skyzh commented 4 years ago

The SJTUG server spends a long time extracting tar files. We could instead keep Artifacts.toml when the tarball is first downloaded. For example, when we download the TestImage package, we extract its Artifacts.toml to .cache/$(git-sha1)-Artifacts.toml. On the next full update, we only need to check the cached Artifacts.toml instead of scanning and extracting the tarball again. This would significantly reduce full-update time. (We could even call this kind of "full update" an incremental update.)
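To make the proposal concrete, here is a minimal sketch of the cache lookup, written in Python for illustration only (the project itself is Julia). The function name `cached_artifacts_toml`, the empty-file marker for packages without artifacts, and the assumption that Artifacts.toml sits at the top level of the tarball are all hypothetical and not taken from the repository.

```python
import tarfile
from pathlib import Path, PurePosixPath


def cached_artifacts_toml(tarball, git_sha1, cache_dir):
    """Return the package's Artifacts.toml as bytes, opening the tarball only on a cache miss.

    An empty result means the package ships no Artifacts.toml (assumed convention).
    """
    cache_file = Path(cache_dir) / f"{git_sha1}-Artifacts.toml"

    # Cache hit: no need to scan or extract the tarball at all.
    if cache_file.exists():
        return cache_file.read_bytes()

    # Cache miss: scan the tarball once for a top-level Artifacts.toml.
    content = b""
    with tarfile.open(tarball, "r:*") as tar:
        for member in tar:
            is_top_level = PurePosixPath(member.name).parts == ("Artifacts.toml",)
            if member.isfile() and is_top_level:
                content = tar.extractfile(member).read()
                break

    # Store the result (even when empty) so the next full update skips this tarball.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(content)
    return content
```

Writing an empty file for packages without artifacts is one possible design choice: it lets later updates skip those tarballs too, at the cost of a few zero-byte files in the cache.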

johnnychen94 commented 4 years ago

This sounds like a good strategy to me.

To get there, I first need to refactor the code to decouple downloading artifacts from downloading packages. Currently, the script downloads one package, then immediately extracts it and downloads its artifacts, all in one loop iteration, which makes the logic quite complicated and hard to debug. A better strategy is to download all /package/$uuid/$hash, extract all Artifacts.toml, and then download /artifact/$hash. As you suggested here, the "extract all Artifacts.toml" step can be cached and done incrementally.
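The three decoupled phases described above could be sketched roughly as follows, again in Python for illustration only. The names `mirror`, `artifact_hashes`, and the three injected callables are hypothetical, and the regular expression is a shortcut standing in for a real TOML parser; it only assumes that artifact entries carry a 40-character `git-tree-sha1` value.

```python
import re

# Shortcut for illustration: pull hashes with a regex instead of parsing TOML.
TREE_HASH = re.compile(r'git-tree-sha1\s*=\s*"([0-9a-f]{40})"')


def artifact_hashes(artifacts_toml):
    """Collect every git-tree-sha1 value found in an Artifacts.toml string."""
    return set(TREE_HASH.findall(artifacts_toml))


def mirror(packages, download_package, read_artifacts_toml, download_artifact):
    """Run the three phases separately instead of interleaving them per package.

    packages: iterable of (uuid, hash) pairs.
    The three callables are supplied by the caller (hypothetical interfaces).
    """
    packages = list(packages)

    # Phase 1: fetch every /package/$uuid/$hash tarball.
    for uuid, tree_hash in packages:
        download_package(uuid, tree_hash)

    # Phase 2: gather artifact hashes from each package's Artifacts.toml.
    # This is the step that can be served from a cache on later runs.
    wanted = set()
    for uuid, tree_hash in packages:
        wanted |= artifact_hashes(read_artifacts_toml(uuid, tree_hash))

    # Phase 3: fetch each distinct /artifact/$hash once.
    for artifact_hash in sorted(wanted):
        download_artifact(artifact_hash)

    return wanted
```

Because phase 2 produces a set, an artifact shared by several packages is downloaded only once, and each phase can be tested or retried on its own.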

My best guess is that this decoupling work will also help fix the missing-artifacts issue in the SJTUG mirror.

Let's see if I can make it this weekend.

skyzh commented 4 years ago

This can be done without fetching all packages before downloading artifacts.

I think this can be done with little refactoring. I’ll also look into this.

johnnychen94 commented 4 years ago

This is now available in v0.1.0-rc5.