ipfs-inactive / package-managers

[ARCHIVED] 📦 IPFS Package Managers Task Force
MIT License

Experiment: Setting up a Clojars mirror on IPFS #19

Closed (andrew closed 5 years ago)

andrew commented 5 years ago

Similar to #18: Clojars is a Maven repository with some extra metadata files, and it has an rsync server.

The experiment ran on the same hardware as #18.

Mirror command:

$ rsync --recursive --times --links --safe-links --hard-links --stats clojars.org::clojars /data/clojars

/data/clojars comes in at ~60GB across 10,130 top-level folders (the apt repo had only 5 top-level files/folders).

rsync output:

Number of files: 1,793,912 (reg: 1,634,439, dir: 159,473)
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 0
Total file size: 57,437,200,856 bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 5,328,593
File list generation time: 0.169 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 183,391
Total bytes received: 41,465,277

sent 183,391 bytes  received 41,465,277 bytes  115,530.29 bytes/sec
total size is 57,437,200,856  speedup is 1,379.09
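
As a rough cross-check of the summary line above: rsync's "speedup" is the total file size divided by the bytes actually exchanged (sent plus received). A minimal Python sketch, with the figures copied from the stats above; the helper name rsync_speedup is hypothetical and only for illustration:

```python
def rsync_speedup(total_size: int, sent: int, received: int) -> float:
    """Ratio of mirrored data to bytes exchanged, as in rsync's summary line."""
    return total_size / (sent + received)


# Figures copied from the rsync --stats output above.
total_size = 57_437_200_856
sent = 183_391
received = 41_465_277

speedup = rsync_speedup(total_size, sent, received)
print(f"speedup is {speedup:,.2f}")
```

A high value here just means almost nothing had to be transferred, which fits a run where the mirror was already up to date.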

ipfs config:

$ export IPFS_PATH=/data/.ipfs
$ export IPFS_FD_MAX=4096

$ ipfs init --profile=badgerds

$ ipfs config Reprovider.Interval "0"
$ ipfs config --json Datastore.NoSync true
$ ipfs config --json Experimental.ShardingEnabled true
$ ipfs config --json Experimental.FilestoreEnabled true

$ time ipfs add -r --progress --offline --fscache --quieter --raw-leaves --nocopy /data/clojars

It's been running for an hour so far and is considerably slower per GB than #18.

andrew@sd-48607:/data$ time ipfs add -r --progress --offline --fscache --quieter --raw-leaves --nocopy /data/clojars
badger 2019/03/11 11:28:06 INFO: All 1 tables opened in 0s
badger 2019/03/11 11:28:06 INFO: Replaying file id: 0 at offset: 46713
badger 2019/03/11 11:28:06 INFO: Replay took: 11.306µs
 11.73 GiB / 53.48 GiB [================================>---------------------------------------------------------------------------------------------------------------------]  21.94% 8h44m24s

dstat -tcdrnm --fs -pyi --ipc --lock output; it doesn't look like it's limited by server resources:

[Screenshot 2019-03-11 at 12 42 05]

andrew commented 5 years ago

The initial add completed successfully after 18 hours:

QmUZJaKrU2svAynm6xhCLJorgZw7N6z63KFk9KKPg3H9Je
badger 2019/03/12 05:43:44 INFO: Storing value log head: {Fid:3 Len:48 Offset:127565454}
badger 2019/03/12 05:43:47 INFO: Force compaction on level 0 done

real    1095m41.387s
user    27m17.851s
sys     17m27.049s

/data/.ipfs ends up at 706M
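
To put the 18 hours in perspective, the average add throughput can be estimated from the `real` time and the 53.48 GiB total shown in the progress bar. This is only a sketch: parse_real_time is a hypothetical helper, and it assumes the bash `time` format seen above.

```python
import re


def parse_real_time(value: str) -> float:
    """Convert a bash `time` value such as '1095m41.387s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


elapsed = parse_real_time("1095m41.387s")  # the 'real' line above
added_gib = 53.48                          # total from the ipfs add progress bar
throughput_mib_s = added_gib * 1024 / elapsed

print(f"{elapsed / 3600:.2f} hours, {throughput_mib_s:.2f} MiB/s")
```

Under those assumptions the average works out to under 1 MiB/s.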

andrew commented 5 years ago

Some thoughts from @warpfork on the dstat output:

I think maybe we should do some stracing on this stuff and see if there's some retardation with sillysmall buffers or such, e.g. grep -E "read\((.*)= [0-9]$" theTrace
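
The grep pattern above selects read() calls whose return value is a single digit, i.e. reads of fewer than 10 bytes. A Python sketch of the same check follows, in case counting is handier than grepping. The sample trace lines are hypothetical and only illustrate the format; no real trace was captured in this thread.

```python
import re

# Same pattern as the grep above: read() calls that returned 0-9 bytes.
SMALL_READ = re.compile(r"read\((.*)= [0-9]$")


def count_small_reads(lines):
    """Count strace lines where read() returned a single-digit byte count."""
    return sum(1 for line in lines if SMALL_READ.search(line.rstrip("\n")))


# Hypothetical strace lines, for illustration only.
sample = [
    'read(3, "abc", 4096) = 3',
    'read(3, "..."..., 262144) = 262144',
    'read(5, "", 4096) = 0',
    'write(1, "x", 1) = 1',
]
print(count_small_reads(sample))
```

A large count relative to the total number of read() calls would point towards the tiny-buffer theory.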

andrew commented 5 years ago

Rsyncing the Clojars repository again and re-running the add took only 25 minutes and didn't result in an error like #18, which is promising.

andrew@sd-48607:/data$ rsync --recursive --times --links --safe-links --hard-links --stats clojars.org::clojars /data/clojars
rsync: send_files failed to open "/aaron-santos/lwjgl/3.0.0rc1/.lwjgl-3.0.0rc1.jar.Xv9eFr" (in clojars): Permission denied (13)
rsync: send_files failed to open "/adalab/triple-loader/0.1.15/.triple-loader-0.1.15.jar.g8cWdp" (in clojars): Permission denied (13)
rsync: send_files failed to open "/afterglow/afterglow/0.1.0-SNAPSHOT/.afterglow-0.1.0-20150603.043103-39.jar.OsXkzI" (in clojars): Permission denied (13)
rsync: send_files failed to open "/crane/lein-crane/0.0.1-SNAPSHOT/.lein-crane-0.0.1-20101010.033133-3.jar.3hDZcq" (in clojars): Permission denied (13)

Number of files: 1,794,433 (reg: 1,634,919, dir: 159,514)
Number of created files: 575 (reg: 534, dir: 41)
Number of deleted files: 0
Number of regular files transferred: 687
Total file size: 57,475,179,087 bytes
Total transferred file size: 79,663,857 bytes
Literal data: 63,622,549 bytes
Matched data: 16,041,308 bytes
File list size: 5,333,753
File list generation time: 0.172 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 303,557
Total bytes received: 105,180,266

sent 303,557 bytes  received 105,180,266 bytes  180,160.24 bytes/sec
total size is 57,475,179,087  speedup is 544.87
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1668) [generator=3.1.2]
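
Exit code 23 is rsync's "partial transfer due to error". The failed paths above all look like rsync temporary files (a leading dot plus a six-character random suffix), so they are probably uploads in progress on the server rather than missing artifacts. A small Python sketch of that check, with two log lines copied from the output above; the helper names are hypothetical:

```python
import re

# rsync temp names look like ".<original name>.<6 random characters>".
TEMP_NAME = re.compile(r"^\..+\.[A-Za-z0-9]{6}$")
FAILED = re.compile(r'send_files failed to open "(?P<path>[^"]+)"')


def failed_paths(log_lines):
    """Yield the paths rsync reported it could not open."""
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            yield match.group("path")


def is_temp_file(path):
    """True if the final path component matches the temp-file naming pattern."""
    return TEMP_NAME.match(path.rsplit("/", 1)[-1]) is not None


log = [
    'rsync: send_files failed to open "/aaron-santos/lwjgl/3.0.0rc1/.lwjgl-3.0.0rc1.jar.Xv9eFr" (in clojars): Permission denied (13)',
    'rsync: send_files failed to open "/adalab/triple-loader/0.1.15/.triple-loader-0.1.15.jar.g8cWdp" (in clojars): Permission denied (13)',
]
paths = list(failed_paths(log))
print(all(is_temp_file(p) for p in paths))
```

If every failure matches, treating exit code 23 as non-fatal for this mirror seems reasonable; any other failed path would deserve a closer look.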

andrew@sd-48607:/data$ time ipfs add -r --progress --offline --fscache --quieter --raw-leaves --nocopy /data/clojars
badger 2019/03/12 11:20:26 INFO: All 3 tables opened in 417ms
badger 2019/03/12 11:20:26 INFO: Replaying file id: 3 at offset: 127565502
badger 2019/03/12 11:20:26 INFO: Replay took: 18.023µs
 53.49 GiB / 53.51 GiB [================================================================================================================================]  99.96%QmZB7tK2JbVCHdDnLGCUN3Kp4rAbtvFbVcT7Qn9qRXt5QN
badger 2019/03/12 11:46:14 INFO: Storing value log head: {Fid:3 Len:48 Offset:128516796}
badger 2019/03/12 11:46:17 INFO: Force compaction on level 0 done

real    25m51.199s
user    11m34.639s
sys     2m20.987s
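
For a rough comparison of the two runs, the sketch below sets the re-add time against the initial add (real 1095m41.387s) and against the share of bytes rsync reported as transferred. All figures are copied from the output in this thread; the variable names are just for illustration.

```python
initial_s = 1095 * 60 + 41.387    # first add: real 1095m41.387s
incremental_s = 25 * 60 + 51.199  # re-add: real 25m51.199s
transferred = 79_663_857          # rsync "Total transferred file size"
total_size = 57_475_179_087       # rsync "Total file size"

ratio = initial_s / incremental_s
changed_pct = 100 * transferred / total_size

print(f"re-add was {ratio:.1f}x faster; {changed_pct:.2f}% of bytes changed")
```

The re-add is far quicker than the first run, but it still takes much longer than the small share of changed data alone would suggest, which hints that most of the time goes into walking files that did not change.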

andrew commented 5 years ago

I've packaged this test up into a docker image so anyone else can easily replicate it: https://github.com/andrew/clojars-mirror-test

meiqimichelle commented 5 years ago

@andrew what's the status of this experiment?

andrew commented 5 years ago

This is pretty much done; the results will likely be covered by the summary of #18.