
Report: GC times and improvements from recent optimizations. #3462

Open kevina opened 7 years ago

kevina commented 7 years ago

Repos were created by doing the following:

1000x: random 4000 > afile && ipfs add afile
100x: random-files -q -dirs 4 -files 35 -depth 5 -random-fanout -random-size d && ipfs add -r d
100x: random-files -q -dirs 4 -files 35 -depth 5 -random-fanout -random-size d && ipfs add --pin=false -r d
10x: random-files -q -dirs 4 -files 100 -depth 7 -random-fanout -random-size d && ipfs add -r d
10x: random-files -q -dirs 4 -files 100 -depth 7 -random-fanout -random-size d && ipfs add --pin=false -r d
2x: random-files -q -dirs 4 -files 100 -depth 10 -random-fanout -random-size d && ipfs add -r d
2x: random-files -q -dirs 4 -files 100 -depth 10 -random-fanout -random-size d && ipfs add --pin=false -r d

Other notes:

Implementation notes:

This creates a repo with around 1.6 million small blocks (under 4k), perfect for stress testing the pinner and filesystem. About 50% of the content is pinned.

The GC is done in basically two passes: the first pass collects the colored set, that is, the set of blocks to keep; the second pass scans the blockstore and removes any blocks not in that set. I measured each pass separately. For the first pass, the only thing that made a measurable difference was whether raw leaves were used, which led to a speedup of over an order of magnitude.

What                          Normal Leaves  Raw Leaves  Speedup
Get Colored Set (First Time)  282s           20.7s       13.6x
Get Colored Set (In Cache)    26.3s          2.1s        12.5x
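To make the two passes concrete, here is a minimal sketch of that mark-and-sweep structure. The `Blockstore` and `DAG` interfaces are simplified stand-ins for illustration, not the actual go-ipfs types:

```go
// Minimal sketch of the two-pass GC described above. The interfaces
// are simplified stand-ins, not the actual go-ipfs types.
package gc

type Key string

// Blockstore enumerates and deletes stored blocks.
type Blockstore interface {
	AllKeys() ([]Key, error)
	Delete(Key) error
}

// DAG resolves a block's child links. For a raw leaf the answer is
// "no links", knowable from the key alone, so the leaf block itself
// never has to be read from disk; that is where the order-of-magnitude
// first-pass speedup in the table above comes from.
type DAG interface {
	Links(Key) ([]Key, error)
}

// colorSet: pass one. Walk out from the pin roots and collect the
// set of blocks to keep.
func colorSet(dag DAG, pinRoots []Key) (map[Key]bool, error) {
	keep := make(map[Key]bool)
	stack := append([]Key(nil), pinRoots...)
	for len(stack) > 0 {
		k := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if keep[k] {
			continue
		}
		keep[k] = true
		links, err := dag.Links(k)
		if err != nil {
			return nil, err
		}
		stack = append(stack, links...)
	}
	return keep, nil
}

// sweep: pass two. Scan every key in the blockstore and delete the
// ones not in the colored set.
func sweep(bs Blockstore, keep map[Key]bool) error {
	keys, err := bs.AllKeys()
	if err != nil {
		return err
	}
	for _, k := range keys {
		if !keep[k] {
			if err := bs.Delete(k); err != nil {
				return err
			}
		}
	}
	return nil
}
```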

For the second pass, when there is a significant number of blocks to delete, the time spent deleting the files outweighs all other factors. In addition, the time spent depends heavily on the filesystem used. For the XFS partition I got different numbers depending on which repo was used; the time was anywhere from 20 to 75 minutes. I have no idea what can be done about deletion time as long as we stick to one file per block. Increasing the prefix length in flatfs might help (there were on average 3,100 files per directory), but I have not investigated that.

After the initial GC was run and there was nothing left to delete, my optimizations had an effect on the second pass:

What                        Pre Opt  Post Opt  Speedup
Normal Leaves, First Time   5.7s     2.6s      2.2x
Normal Leaves, Second Time  5.8s     2.5s      2.3x
Raw Leaves, First Time      217s     4.4s      49x
Raw Leaves, Second Time     5.8s     2.2s      2.6x

For the case with normal leaves, since everything is pinned, the first pass to get the colored set had already loaded everything into the OS cache, so the second-time numbers are about the same as the first.

When raw leaves are used, the story is very different. Only a small fraction of the blocks needed to be read from the store for the first pass, so during the second pass there was still a lot of disk IO.

I double-checked the first-time results with raw leaves; there really is that large a speedup after my optimizations. This is most likely because, after my optimizations, the inodes for the files did not need to be read from disk just to list the blocks in the flatfs datastore.
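The inode point is easy to picture: reading a directory's entries (readdir) returns names straight from the directory file, while stat'ing each entry forces its inode to be read in from disk. A hedged illustration of that difference, not the actual flatfs code:

```go
// Illustration of why listing keys without touching inodes is fast.
// Not the actual flatfs code; just the underlying syscall difference.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// listNamesOnly reads directory entries via readdir. The names come
// straight from the directory file, so no per-file inode is loaded.
func listNamesOnly(dir string) ([]string, error) {
	f, err := os.Open(dir)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return f.Readdirnames(-1) // names only: no per-entry stat
}

// listWithStat additionally stats every entry, forcing each file's
// inode to be read from disk when it is not already cached.
func listWithStat(dir string) ([]string, error) {
	names, err := listNamesOnly(dir)
	if err != nil {
		return nil, err
	}
	for _, n := range names {
		if _, err := os.Stat(filepath.Join(dir, n)); err != nil {
			return nil, err
		}
	}
	return names, nil
}

func main() {
	names, _ := listNamesOnly(".")
	fmt.Println(len(names), "entries")
}
```

With over a million blocks, skipping the per-file stat would turn the listing into a handful of sequential directory reads, which would line up with the 49x first-time number above.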

In all cases, when everything is in the OS cache, the speedup from my optimizations is somewhere between 2x and 3x.

kevina commented 7 years ago

Actually, I did discover a problem when using raw leaves. Here are some preliminary numbers:

Total time to delete with normal leaves: 23 minutes
Total time to delete with raw leaves: 65 minutes

The problem is that with raw leaves most of the blocks end up in the AFZBE directory in flatfs. Here is a count of some of the directories with raw leaves:

AFZBE 1568285
CIQA2 100
CIQA3 78
CIQA4 68

and without raw leaves:

CIQA2 3160
CIQA3 3173
CIQA4 3162
CIQA5 3160
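These counts are what you would expect from sharding on a fixed-length prefix of the key: every CIDv1 raw-leaf key begins with the same version/codec bytes, so under prefix sharding they all land in one directory, while the older keys differ early and spread out. A rough sketch of that kind of prefix sharding (illustrative; the key strings are hypothetical, and this is not the exact flatfs scheme):

```go
// Rough sketch of prefix-based sharding; illustrative only.
package main

import "fmt"

// shardDir returns the directory a block key would be stored under:
// the first prefixLen characters of the key.
func shardDir(key string, prefixLen int) string {
	if len(key) < prefixLen {
		return key
	}
	return key[:prefixLen]
}

func main() {
	// Hypothetical keys: the CIQ... (normal-leaf) keys vary by their
	// fifth character, so they spread across many shards; the raw-leaf
	// keys share their leading characters and pile into one shard
	// (AFZBE above).
	for _, k := range []string{
		"CIQA2IM...",
		"CIQA3RT...",
		"AFZBEIA...",
		"AFZBEIB...",
	} {
		fmt.Println(k, "->", shardDir(k, 5))
	}
}
```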

I will report more later.

whyrusleeping commented 7 years ago

Ah, interesting. We're going to have to increase the shard width for cidV1... good catch.

kevina commented 7 years ago

Ah, interesting. We're going to have to increase the shard width for cidV1... good catch.

@whyrusleeping see #3463 and let's take the discussion on how to handle it over there.

kevina commented 7 years ago

After looking more closely at the deletion times and double-checking the results, I determined that in one case having most of the files in a single directory actually helped runtime:

Total time to delete with normal leaves: 23 minutes
Total time to delete with raw leaves (incorrect Cid): 65 minutes
Total time to delete with raw leaves (correct Cid): 8 minutes

Weird. I guess it is highly dependent on how XFS chose to store the files in the large directory.

(Having so many files in a single directory also explains why ext4 was performing badly. I even managed to trigger the dir_index hash bug.)

whyrusleeping commented 7 years ago

@kevina It would be really nice to have a script that creates these files, runs the tests you were running, and outputs timings. This would be a wonderful thing to put into an automated performance regression test (cc @VictorBjelkholm), where we could run a set of tests on a specific machine to get performance numbers for each commit (or maybe a nightly checkpoint of master).
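A harness along those lines could be as small as timing `ipfs repo gc` runs against a prepared repo; a hypothetical sketch (an assumption, not an existing go-ipfs tool):

```go
// Hypothetical timing harness: runs `ipfs repo gc` repeatedly against
// the current repo and prints wall-clock times. An assumption sketch,
// not an existing go-ipfs tool.
package main

import (
	"fmt"
	"os/exec"
	"time"
)

func timeGC() (time.Duration, error) {
	start := time.Now()
	if err := exec.Command("ipfs", "repo", "gc").Run(); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

func main() {
	// The first run hits disk; later runs measure the in-cache case,
	// mirroring the "First Time" / "In Cache" rows above.
	for i := 0; i < 3; i++ {
		d, err := timeGC()
		if err != nil {
			fmt.Println("gc failed:", err)
			return
		}
		fmt.Printf("run %d: %s\n", i+1, d)
	}
}
```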

That said, these numbers look great! I'm really happy to have all of this in :)

kevina commented 7 years ago

@whyrusleeping here is the script to create the repo: https://gist.github.com/kevina/63efcab752d1a15e2fde6651b76d2a80

For the numbers I just used a hacked up version of IPFS.

rht commented 7 years ago

@kevina

Would be really, really great if you could wrap the script in a repo. Someone needs to do this, as I'm kind of tired of waiting for arbitrary, non-transparent approval from the members/those-with-commit-access, or for this to be implemented by those people, without which it is "wrong" (apologies for this statement beforehand if it might not accord with any of the terms in code-of-conduct.md, but I think it is fair to say what I think is the case while remaining respectful, and to be objectively corrected if it is not the case).

I have several local benchmark scripts that I could put in. There is also the network bitswap duplicate benchmark by @ion1 buried somewhere.

Benchmarks are a means of structured progress (see e.g. the NIST hash function competition for hash functions, the Stanford Question Answering Dataset benchmark for AI reading comprehension, or http://www.spacex.com/hyperloop for ... hyperloop), where anyone can objectively verify veracity, and which reduce reliance on centralized review, because 1. the maintainers might sometimes be wrong, and 2. it reduces their workload through automation. Finally, regardless of how I stated all this, I hope there will be a comprehensive benchmark suite, for this reason (I did not think of this leveling use case when I stated it was redundant with the natural-dataset test case in https://github.com/ipfs/go-ipfs/issues/3621#issuecomment-275830743).

whyrusleeping commented 7 years ago

@rht If you want to help out with automated performance benchmarks you should take part in our ongoing CI/CD sprint. It would be great to collect these various benchmarks together and have them be automatically run for each new PR.

rht commented 7 years ago

(I am in fact in the middle of curating all these scattered benchmarks into https://github.com/rht/sfpi-benchmark)