Cache RawData for Commit, Tag, & Tree, fixes #6

sameer commented 5 years ago

A rawdata []byte member is added to each type. It is computed on the first call to RawData(). Also noticed a few odd indentations and fixed them.

Let me know if there are any changes that need to be made, feedback is welcome.

sameer commented 5 years ago

Sure, should that be done as a *testing.B in git_test.go, or should I just try it out against the linux kernel or something with a before & after?

magik6k commented 5 years ago

Whatever works best for you and you think should show a difference is fine.

sameer commented 5 years ago

So I tried to run it against the go-ipfs repo and interestingly it crashes the ipfs daemon with 17:18:41.589 ERROR flatfs: too many open files, retrying in 100ms flatfs.go:378.

Instead I ran it against just ipfs/ipfs and got the following: After:

real    0m3.028s
user    0m0.610s
sys     0m0.304s

Before (rm -rf'ed .git/ipld):

real    0m3.961s
user    0m0.522s
sys     0m0.281s

I can run it a few more times just to make sure.

sameer commented 5 years ago

Ok, I had a chance to do more comprehensive testing. All the files in the repo are already added to my ipfs daemon, so they should return fast. The bench script just does rm -rf .git/ipld/ && git push ipld:: master, and I ran it with perf stat -r 100 ./bench.sh. Before:

 Performance counter stats for './bench.sh' (100 runs):

            554.14 msec task-clock:u              #    0.623 CPUs utilized            ( +-  0.85% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             2,680      page-faults:u             # 4840.519 M/sec                    ( +-  1.63% )
     1,121,182,658      cycles:u                  # 2025221.108 GHz                   ( +-  0.29% )
     1,280,351,427      instructions:u            #    1.14  insn per cycle           ( +-  0.13% )
       227,040,471      branches:u                # 410109049.439 M/sec               ( +-  0.14% )
         4,581,354      branch-misses:u           #    2.02% of all branches          ( +-  0.32% )

            0.8896 +- 0.0205 seconds time elapsed  ( +-  2.31% )

After:

 Performance counter stats for './bench-fast.sh' (100 runs):

            567.96 msec task-clock:u              #    0.646 CPUs utilized            ( +-  0.86% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             2,678      page-faults:u             # 4719.887 M/sec                    ( +-  1.80% )
     1,131,191,140      cycles:u                  # 1993463.988 GHz                   ( +-  0.33% )
     1,281,825,576      instructions:u            #    1.13  insn per cycle           ( +-  0.14% )
       227,317,086      branches:u                # 400594036.690 M/sec               ( +-  0.16% )
         4,627,311      branch-misses:u           #    2.04% of all branches          ( +-  0.31% )

            0.8797 +- 0.0203 seconds time elapsed  ( +-  2.31% )

There doesn't really seem to be any difference when I did this test. Is there some better way to do one?

magik6k commented 5 years ago

Most of this will be I/O. I'd use testing.B, load test data like in https://github.com/ipfs/go-ipld-git/blob/master/git_test.go#L22, and then call RawData in a loop b.N times

sameer commented 5 years ago

Ok, I've written a BenchmarkRawData and ran it against master and the PR. After:

[purisame@arbeitslosigkeit go-ipld-git]$ go test -bench BenchmarkRawData -count 10
goos: linuxs/80/2992c4220de19a90767f3000a79a31b98d0df7fa
goarch: amd64
BenchmarkRawData-4           500           2797081 ns/op
BenchmarkRawData-4           500           2822694 ns/op
BenchmarkRawData-4           500           2896903 ns/op
BenchmarkRawData-4           500           2824651 ns/op
BenchmarkRawData-4           500           3189323 ns/op
BenchmarkRawData-4           500           2895799 ns/op
BenchmarkRawData-4           500           2917160 ns/op
BenchmarkRawData-4           500           2975555 ns/op
BenchmarkRawData-4           500           2938396 ns/op
BenchmarkRawData-4           500           3065296 ns/op
PASS
ok      _/home/purisame/go-ipld-git     17.654s

Before:

[purisame@arbeitslosigkeit go-ipld-git]$ go test -bench BenchmarkRawData -count 10
goos: linuxs/80/2992c4220de19a90767f3000a79a31b98d0df7fa
goarch: amd64
BenchmarkRawData-4           500           3112162 ns/op
BenchmarkRawData-4           500           3089852 ns/op
BenchmarkRawData-4           500           3273750 ns/op
BenchmarkRawData-4           500           3093242 ns/op
BenchmarkRawData-4           500           3080540 ns/op
BenchmarkRawData-4           500           3055796 ns/op
BenchmarkRawData-4           500           3101081 ns/op
BenchmarkRawData-4           500           2995909 ns/op
BenchmarkRawData-4           500           3049960 ns/op
BenchmarkRawData-4           500           3011387 ns/op
PASS
ok      _/home/purisame/go-ipld-git     18.589s

sameer commented 5 years ago

Are there any other changes I should make for this PR?

ipfs / go-ipld-git

Cache RawData for Commit, Tag, & Tree, fixes #6 #28