src-d / go-git

Project has been moved to: https://github.com/go-git/go-git
Apache License 2.0

CommitObjects is slow compared to equivalent git rev-list --all #1294

Open jatindhankhar opened 4 years ago

jatindhankhar commented 4 years ago

I am trying to write a custom git grep wrapper using go-git.

Essentially, I am trying to replicate:

git rev-list --all | xargs git --no-pager grep -i 'search_text'
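
For reference, the pipeline above could also be driven from Go by shelling out to the git binary instead of using go-git. The following is only a minimal sketch under stated assumptions: git is on PATH, the repository lives in a directory named odoo, and the helper names (parseHashes, grepArgs, grepAll) and the batch size are hypothetical, chosen here for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// parseHashes splits the output of `git rev-list` into one hash per element.
func parseHashes(out []byte) []string {
	fields := bytes.Fields(out)
	hashes := make([]string, len(fields))
	for i, f := range fields {
		hashes[i] = string(f)
	}
	return hashes
}

// grepArgs builds the argument list for one `git grep` call over a batch of commits.
func grepArgs(pattern string, commits []string) []string {
	args := []string{"--no-pager", "grep", "-i", pattern}
	return append(args, commits...)
}

// grepAll lists all commits with `git rev-list --all` and greps them in batches.
// Batching plays the role of xargs: it keeps each argument list under OS limits.
func grepAll(repoDir, pattern string, batchSize int) error {
	revList := exec.Command("git", "rev-list", "--all")
	revList.Dir = repoDir
	out, err := revList.Output()
	if err != nil {
		return fmt.Errorf("git rev-list: %w", err)
	}
	commits := parseHashes(out)
	for start := 0; start < len(commits); start += batchSize {
		end := start + batchSize
		if end > len(commits) {
			end = len(commits)
		}
		grep := exec.Command("git", grepArgs(pattern, commits[start:end])...)
		grep.Dir = repoDir
		grep.Stdout = os.Stdout
		grep.Stderr = os.Stderr
		err := grep.Run()
		// git grep exits with status 1 when nothing matched; that is not a failure here.
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) && exitErr.ExitCode() == 1 {
			continue
		}
		if err != nil {
			return fmt.Errorf("git grep: %w", err)
		}
	}
	return nil
}

func main() {
	// "odoo", "search_text", and the batch size of 500 are illustrative values.
	if err := grepAll("odoo", "search_text", 500); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

This sketch buffers the full rev-list output in memory, which is simple but not ideal for very large histories; streaming the output line by line would be the natural refinement.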

CommitObjects() is slow compared to git rev-list --all.

To benchmark it, I used a large repository with a large number of commits (https://github.com/odoo/odoo/).

I understand there will be some overhead from creating the custom objects that support various operations, but the current implementation of CommitObjects is roughly 11 to 20 times slower than the raw command in the measurements below.

The first strange thing I noticed was that the Go implementation would freeze for a few seconds after reaching commit 004a0b996ff8f269451e07346f71a129a1f3fbaf and then list the remaining ~18-20 commits.

main.go

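package main

import (
    "fmt"
    "log"

    "gopkg.in/src-d/go-git.v4"
    "gopkg.in/src-d/go-git.v4/plumbing/object"
)

func main() {
    // Open the local clone in ./odoo.
    r, err := git.PlainOpen("odoo")
    if err != nil {
        log.Fatal(err)
    }

    // Get an iterator over the commit objects in the repository.
    commits, err := r.CommitObjects()
    if err != nil {
        log.Fatal(err)
    }

    // Print each commit hash, one per line.
    err = commits.ForEach(func(c *object.Commit) error {
        fmt.Println(c.Hash)
        return nil
    })
    if err != nil {
        log.Fatal(err)
    }
}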

Timings from the time command:

# go-git wrapper
./main  16.24s user 11.88s system 103% cpu 27.224 total

# raw command 
git rev-list --all  1.67s user 0.32s system 81% cpu 2.456 total


I also used hyperfine (https://github.com/sharkdp/hyperfine) to run a more rigorous benchmark than the time command, and the result is similar.

hyperfine --min-runs 5 './main' 'git rev-list --all'

Benchmark #1: ./main
  Time (mean ± σ):     28.729 s ±  2.729 s    [User: 15.574 s, System: 12.378 s]
  Range (min … max):   25.745 s … 32.868 s    5 runs

Benchmark #2: git rev-list --all
  Time (mean ± σ):      1.413 s ±  0.163 s    [User: 1.174 s, System: 0.171 s]
  Range (min … max):    1.331 s …  1.704 s    5 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (1.704 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Summary
  'git rev-list --all' ran
   20.33 ± 3.04 times faster than './main'
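
The summary figure appears to follow from the two means and standard deviations above by first-order error propagation for a ratio. That is an assumption about how the number is derived, not a description of hyperfine's source; the function names below are hypothetical. A small sketch that reproduces the arithmetic:

```go
package main

import (
	"fmt"
	"math"
)

// speedup returns how many times faster the fast command is than the slow one,
// with an uncertainty from first-order error propagation for a ratio:
// sigma = ratio * sqrt((slowSD/slowMean)^2 + (fastSD/fastMean)^2).
func speedup(slowMean, slowSD, fastMean, fastSD float64) (ratio, sigma float64) {
	ratio = slowMean / fastMean
	sigma = ratio * math.Hypot(slowSD/slowMean, fastSD/fastMean)
	return ratio, sigma
}

// formatSpeedup renders the result in the same "x ± y" style as the summary.
func formatSpeedup(slowMean, slowSD, fastMean, fastSD float64) string {
	ratio, sigma := speedup(slowMean, slowSD, fastMean, fastSD)
	return fmt.Sprintf("%.2f ± %.2f", ratio, sigma)
}

func main() {
	// Means and standard deviations copied from the two benchmark results above.
	fmt.Println(formatSpeedup(28.729, 2.729, 1.413, 0.163)) // prints "20.33 ± 3.04"
}
```

With only 5 runs per command, the uncertainty of about 15% is mostly run-to-run noise, so the ratio is best read as "about 20 times" rather than a precise figure.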



Profiling code

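package main

import (
    "fmt"
    "os"

    "github.com/pkg/profile"
    "gopkg.in/src-d/go-git.v4"
    "gopkg.in/src-d/go-git.v4/plumbing/object"
)

func main() {
    // Start CPU profiling (the pkg/profile default); Stop writes the profile on return.
    defer profile.Start().Stop()

    if err := listCommits("odoo"); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
}

// listCommits prints the hash of every commit object in the repository at path.
func listCommits(path string) error {
    r, err := git.PlainOpen(path)
    if err != nil {
        return err
    }
    commits, err := r.CommitObjects()
    if err != nil {
        return err
    }
    return commits.ForEach(func(c *object.Commit) error {
        fmt.Println(c.Hash)
        return nil
    })
}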

Profile output

cpu_profiling.pdf
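
For anyone who wants to reproduce the profile without the github.com/pkg/profile dependency, a roughly equivalent capture can be done with the standard library's runtime/pprof. This is a minimal sketch, not the code used for the attached profile: withCPUProfile, busyWork, and the cpu.pprof path are hypothetical names, and busyWork merely stands in for the CommitObjects loop.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// withCPUProfile runs work while recording a CPU profile to the file at path.
func withCPUProfile(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	// StopCPUProfile flushes the profile; the deferred Close runs after it.
	defer pprof.StopCPUProfile()

	work()
	return nil
}

// busyWork is a hypothetical stand-in for the CommitObjects loop being profiled.
func busyWork() {
	sum := 0
	for i := 0; i < 50000000; i++ {
		sum += i % 7
	}
	fmt.Println(sum)
}

func main() {
	// "cpu.pprof" is an illustrative output path.
	if err := withCPUProfile("cpu.pprof", busyWork); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The resulting file can be inspected with go tool pprof cpu.pprof; rendering it to a PDF with the -pdf flag additionally requires Graphviz to be installed.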

Am I missing something?

Is there a more performant way of iterating over commits?

P.S. The benchmark was run on a 2017 MacBook Pro:

  Model Name: MacBook Pro
  Model Identifier: MacBookPro14,1
  Processor Name: Dual-Core Intel Core i5
  Processor Speed: 2.3 GHz
  Number of Processors: 1
  Total Number of Cores: 2
  L2 Cache (per Core): 256 KB
  L3 Cache: 4 MB
  Hyper-Threading Technology: Enabled
  Memory: 8 GB