src-d / go-git

Project has been moved to: https://github.com/go-git/go-git
Apache License 2.0

As a server Shortcuts/Optimisations? #627

Open freman opened 7 years ago

freman commented 7 years ago

Hi

I'm trying to write a caching proxy server that we can use to locally cache various repos from GitHub (and other places) that we use heavily, but I'm running into a performance issue.

I'm more than happy to concede that we won't beat GitHub for speed, but I'm finding this to be a great deal slower.

This is a greatly simplified version of what I'm running in the main codebase:

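package main

import (
    "compress/gzip"
    "io"
    "log"
    "net/http"
    "os"
    "path/filepath"
    "strings"

    "gopkg.in/src-d/go-git.v4/plumbing/protocol/packp"
    "gopkg.in/src-d/go-git.v4/plumbing/transport"
    "gopkg.in/src-d/go-git.v4/plumbing/transport/server"
)

func main() {
    wd, err := os.Getwd()
    if err != nil {
        log.Fatal(err)
    }
    log.Fatal(http.ListenAndServe(":8822", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Cache-Control", "no-cache")

        // The first path segment names the bare repository next to this file.
        s := strings.SplitN(strings.TrimLeft(r.URL.Path, "/"), "/", 2)
        ep, err := transport.NewEndpoint("file://" + filepath.Join(wd, s[0]))
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        ups, err := server.DefaultServer.NewUploadPackSession(ep, nil)
        if err != nil {
            http.Error(w, err.Error(), http.StatusNotFound)
            return
        }
        defer ups.Close()

        // GET /<repo>/info/refs?service=git-upload-pack: advertise the refs.
        if strings.HasSuffix(r.URL.Path, "/info/refs") {
            advs, err := ups.AdvertisedReferences()
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
            // Smart HTTP preamble: the service line, then a flush-pkt (empty entry).
            advs.Prefix = [][]byte{
                []byte("# service=git-upload-pack"),
                []byte(""),
            }
            w.Header().Set("Content-Type", "application/x-git-upload-pack-advertisement")
            if err := advs.Encode(w); err != nil {
                log.Println(err)
            }
            return
        }

        // POST /<repo>/git-upload-pack: negotiate and stream the packfile.
        defer r.Body.Close()
        var rdr io.Reader = r.Body
        if r.Header.Get("Content-Encoding") == "gzip" {
            gz, err := gzip.NewReader(r.Body)
            if err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            defer gz.Close()
            rdr = gz
        }

        upakreq := packp.NewUploadPackRequest()
        if err := upakreq.Decode(rdr); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }

        up, err := ups.UploadPack(r.Context(), upakreq)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer up.Close()
        w.Header().Set("Content-Type", "application/x-git-upload-pack-result")
        if err := up.Encode(w); err != nil {
            log.Println(err)
        }
    })))
}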
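The two `advs.Prefix` entries above produce the preamble that smart HTTP clients expect before the ref advertisement: one pkt-line carrying `# service=git-upload-pack` and then a flush-pkt (`0000`). A pkt-line is a 4-digit hex length, which counts the 4 length bytes themselves, followed by the payload. A minimal stdlib sketch of that framing; `pktLine` and `servicePreamble` are hypothetical helper names, not go-git API:

```go
package main

import (
	"fmt"
	"strings"
)

// pktLine frames payload as a git pkt-line: a 4-digit lowercase hex length
// (which includes the 4 length bytes) followed by the payload itself.
func pktLine(payload string) string {
	return fmt.Sprintf("%04x%s", len(payload)+4, payload)
}

// flushPkt is the zero-length marker that ends a section.
const flushPkt = "0000"

// servicePreamble builds the prefix sent before the ref advertisement
// in response to GET /<repo>/info/refs?service=<service>.
func servicePreamble(service string) string {
	var b strings.Builder
	b.WriteString(pktLine("# service=" + service + "\n"))
	b.WriteString(flushPkt)
	return b.String()
}

func main() {
	fmt.Printf("%q\n", servicePreamble("git-upload-pack"))
}
```

For `git-upload-pack` the service line is 26 bytes including the newline, so its length prefix is 26 + 4 = 30, i.e. `001e`.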

If you mirror-clone the aws/aws-sdk-go repo into the same directory as this Go file with git clone --quiet --mirror https://github.com/aws/aws-sdk-go aws-sdk-go and then start the server with go run main.go, you can run the following tests:

$ time git clone --quiet https://github.com/aws/aws-sdk-go

real    0m23.816s
user    0m4.047s
sys 0m1.289s
$ time git clone --quiet http://localhost:8822/aws-sdk-go

real    2m18.718s
user    0m3.793s
sys 0m1.165s

The time cost here is entirely in cloning from scratch; pulling seems plenty fast.

I profiled my code and found it spent most of its time in encoder.go.

Can anyone think of a way to shortcut this process for a clone, if not optimise the code?

If, instead of using git clone --mirror to create the aws-sdk-go directory, I use go-git to clone it, I even get pre-packed files:

./objects/pack/pack-41174c775d8b7f517d5db3c20d52b0e5379fe9de.idx
./objects/pack/pack-41174c775d8b7f517d5db3c20d52b0e5379fe9de.pack

Perhaps for a fresh clone I can just ship that?

freman commented 7 years ago

Out of curiosity, I tested piping the requests to and from git-upload-pack instead:

$ time git clone --quiet http://localhost:8822/aws-sdk-go

real    0m3.545s
user    0m3.573s
sys 0m0.790s

This is the result I was kind of hoping for, but I'd still be happy with GitHub-like speed.
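Piping the smart HTTP requests through git itself is roughly what `git http-backend` does: `git upload-pack --stateless-rpc --advertise-refs <dir>` prints the ref advertisement for GET requests, and `git upload-pack --stateless-rpc <dir>` reads the POST body on stdin and writes the result on stdout (newer git documents `--advertise-refs` as `--http-backend-info-refs`). A minimal sketch, assuming `git` is on PATH; `uploadPackCmd` and `serveUploadPack` are hypothetical names:

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// uploadPackCmd builds the git invocation for one smart HTTP request.
// advertise=true serves GET /<repo>/info/refs; false serves POST /<repo>/git-upload-pack.
func uploadPackCmd(repoDir string, advertise bool) *exec.Cmd {
	args := []string{"upload-pack", "--stateless-rpc"}
	if advertise {
		args = append(args, "--advertise-refs")
	}
	args = append(args, repoDir)
	return exec.Command("git", args...)
}

// serveUploadPack pipes one request through git upload-pack.
// Gzip-encoded request bodies are not handled in this sketch.
func serveUploadPack(w http.ResponseWriter, r *http.Request, repoDir string) {
	w.Header().Set("Cache-Control", "no-cache")
	advertise := strings.HasSuffix(r.URL.Path, "/info/refs")
	cmd := uploadPackCmd(repoDir, advertise)
	cmd.Stdout = w
	if advertise {
		w.Header().Set("Content-Type", "application/x-git-upload-pack-advertisement")
		// git prints only the refs; the HTTP-specific preamble is added here.
		io.WriteString(w, "001e# service=git-upload-pack\n0000")
	} else {
		w.Header().Set("Content-Type", "application/x-git-upload-pack-result")
		cmd.Stdin = r.Body // git reads the client's wants/haves from stdin
	}
	if err := cmd.Run(); err != nil {
		log.Printf("git upload-pack %s: %v", repoDir, err)
	}
}

func main() {
	wd, err := os.Getwd()
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(http.ListenAndServe(":8822", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// NOTE: the repository name is not validated; a real proxy must reject "..".
		repo := strings.SplitN(strings.TrimLeft(r.URL.Path, "/"), "/", 2)[0]
		serveUploadPack(w, r, filepath.Join(wd, repo))
	})))
}
```

This is fast because git reuses the deltas already stored in the on-disk pack instead of recomputing them.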

mcuadros commented 7 years ago

The problem is that the packfile, including all of its deltas, is being calculated, and this is an expensive operation.
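Since the cost is in recomputing the packfile and its deltas, one mitigation on the proxy side is to avoid repeating that work for identical fresh clones. A hypothetical sketch: key the encoded response on the repository, the sorted wanted object IDs, and the sorted capabilities, and keep the bytes in memory. It assumes the request has no haves and no depth; invalidation when the mirror is updated and any memory limit are left out. `cloneKey` and `packCache` are illustrative names:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
	"sync"
)

// cloneKey derives a cache key that ignores the order in which the client
// lists its wants and capabilities.
func cloneKey(repo string, wants, caps []string) string {
	w := append([]string(nil), wants...)
	c := append([]string(nil), caps...)
	sort.Strings(w)
	sort.Strings(c)
	sum := sha256.Sum256([]byte(repo + "\x00" + strings.Join(w, ",") + "\x00" + strings.Join(c, ",")))
	return hex.EncodeToString(sum[:])
}

// packCache memoises encoded upload-pack responses for fresh clones.
type packCache struct {
	mu    sync.Mutex
	items map[string][]byte
}

func newPackCache() *packCache { return &packCache{items: make(map[string][]byte)} }

// GetOrBuild returns the cached response for key, calling build only on a miss.
// Concurrent misses for the same key may both build; golang.org/x/sync/singleflight
// could deduplicate that.
func (c *packCache) GetOrBuild(key string, build func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	if b, ok := c.items[key]; ok {
		c.mu.Unlock()
		return b, nil
	}
	c.mu.Unlock()
	b, err := build() // the expensive packfile encoding would happen here
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.items[key] = b
	c.mu.Unlock()
	return b, nil
}

func main() {
	cache := newPackCache()
	builds := 0
	build := func() ([]byte, error) { builds++; return []byte("encoded response"), nil }
	cache.GetOrBuild(cloneKey("aws-sdk-go", []string{"b", "a"}, nil), build)
	cache.GetOrBuild(cloneKey("aws-sdk-go", []string{"a", "b"}, nil), build)
	fmt.Println("builds:", builds) // prints "builds: 1"
}
```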

mcuadros commented 6 years ago

Just a bit more information about the evolution of the problem:

The baseline is a local git server serving the example aws-sdk-go repository, started with git daemon --verbose --base-path=/tmp --export-all /tmp/aws-sdk-go

git clone

Baseline, git daemon (0:04.80elapsed)

time git clone git://localhost/aws-sdk-go
Cloning into 'aws-sdk-go'...
remote: Counting objects: 43180, done.
remote: Compressing objects: 100% (13799/13799), done.
remote: Total 43180 (delta 25368), reused 43176 (delta 25366)
Receiving objects: 100% (43180/43180), 47.17 MiB | 29.56 MiB/s, done.
Resolving deltas: 100% (25368/25368), done.
9.01user 0.40system 0:04.80elapsed 196%CPU (0avgtext+0avgdata 142104maxresident)k
816inputs+251424outputs (0major+48256minor)pagefaults 0swaps

After #697 (1:10.97elapsed)

time git clone http://localhost:8080/aws-sdk-go
Cloning into 'aws-sdk-go'...
Receiving objects: 100% (43180/43180), 43.85 MiB | 3.61 MiB/s, done.
Resolving deltas: 100% (26823/26823), done.
9.35user 0.50system 1:10.97elapsed 13%CPU (0avgtext+0avgdata 159668maxresident)k
0inputs+0outputs (0major+50427minor)pagefaults 0swaps

Before #697 (2:28.55elapsed)

time git clone http://localhost:8080/aws-sdk-go
Cloning into 'aws-sdk-go'...
Receiving objects: 100% (43180/43180), 55.80 MiB | 3.65 MiB/s, done.
Resolving deltas: 100% (24464/24464), done.
9.82user 0.63system 2:28.55elapsed 7%CPU (0avgtext+0avgdata 135676maxresident)k
0inputs+0outputs (0major+36211minor)pagefaults 0swaps

git fetch origin v0.6.0

Baseline, git daemon (0:00.98elapsed):

time git fetch origin v0.6.0
remote: Counting objects: 12247, done.
remote: Compressing objects: 100% (4166/4166), done.
remote: Total 12247 (delta 6741), reused 12233 (delta 6741)
Receiving objects: 100% (12247/12247), 8.89 MiB | 27.17 MiB/s, done.
Resolving deltas: 100% (6741/6741), done.
From git://localhost/aws-sdk-go
 * tag               v0.6.0     -> FETCH_HEAD
1.52user 0.09system 0:00.98elapsed 165%CPU (0avgtext+0avgdata 9148maxresident)k
0inputs+18896outputs (0major+4389minor)pagefaults 0swaps

After #697 (0:11.95elapsed)

time git fetch origin v0.6.0
Receiving objects: 100% (12247/12247), 8.13 MiB | 2.62 MiB/s, done.
Resolving deltas: 100% (7324/7324), done.
From http://localhost:8080/aws-sdk-go
 * tag               v0.6.0     -> FETCH_HEAD
1.65user 0.14system 0:11.95elapsed 15%CPU (0avgtext+0avgdata 9888maxresident)k
0inputs+0outputs (0major+5590minor)pagefaults 0swaps

Before #697 (0:19.10elapsed)

time git fetch origin v0.6.0
Receiving objects: 100% (12247/12247), 8.38 MiB | 3.01 MiB/s, done.
Resolving deltas: 100% (6967/6967), done.
From http://localhost:8080/aws-sdk-go
 * tag               v0.6.0     -> FETCH_HEAD
1.63user 0.08system 0:19.10elapsed 8%CPU (0avgtext+0avgdata 12980maxresident)k
0inputs+0outputs (0major+5480minor)pagefaults 0swaps