src-d / go-git

Project has been moved to: https://github.com/go-git/go-git
Apache License 2.0

High memory usage iterating over commits of large repository (store on filesystem) #1244

Open patrickdevivo opened 4 years ago

patrickdevivo commented 4 years ago

Hi there. I apologize ahead of time if this issue is not directly related to a problem within this codebase. I'm unsure of the source of the problem I'm experiencing and would greatly appreciate any guidance!

I'm working on a project (https://github.com/augmentable-dev/tickgit) that traverses the git history of a repository, using the iterators provided by go-git, to find when certain lines of source (TODO items) were added. For smaller repositories, everything works great. For larger ones, such as https://github.com/torvalds/linux (reading from a clone on my filesystem), I see extremely high memory consumption when iterating through commits (and inspecting their trees). I suspect a memory leak, but I'm having trouble identifying the source. I'm a profiling n00b!

[Image: profile002]

gopkg.in/src-d/go-git.v4/plumbing/format/idxfile.(*MemoryIndex).genOffsetHash
/Users/patrickdevivo/go/pkg/mod/gopkg.in/src-d/go-git.v4@v4.13.1/plumbing/format/idxfile/idxfile.go

  Total:       510MB      510MB (flat, cum) 59.24%
    199            .          .             count, err := idx.Count() 
    200            .          .             if err != nil { 
    201            .          .                 return err 
    202            .          .             } 
    203            .          .            
    204        510MB      510MB             idx.offsetHash = make(map[int64]plumbing.Hash, count) 
    205            .          .             idx.offsetHashIsFull = true 
    206            .          .            
    207            .          .             var hash plumbing.Hash 
    208            .          .             i := uint32(0) 
    209            .          .             for firstLevel, fanoutValue := range idx.Fanout { 
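
For a sense of scale on line 204 of the listing: the whole 510MB comes from one `make` call that pre-sizes a map for every object in the pack index. Below is a rough, standard-library-only sketch of how such an allocation grows with the object count. The `hash` type is a stand-in I'm assuming has the same shape as `plumbing.Hash` (a 20-byte array), `presizedMapBytes` is a hypothetical helper, and the `count` value is made up. The exact number printed depends on the Go version's map layout.

```go
package main

import (
	"fmt"
	"runtime"
)

// hash is a stand-in with the same shape as go-git's plumbing.Hash
// (a 20-byte SHA-1 array); it is an assumption, not the library type.
type hash [20]byte

// presizedMapBytes (hypothetical helper) creates an empty map pre-sized for
// count entries, like the make call in genOffsetHash, and returns it together
// with the heap bytes allocated while creating it.
func presizedMapBytes(count int) (map[int64]hash, uint64) {
	var before, after runtime.MemStats
	runtime.GC() // settle the heap so the delta is mostly the map itself
	runtime.ReadMemStats(&before)
	m := make(map[int64]hash, count)
	runtime.ReadMemStats(&after)
	// TotalAlloc is cumulative, so the difference cannot be reduced by a GC
	// cycle that happens between the two reads.
	return m, after.TotalAlloc - before.TotalAlloc
}

func main() {
	const count = 1_000_000 // made-up object count, not measured from any repo
	m, allocated := presizedMapBytes(count)
	fmt.Printf("len=%d allocated=%.1f MiB (about %.0f bytes per requested entry)\n",
		len(m),
		float64(allocated)/(1<<20),
		float64(allocated)/float64(count))
}
```

The map is still empty after `make`, so the cost is paid up front and scales with the number of objects in the index rather than with how many commits have been visited so far.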

I've gone through my code several times now looking for places where I might be unnecessarily holding onto memory, but I'm wondering if I'm misusing the library or not cleaning something up. I'm pretty sure I'm closing all my file readers. Is this something anyone else has encountered, and does anyone know where the issue lies? Thanks so much!