alyssais opened this issue 7 years ago
@alyssais hi! Thanks for reporting this. I think what we're hitting here is a cache increase. libgit2 keeps a cache of diffs for each call to `diff`. The lifetime of that cache is tied to the `Rugged::Repository` object, so in the case of your sample program the cache lives as long as the program does. The default maximum cache size seems rather large, and unfortunately Rugged doesn't give us any insight into the cache usage. I've opened #696 so we can see from Ruby land what the cache size / usage is. libgit2 provides an API for tuning (or disabling) that cache, but I didn't add an API for that.
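For illustration, the kind of thing I have in mind would look roughly like this from the Ruby side (the setting names here are hypothetical, since Rugged doesn't expose them yet; the underlying libgit2 options are `GIT_OPT_SET_CACHE_MAX_SIZE` and `GIT_OPT_ENABLE_CACHING`):

```ruby
require "rugged"

# Hypothetical settings: these keys don't exist in Rugged yet and are only a
# sketch of what an API over libgit2's cache options could look like.
Rugged::Settings["cache_max_size"] = 32 * 1024 * 1024  # cap the object/diff cache at 32MB
# Rugged::Settings["enable_caching"] = false            # or turn the cache off entirely
```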
@kivikakk was able to reproduce the leak with a script based on yours, which I've pasted below:
require "rugged"
def diff_parents(repo, commit)
tbd = [commit]
t = 0
while tbd.length > 0
commit = tbd.shift
t += 1
puts "#{t} #{tbd.length}" if t % 100 == 0
commit.parents.each { |parent| parent.diff(commit); tbd << parent }
tbd = tbd[0..100]
end
end
repo = Rugged::Repository.new("rails")
diff_parents repo, repo.head.target
I found that if I created a new Repository object every once in a while, I could get the memory usage to plateau:
require "rugged"
def diff_parents(repo, commit)
tbd = [commit]
t = 0
while tbd.length > 0
commit = tbd.shift
t += 1
puts "#{t} #{tbd.length}" if t % 100 == 0
commit.parents.each { |parent| parent.diff(commit); tbd << parent }
tbd = tbd[0..100]
if t % 100 == 0 # Limit repo cache size
tbd.map!(&:oid)
repo = Rugged::Repository.new repo.path
tbd.map! { |id| repo.lookup id }
end
end
end
repo = Rugged::Repository.new("rails")
diff_parents repo, repo.head.target
Memory usage should already plateau at around that maximum cache size plus whatever other memory Ruby is using. Do you ever see a plateau? If you rework your program to create new `Repository` objects, does it reduce memory consumption? (I know allocating new repo objects is a huge hack 🙇)
Thanks in advance!
Note: the reason I think it's a cache issue is that we're not seeing any leaked memory reported from `leaks` or `valgrind`.
Hi @tenderlove, thanks for investigating for me! I don't think I've ever seen a plateau, although the cache size is so big that it's possible I just wouldn't have noticed it! I'll run a comparison with creating new repository objects and report back.
Something else that's just occurred to me: there's a `Repository#close` method that looks interesting, now that this appears to be a caching problem:
> Frees all the resources used by this repository immediately. The repository can still be used after this call. Resources will be opened as necessary.
>
> It is not required to call this method explicitly. Repositories are closed automatically before garbage collection.
I wonder, would that affect this cache? It looks like it might; this is the libgit2 function it calls. I'll run a test where I occasionally call that, too.
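If the docs are right, a pattern like the following should be safe; this is just a minimal sketch with a placeholder path, not code from my actual program:

```ruby
require "rugged"

repo = Rugged::Repository.new("rails")  # placeholder path

repo.head.target   # resources (and presumably caches) get populated as we use the repo
repo.close         # frees the underlying resources immediately
repo.head.target   # still works; resources are reopened as needed
```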
With no changes, my program currently peaks at ~1.4GB memory, of which ~232MB is apparently cache.
I tried calling `#close` on the repo after every diff I generated, just to see what impact it made. Memory use never rose above 1.2GB and, more importantly, would consistently decrease to <300MB. I'm not worried about the high memory use itself, since the demands of my program are pretty intensive; what worried me was that the minimum would creep up over time, and I think that's what was happening.
I didn't try creating new repository instances, because adapting my program to do that would be significantly more difficult: the whole execution of the program is spent iterating a walker attached to a repository instance, so I'd have to keep track of a lot of walker state and rebuild it every so often, or the repository would just be retained. Since calling `#close` seems to have had the same effect, I didn't think it was worth it, but I'll have a go at it if you think it would make a difference.
Thanks for your help! I spent quite a while on your blog yesterday while I was trying to track this down! :D
> consistently decrease to <300MB
This is great!!
> Since calling #close seems to have had the same effect, I didn't think it was worth it, but will have a go at it if you think it would make a difference.
I don't think so. I didn't know that `close` would impact the cache, but after reading the libgit2 code, I think you're absolutely right. If calling `close` on the repo keeps your memory issues under control, then please stick with it! In the meantime, I'll try to add functions to Rugged that let you tweak the cache settings so that you don't have to change your code. The caveat is that the cache settings are global, so changing them would impact all repository objects in the same process.
> Thanks for your help! I spent quite a while on your blog yesterday while I was trying to track this down! :D
You're very welcome! I hope it was of some help, though with the opaqueness of this cache, I'm not so sure. 😄
It might be worth considering renaming (or at least aliasing) `#close`, too? It's a really non-obvious name, to me at least, especially given that the repository object is still perfectly usable after calling it.
I agree. `close` really sounds like you won't be able to use the object anymore, like on a file. It looks like the libgit2 API calls it `cleanup`, but I'm not totally sure what the function does (specifically the `set_*` calls), so I don't know if there is a better name. Do you have any suggestions?
/cc @carlosmn @arthurschreiber
I was going to suggest `cleanup`, since that's what libgit2 seems to call it, or maybe `clean`.
Bah, even though I'm calling `#close` periodically, I'm still getting a memory buildup over time.
Here's a graph (y-axis is %mem used by the script, x-axis is seconds):

```
3 +-+---------+----------+-----------+-----------+----------+---------+-+
+ + + + + *** +
| ****** |
2.5 +-+ **************** +-+
| ******************* |
| ***** |
2 +-************ +-+
|** |
| |
1.5 +-+ +-+
| |
| |
| |
1 +-+ +-+
| |
| |
0.5 +-+ +-+
| |
+ + + + + + +
0 +-+---------+----------+-----------+-----------+----------+---------+-+
0 20 40 60 80 100 120
```

Here's how that graph is generated (using `feedgnuplot`):
```
while true; do ps -o %mem= <PID>; sleep 1; done | feedgnuplot --lines --stream --terminal 'dumb 80,24' --extracmds 'unset grid' --ymin 0 --xmin 0
```
And here's the test script I'm using:
require "rugged"
puts "PID: #$$"
path = "#{__dir__}/rails"
if File.directory?(path)
repo = Rugged::Repository.new(path)
else
print "Cloning rails…"
repo = Rugged::Repository.clone_at("https://github.com/rails/rails", path)
puts " done"
end
puts "Creating walk array"
walk = repo.walk(repo.head.target, Rugged::SORT_TOPO).to_a
walk.each.with_index do |commit, i|
commit.parents.each do |parent|
diff = repo.diff(parent, commit, context_lines: 0)
end
if i % 1000 == 0
puts "#{i} commits done. Cleaning."
repo.close
GC.start
end
end
Running this script through `memory_profiler` tells me no Ruby objects are retained between iterations.
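For anyone who wants to repeat that check, the measurement was roughly along these lines (a sketch using the `memory_profiler` gem, not my exact code):

```ruby
require "memory_profiler"
require "rugged"

repo = Rugged::Repository.new("rails")
commit = repo.head.target

# Report allocated vs. retained Ruby objects for one batch of diffs.
report = MemoryProfiler.report do
  commit.parents.each { |parent| repo.diff(parent, commit, context_lines: 0) }
end

# If Ruby objects were being retained between iterations, they would show up
# in the "retained" sections of this output.
report.pretty_print
```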
I'll try running even more diffs and seeing if the memory continues to increase or if it plateaus.
I've also tried modifying it to create new repository objects every so often, and that appears to make no difference compared to calling `#close`.
@tenderlove I think the idea behind calling the method `#close` is to mirror `IO#close`.
> I've also tried modifying it to create new repository objects every so often, and that appears to have no difference compared to calling #close.
Ugh. I'll try working with this program and see what I can find. Are you seeing any GC stats rising? Specifically `GC.stat(:heap_sorted_length)` and `GC.stat(:heap_live_slots)`.
> @tenderlove I think the idea behind calling the method #close is to mirror IO#close.
@arthurschreiber I guess the difference is that `IO#close` makes the file unusable after you call it, whereas `Repository#close` allows you to continue using the repo object:
```
irb(main):012:0> f = File.open 'Rakefile'
=> #<File:Rakefile>
irb(main):013:0> f.close
=> nil
irb(main):014:0> f.read
IOError: closed stream
	from (irb):14:in `read'
	from (irb):14
	from /Users/aaron/.rbenv/versions/ruby-trunk/bin/irb:11:in `<main>'
irb(main):015:0>
```
> Are you seeing any GC stats rising? Specifically `GC.stat(:heap_sorted_length)` and `GC.stat(:heap_live_slots)`.
`heap_sorted_length` is stable at 300, and `heap_live_slots` is stable at 78594±1.
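The table below was produced by dumping `GC.stat` at each cleanup interval, roughly like this (a sketch, not the exact logging code; `walk` and `repo` come from the script above):

```ruby
keys = GC.stat.keys  # count, heap_allocated_pages, heap_sorted_length, ...

puts keys.join(" | ")  # header row

walk.each.with_index do |commit, i|
  commit.parents.each { |parent| repo.diff(parent, commit, context_lines: 0) }

  if i % 1000 == 0
    repo.close
    GC.start
    stat = GC.stat
    puts keys.map { |k| stat[k] }.join(" | ")  # one table row per cleanup
  end
end
```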
count | heap_allocated_pages | heap_sorted_length | heap_allocatable_pages | heap_available_slots | heap_live_slots | heap_free_slots | heap_final_slots | heap_marked_slots | heap_eden_pages | heap_tomb_pages | total_allocated_pages | total_freed_pages | total_allocated_objects | total_freed_objects | malloc_increase_bytes | malloc_increase_bytes_limit | minor_gc_count | major_gc_count | remembered_wb_unprotected_objects | remembered_wb_unprotected_objects_limit | old_objects | old_objects_limit | oldmalloc_increase_bytes | oldmalloc_increase_bytes_limit |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
14 | 232 | 300 | 68 | 94563 | 78593 | 15970 | 0 | 78592 | 232 | 0 | 232 | 0 | 159838 | 81245 | 448 | 16777216 | 11 | 3 | 62396 | 124792 | 16096 | 32192 | 2354880 | 16777216 |
15 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 170301 | 91706 | 448 | 16777216 | 11 | 4 | 62396 | 124792 | 16096 | 32192 | 896 | 16777216 |
16 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 180666 | 102072 | 448 | 16777216 | 11 | 5 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
17 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 191360 | 112765 | 448 | 16777216 | 11 | 6 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
18 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 201935 | 123341 | 448 | 16777216 | 11 | 7 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
19 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 212706 | 134112 | 448 | 16777216 | 11 | 8 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
20 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 223218 | 144624 | 448 | 16777216 | 11 | 9 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
21 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 233842 | 155248 | 448 | 16777216 | 11 | 10 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
22 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 243486 | 164891 | 448 | 16777216 | 11 | 11 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
23 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 253963 | 175369 | 448 | 16777216 | 11 | 12 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
24 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 264034 | 185440 | 448 | 16777216 | 11 | 13 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
25 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 274532 | 195938 | 448 | 16777216 | 11 | 14 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
26 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 285226 | 206632 | 448 | 16777216 | 11 | 15 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
27 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 295584 | 216989 | 448 | 16777216 | 11 | 16 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
28 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 305697 | 227102 | 448 | 16777216 | 11 | 17 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
29 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 316153 | 237559 | 448 | 16777216 | 11 | 18 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
30 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 326119 | 247525 | 448 | 16777216 | 11 | 19 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
31 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 336127 | 257532 | 448 | 16777216 | 11 | 20 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
32 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 346653 | 268059 | 448 | 16777216 | 11 | 21 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
33 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 356878 | 278284 | 448 | 16777216 | 11 | 22 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
34 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 366991 | 288396 | 448 | 16777216 | 11 | 23 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
35 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 377132 | 298538 | 448 | 16777216 | 11 | 24 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
36 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 387252 | 308658 | 448 | 16777216 | 11 | 25 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
37 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 397085 | 318491 | 448 | 16777216 | 11 | 26 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
38 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 407408 | 328814 | 448 | 16777216 | 11 | 27 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
39 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 417381 | 338787 | 448 | 16777216 | 11 | 28 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
40 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 427529 | 348935 | 448 | 16777216 | 11 | 29 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
41 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 437390 | 358796 | 448 | 16777216 | 11 | 30 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
42 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 447153 | 368558 | 448 | 16777216 | 11 | 31 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
43 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 456671 | 378076 | 448 | 16777216 | 11 | 32 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
44 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 466406 | 387811 | 448 | 16777216 | 11 | 33 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
45 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 476057 | 397462 | 448 | 16777216 | 11 | 34 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
46 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 485715 | 407121 | 448 | 16777216 | 11 | 35 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
47 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 495548 | 416954 | 448 | 16777216 | 11 | 36 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
48 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 505528 | 426933 | 448 | 16777216 | 11 | 37 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
49 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 515375 | 436781 | 448 | 16777216 | 11 | 38 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
50 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 524977 | 446383 | 448 | 16777216 | 11 | 39 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
51 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 534628 | 456034 | 448 | 16777216 | 11 | 40 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
52 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 544139 | 465545 | 448 | 16777216 | 11 | 41 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
53 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 553769 | 475175 | 448 | 16777216 | 11 | 42 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
54 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 563553 | 484958 | 448 | 16777216 | 11 | 43 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
55 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 572294 | 493700 | 448 | 16777216 | 11 | 44 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
56 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 580615 | 502021 | 448 | 16777216 | 11 | 45 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
57 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 588908 | 510314 | 448 | 16777216 | 11 | 46 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
58 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 597117 | 518523 | 448 | 16777216 | 11 | 47 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
59 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 605522 | 526928 | 448 | 16777216 | 11 | 48 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
60 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 613906 | 535312 | 448 | 16777216 | 11 | 49 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
61 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 622290 | 543696 | 448 | 16777216 | 11 | 50 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
62 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 630660 | 552066 | 448 | 16777216 | 11 | 51 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
63 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 639149 | 560555 | 448 | 16777216 | 11 | 52 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
64 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 647799 | 569205 | 448 | 16777216 | 11 | 53 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
65 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 656092 | 577498 | 448 | 16777216 | 11 | 54 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
66 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 664630 | 586036 | 448 | 16777216 | 11 | 55 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
67 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 673112 | 594518 | 448 | 16777216 | 11 | 56 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
68 | 232 | 300 | 68 | 94563 | 78595 | 15968 | 0 | 78594 | 232 | 0 | 232 | 0 | 681727 | 603132 | 448 | 16777216 | 11 | 57 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
69 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 690195 | 611601 | 448 | 16777216 | 11 | 58 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
70 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 698628 | 620034 | 448 | 16777216 | 11 | 59 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
71 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 706662 | 628068 | 448 | 16777216 | 11 | 60 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
72 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 714696 | 636102 | 448 | 16777216 | 11 | 61 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
73 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 722730 | 644136 | 448 | 16777216 | 11 | 62 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
74 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 730764 | 652170 | 448 | 16777216 | 11 | 63 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
75 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 738798 | 660204 | 448 | 16777216 | 11 | 64 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
76 | 232 | 300 | 68 | 94563 | 78594 | 15969 | 0 | 78593 | 232 | 0 | 232 | 0 | 746832 | 668238 | 448 | 16777216 | 11 | 65 | 62397 | 124794 | 16171 | 32342 | 896 | 16777216 |
A graph showing the same thing as the previous one, but with a much bigger repo (torvalds/linux, about 10x the number of commits). The trend is less noticeable because I loaded so many commits into memory up front, but it's still definitely upwards. The big drop near the start is when another process on my computer started spinning up and a lot of memory got written to swap, which must have interfered with my graph somehow:

```
20 +-+-------+---------+---------+----------+---------+---------+-------+-+
+ + + + + + + * +
18 +-+ ***+-+
* *** *** * |
16 *-+ ***** * ************ *+-+
* ** ********** ********* * * |
14 *-+ ** * ************ *** *+-+
* ************ ***** * * |
12 *** *** * * *+-+
10 **+ *** *+-+
** ** * |
8 **+ ** *+-+
** ** * |
6 **+ ** *+-+
** ** * |
4 **+ ** *+-+
** ** * |
2 **+ * *+-+
* + + + + + + * +
0 *-+-------+---------+---------+----------+---------+---------+------*+-+
0 20000 40000 60000 80000 100000 120000 140000
```

Awesome, thanks for the graph. I just wanted to comment and say I haven't forgotten about this; I've just been busy with life stuff (RailsConf last week, moving this week). I'll continue poking at this on Thursday. I believe @kivikakk was able to write a repro in C, so we might be able to eliminate Ruby from the equation.
No worries. Thanks @tenderlove! 🙏
@tenderlove unfortunately it was a non-repro in C 😞 I couldn't observe the ballooning memory usage without Ruby involved.
Hi again,
Sorry it's been so long 😞. I think I've reproduced the problem, but I'm not sure what to believe. I'm going to describe my test methodology here (a) so that I don't forget and (b) so other people can try it.
I'm using trunk Ruby compiled with `-g` so I have debugging symbols (I don't believe trunk Ruby is required, and neither is `-g`, but the debugging symbols make life easier). I'm running Rugged from master, and I recompiled it there so it picks up the `-g` flags too.
This is the script I'm using:
require "rugged"
puts "PID: #$$"
repo = Rugged::Repository.new(ARGV[0])
OID = 'df5ada78c0b0e962d65e89e7487fd35289dc9157'
GC.start
GC.start
GC.start
puts "Creating walk array"
walk = repo.walk(repo.head.target, Rugged::SORT_TOPO).to_a
walk.each.with_index do |commit, i|
the_rents = commit.parents
the_rents.each do |parent|
diff = repo.diff(parent, commit, context_lines: 0)
end
if i % 1000 == 0
puts "#{i} commits done. Cleaning."
repo.close
commit = nil
GC.start
$stdin.gets
end
end
The way I'm running the script is like this:
```
$ pwd
/Users/aaron/github/rugged
$ MallocScribble=1 MallocStackLoggingNoCompact=1 /Users/aaron/.rbenv/versions/ruby-trunk/bin/ruby --disable-gems -Ilib memgrow.rb ~/git/rails
```
`MallocScribble` isn't necessary, but `MallocStackLoggingNoCompact` is. Setting that environment variable records every call to `malloc`, regardless of requested size, along with the stack trace of the call. When the process stops for input, I run this command in a different terminal:
```
$ malloc_history -allBySize $PID > $SOME_FILE
```
Here `$PID` is the pid printed by `memgrow.rb`, and `$SOME_FILE` is a text file whose name I change each time I run `malloc_history`. `malloc_history -allBySize` outputs stacks and sizes for all live allocations, so everything printed by that command should be live memory. I record all live memory in `$SOME_FILE`, then hit enter a few times (in this case twice), then run the `malloc_history` command again and record the live allocations in a new file. That way I can get an idea of what is live and changing over time.
I wrote a diff script to figure out what changed between two recordings:
```ruby
# Parse a `malloc_history -allBySize` dump into { stack => [allocation descriptions] }
def read_live_objects file
  f = File.open file
  # Skip the header, which ends with a line of dashes
  while f.readline.chomp != '----'
  end
  f.readline

  list = []
  while line = f.readline
    case line
    when "Binary Images:\n"
      # Each entry is [allocation description, stack]; group the descriptions by stack
      return list.group_by(&:last).transform_values { |v| v.map(&:first) }
    when "\n"
    else
      list << line.chomp.split(':', 2)
    end
  end
ensure
  f.close if f
end

def total_allocations allocs
  allocs.values.flatten.map { |v| v[/\d+ bytes/].to_i }.inject(:+)
end

def diff_allocations from, to
  puts "TOTAL HEAP SIZE (bytes) #{total_allocations(from)} => #{total_allocations(to)}"
  puts
  puts "NEW STACKS"
  (to.keys - from.keys).each do |key|
    to[key].each do |count|
      p count => key
    end
  end
  puts
  puts "REMOVED STACKS"
  (from.keys - to.keys).each do |key|
    from[key].each do |count|
      p count => key
    end
  end
  puts
  puts "CHANGED STACKS"
  (from.keys & to.keys).each do |key|
    if from[key].sort != to[key].sort
      bytes_from = from[key].flatten.map { |v| v[/\d+ bytes/].to_i }.inject(:+)
      bytes_to   = to[key].flatten.map { |v| v[/\d+ bytes/].to_i }.inject(:+)
      p( (bytes_to - bytes_from) => key )
    end
  end
end

from = read_live_objects ARGV[0]
to   = read_live_objects ARGV[1]

diff_allocations from, to
```
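Usage is just the two recordings as arguments; the script and file names below are only placeholders:

```
$ malloc_history -allBySize $PID > live-1.txt
$ # hit enter in memgrow.rb a couple of times, then take another sample
$ malloc_history -allBySize $PID > live-2.txt
$ ruby diff_live.rb live-1.txt live-2.txt
```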
If I summed the live bytes from the allocation logs, I observed the size grow over time. However, if I just inspected the process with Activity Monitor, I could not observe the same thing (which is why I'm not sure what to believe). I took about 4 samples of the process and saw it grow by about 5MB each time (according to the allocation history logs).
The stack that increased the most on each iteration is this:

```
irb(main):008:0> puts x.grep_v(/^(vm_|invoke)/).each_with_index.map { |s,i| (" " * i) + s }.join("\n")
thread_7fffaffa03c0
start
main
ruby_run_node
ruby_exec_node
ruby_exec_internal
rb_iseq_eval_main
call_cfunc_m1
enumerator_with_index
enumerator_block_call
rb_block_call
rb_iterate
rb_iterate0
iterate_method
rb_call
rb_call0
call_cfunc_0
rb_ary_each
rb_yield
rb_yield_1
rb_yield_0
enumerator_with_index_i
rb_yield_values
rb_yield_0
call_cfunc_0
rb_ary_each
rb_yield
rb_yield_1
rb_yield_0
call_cfunc_4
rb_git_diff_tree_to_tree
rb_thread_call_without_gvl
call_without_gvl
rb_git_diff_tree_to_tree_nogvl
git_diff_tree_to_tree
git_diff__from_iterators
iterator_advance
tree_iterator_advance
tree_iterator_frame_push
git_object_lookup_prefix
git_odb_read
odb_read_1
pack_backend__read
git_packfile_unpack
git_delta_apply
malloc
```

I suspect this `malloc` to be the one that's leaking, but I can't tell how from the code, and `leaks` doesn't corroborate my findings. AFAICT the caller is responsible for freeing that memory, but it looks like there is some interesting logic around how (or whether) that happens.
I tried running this again, but recorded the output of `heap` along with the output from `malloc_history`. `heap` seems to agree that the process size is growing as well.
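(`heap`, like `malloc_history`, just takes the pid, so I sampled it the same way each time the script paused, e.g. `heap $PID > $SOME_OTHER_FILE`.)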
I'm using Rugged in an application that generates thousands of diffs in quick succession, and I'm noticing memory growing steadily as it runs.
I've managed to reproduce the problem in this reduced test case, which just generates a lot of diffs with a big repo (rails/rails). You can run this script and watch the memory consumption just go up and up and up.
I don't have the C knowledge to figure out the cause of the leak, so this could be a libgit2 issue rather than a Rugged-specific one.