sociomantic-tsunami / tangort

Tango D1 runtime library

[do not merge] PoC of stat-based -profile=gc #12

Closed mihails-strasuns-sociomantic closed 5 years ago

mihails-strasuns-sociomantic commented 7 years ago

@leandro-lucarella-sociomantic I tried to adjust my upstream implementation for the CDGC internals, but I seem to get broken sizes - maybe you can do a quick sanity check?

mihails-strasuns-sociomantic commented 7 years ago

Mmmm, I think the totalCollectedPages could be wrong.

What I observe is that all collected sizes are completely insane, showing some crazy large byte counts with low variety. What would be the best way to calculate the total? It is only intended to be used for this: https://github.com/sociomantic-tsunami/tangort/pull/12/files#diff-5a94f38f031f9e1a9fa4d95c1eeb2351R41

leandro-lucarella-sociomantic commented 7 years ago

Mmm, I don't understand why you are involving the collected pages at all - why not just return usedSize2 - usedSize1?

mihails-strasuns-sociomantic commented 7 years ago

Because any allocation may result in a collection, and then usedSize2 will be less than usedSize1.

leandro-lucarella-sociomantic commented 7 years ago

Yeah, good point. Then I guess you will probably have to track the space recovered by the sweep in some other way.

Just to understand this correctly, what you want from this is the real new memory allocated by the GC, right? So, for example, if you have a string with length = 20 and ptr pointing to a block of 32 bytes, and you do a realloc(ptr, 30), you want that to return 0, because nothing really got allocated despite the fact that you "asked" for 30 bytes, right? (And the same if you did realloc(ptr, 10).)

If so, I'm not sure if the profiler already does this, but I think it would also be good to track the size requested and the real size allocated for that pointer in total. In this case those would be 30 and 32.

I guess you don't want to track memory freed by free() (or realloc(ptr, 0), or a realloc() that frees up some space, as can happen for big blocks (> PAGESIZE))? Also, you don't want to count a new pool being reserved as an allocation if eager_allocation is enabled, right?

But back to your problem, and assuming you want what I described above, I don't see a way to get it that doesn't involve going into the guts of the GC and tracking the sizes in malloc and realloc.

Here is a sample implementation, which could also be useful for tracking allocations made by user code:

diff --git a/src/gc/cdgc/gc.d b/src/gc/cdgc/gc.d
index 5de50d7..7feffad 100644
--- a/src/gc/cdgc/gc.d
+++ b/src/gc/cdgc/gc.d
@@ -244,6 +244,8 @@ struct GC
     size_t total_mem;
     /// Free heap memory
     size_t free_mem;
+    /// Tracking of last allocations
+    size_t last_alloc;

     /// Free list for each size
     List*[B_MAX] free_list;
@@ -1568,6 +1570,8 @@ private void *malloc(size_t size, uint attrs, size_t* pm_bitmask)
     }

     gc.free_mem -= capacity;
+    gc.last_alloc += capacity;
+
     if (collected) {
         // If there is not enough free memory (after a collection), allocate
         // a new pool big enough to have at least the min_free% of the total
@@ -1696,6 +1700,7 @@ private void *realloc(void *p, size_t size, uint attrs,
                             newsz - psz);
                     auto new_blk_size = (PAGESIZE * newsz);
                     gc.free_mem -= new_blk_size - blk_size;
+                    gc.last_alloc += new_blk_size - blk_size;
                     // update the size cache, assuming that is very
                     // likely the size of this block will be queried in
                     // the near future
@@ -1809,6 +1814,7 @@ body
     gc.p_cache = null;
     gc.size_cache = 0;
     gc.free_mem -= new_size - blk_size;
+    gc.last_alloc += new_size - blk_size;
     // update the size cache, assuming that is very likely the size of this
     // block will be queried in the near future
     pool.update_cache(p, new_size);
@@ -2778,10 +2784,18 @@ void gc_monitor(begin_del begin, end_del end)
     })();
 }

-void gc_usage(size_t* used, size_t* free)
+// This should be locked! Unless you are counting on ints to be atomic.
+void gc_usage(size_t* used, size_t* free, size_t* last)
 {
     *free = gc.free_mem;
     *used = gc.total_mem - gc.free_mem;
+    *last = gc.last_alloc;
+}
+
+// This should be locked! Unless you are counting on ints to be atomic.
+void gc_resetLastAllocation()
+{
+    gc.last_alloc = 0;
 }

 // vim: set et sw=4 sts=4 :

(untested)

Then your count() is reduced to:

gc_resetLastAllocation();
wrapped(...);
auto stats = GC.stats();
if (stats.last > 0)
    accumulate(file, line, funcname, name, stats.last);

As said, you could use this scheme to see whether even a function from a library you don't control does any allocations (and how much it allocates), which seems pretty cool.

leandro-lucarella-sociomantic commented 7 years ago

A similar scheme could be used to track freed memory (that is not collected by the GC), but I'm not sure how useful that would be.

mihails-strasuns-sociomantic commented 7 years ago

Just to understand this correctly, what you want from this is to get the real new memory allocated by the GC, right?

Yep. Essentially what we mostly want to know is how much specific lines of code have contributed to the overall GC memory throughput.

If so, I'm not sure if the profiler already does, but I think it would be good to track too the size requested and the real size allocated by that pointer in total. In this case it will be 30 and 32.

This (tracking the requested size) is what the old upstream code did, though it always treated length = 10 as a request for 10 new items. It is trivial to support both (just a lot of boilerplate), but we don't need it right now - probably something to keep in mind if we ever get to an upstream-mergeable implementation.

But back to your problem, and assuming you want to get what I said up there, I don't see a way to get it that is not going to the guts of the GC and track the sizes in malloc and realloc

:( With the upstream GC, very naive counting of pages (https://github.com/dlang/druntime/pull/1806/files#diff-6615611ba51118215a271e3500c61122R2465) gave me exactly the results I expected in synthetic tests, so I hoped it wouldn't be too different in CDGC.

Will give your patch a try and see how it compares against the current dmd-transitional version!

leandro-lucarella-sociomantic commented 7 years ago

I am surprised that counting collected pages works at all for finding out how much memory something allocated, if there was a collection in the middle. If you are always getting weird values, then I guess using the collected pages is not the culprit, and I don't know what that could be, but the changes I'm proposing, even if maybe a bit hacky, should be very precise, and they would allow arbitrary user code to monitor how much memory a piece of code allocates.

mihails-strasuns-sociomantic commented 7 years ago

I have adjusted your proposed patch for dmd-transitional too, and it has shown that the collected-pages-based solution was not actually working - it was just showing stats truthful enough to not suspect they were wrong ;)

I have rebuilt the test docker image with both dmd1 and dmd-transitional using lastAlloc-based stats built on your suggested patch, and those seem to work like a charm now. Should be good enough for our internal debugging.

leandro-lucarella-sociomantic commented 7 years ago

those seem to work like a charm now

Great!

leandro-lucarella-sociomantic commented 7 years ago

Is this good to go?

mihails-strasuns-sociomantic commented 7 years ago

Do we want to merge it into the D1 runtime before it gets accepted into the upstream one?

leandro-lucarella-sociomantic commented 7 years ago

Since this is just an implementation issue (not an interface one), I don't see why not. We could wait a bit and try to press upstream to merge the PR there if you prefer.

mihails-strasuns commented 5 years ago

Closing in favor of https://github.com/sociomantic-tsunami/tangort/pull/39