lsalamon / slimgen

Automatically exported from code.google.com/p/slimgen
MIT License

How does slimgen avoid breaking the GC? #1

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
It is my understanding that the GC needs to be able to walk the stack of
each thread to track objects.

Calling arbitrary asm routines seems like it would break this.

How does SlimGen avoid screwing up the GC?

Thanks,
Brien

Original issue reported on code.google.com by brien...@gmail.com on 7 Dec 2009 at 6:30

GoogleCodeExporter commented 8 years ago
The GC only runs when you are allocating memory. That is, when you call new
within a method, it invokes the managed allocator, which may, optionally, halt
the process for a GC.

If you are replacing a method that performs allocation on the managed heap, then
it behooves you to ensure that you follow the same assembly generation patterns
around the allocation (i.e. proper reference management) that the CLR JIT
generates.

Original comment by ryoohki@gmail.com on 17 Dec 2009 at 12:54

GoogleCodeExporter commented 8 years ago
:D

Original comment by ryoohki@gmail.com on 17 Dec 2009 at 12:55

GoogleCodeExporter commented 8 years ago
So are you assuming a single threaded program?

I have lots of threads, any of which may trigger a GC.

I am not replacing a method that allocates memory.

One of the things I'd like to do is a straight call to QueryPerformanceCounter()
in order to eliminate the interop overhead, because I need to call this often.

Is it possible/safe to do this via slimgen?

Thanks.

Original comment by brien...@gmail.com on 21 Dec 2009 at 6:22

GoogleCodeExporter commented 8 years ago
I would suggest reading up on the GC and how it operates. It has pretty clear
mechanics as to how it behaves with regards to threads and collections. This is
not something that will affect SlimGen in any major way, with the exception of
dealing with the actual allocation of memory.

Regarding QueryPerformanceCounter:
You can call it using SlimGen. It would take a bit of work to set up, because we
do not currently parse the import/export tables; however, if you obtained the
function pointer, you could then use that variable to call QPC via SlimGen.

A few things to note though:
1. You should be using Stopwatch.GetTimestamp() for this, not QPC. GetTimestamp
calls QPC internally.
2. Unless you're calling QPC several thousand times or more per second, you'll
probably not notice much of a performance difference. Furthermore, calling QPC
that many times per second is highly inadvisable, as it was not designed for
that kind of usage.
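
A rough way to check point 2 on a given machine is to time the timer itself. The
sketch below is in Python rather than .NET, purely as a stand-in: per the Python
documentation, time.perf_counter is backed by QueryPerformanceCounter on Windows.
The function name per_call_cost_ns and the 5,000 calls/s figure are assumptions
made for illustration, not anything from SlimGen or the CLR.

```python
import time

def per_call_cost_ns(calls: int = 100_000) -> float:
    """Average cost in nanoseconds of one timer read, loop overhead included."""
    read = time.perf_counter_ns  # local alias avoids repeated attribute lookups
    start = read()
    for _ in range(calls):
        read()
    elapsed = read() - start
    return elapsed / calls

cost = per_call_cost_ns()
print(f"about {cost:.0f} ns per timer read")
# Hypothetical load of 5,000 reads per second: share of each second spent reading.
print(f"{cost * 5_000 / 1e9:.4%} of each second at 5,000 reads/s")
```

The number includes interpreter overhead, so it is an upper bound for this
runtime only; the same measurement would need to be repeated against the actual
P/Invoke path to say anything about the CLR.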

Original comment by ryoohki@gmail.com on 22 Dec 2009 at 4:31

GoogleCodeExporter commented 8 years ago
Hi,

I've read everything I could find, specifically lots of good details on Maoni's
blog.

http://blogs.msdn.com/maoni/

I've debugged into the standard thunk and I see that P/Invoke returns may be
intercepted and the thread may be blocked by the GC. This is confirmed by the
blog posting here:

" the native code is not from the EE, which means it’s running some native code
via Interop, we don’t need to suspend this thread because it shouldn’t be doing
anything with managed objects that are not pinned. When this thread needs to
return to managed code, however, it will need to check if a GC is in progress
and if so it needs to wait for GC to finish."

http://blogs.msdn.com/maoni/archive/2006/06/07/suspending-and-resuming-threads-for-gc.aspx
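
The behaviour described in that quote can be sketched as a toy model. This is
Python, not CLR code, and every name in it (ToyRuntime, call_native,
reached_boundary and so on) is hypothetical. It only shows the ordering: the
thread keeps running while it is in native code, and blocks at the boundary back
into managed code until the collection has finished.

```python
import threading

class ToyRuntime:
    """Toy stand-in for a managed runtime; none of these names are CLR APIs."""

    def __init__(self):
        self._gc_idle = threading.Event()
        self._gc_idle.set()                        # set means "no GC in progress"
        self.reached_boundary = threading.Event()  # set when a thread hits the return check
        self.log = []
        self._log_lock = threading.Lock()

    def _record(self, message):
        with self._log_lock:
            self.log.append(message)

    def begin_gc(self):
        self._gc_idle.clear()
        self._record("gc: started")

    def end_gc(self):
        self._record("gc: finished")
        self._gc_idle.set()

    def call_native(self, native_fn):
        result = native_fn()            # not suspended while running native code
        self._record("thread: native call returned")
        self.reached_boundary.set()
        self._gc_idle.wait()            # returning to managed code waits for the GC
        self._record("thread: back in managed code")
        return result

def demo():
    runtime = ToyRuntime()
    in_native = threading.Event()
    may_return = threading.Event()

    def native_fn():
        in_native.set()                 # tell the main thread we are in native code
        may_return.wait()               # stay there until the GC has started
        return 42

    results = []
    worker = threading.Thread(
        target=lambda: results.append(runtime.call_native(native_fn)))
    worker.start()
    in_native.wait()
    runtime.begin_gc()                  # GC starts while the worker is in native code
    may_return.set()
    runtime.reached_boundary.wait()     # worker is now parked at the boundary
    runtime.end_gc()
    worker.join()
    return runtime.log, results

log, results = demo()
print("\n".join(log))
```

The log shows the native call returning while the collection is still running,
and the thread re-entering managed code only after it finishes.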

I am indeed calling QPC thousands of times a second, and I need to. I'm
capturing detailed trace histories in an event processing system for analysis.
I don't see why this is inadvisable, but I'm interested to hear why you think it
is.

Anyway, the issue I see is that slimgen may be calling out to QPC while another
thread triggers a GC.

The GC needs to walk the stacks of ALL managed threads so that it can track
objects that are stored in local variables. It uses compilation metadata to
figure out the offsets on the stack that contain these addresses. If the GC is
unaware that I am executing native code, I don't believe it will be able to
reliably do its inspection.

So my theory is that if you do anything to the stack that the GC is unaware of
(e.g. push a bunch of registers, allocate some temporary space for an array,
etc.), you are potentially breaking the GC's ability to track certain objects
that may only be referenced via local variables in the active call sequence.
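
The concern can be made concrete with a toy model of metadata-driven root
scanning. This is a Python sketch of the general idea behind a precise
collector, not of the CLR's actual stack walker; Frame, ref_slots, find_roots
and collect are hypothetical names, and the model does not settle whether
SlimGen code runs into this in practice.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    cells: list                                  # raw stack cells: object ids or plain values
    ref_slots: set = field(default_factory=set)  # cell indexes the metadata marks as references

def find_roots(stack):
    """Collect object ids from the cells that each frame's metadata marks as references."""
    roots = set()
    for frame in stack:
        for index in frame.ref_slots:
            roots.add(frame.cells[index])
    return roots

def collect(heap, stack):
    """Return the surviving heap: objects reachable from the roots found on the stack."""
    live = set()
    pending = list(find_roots(stack))
    while pending:
        obj = pending.pop()
        if obj in live or obj not in heap:
            continue
        live.add(obj)
        pending.extend(heap[obj])                # follow references held by the object
    return {obj: refs for obj, refs in heap.items() if obj in live}

# Heap maps each object id to the ids it references.
heap = {"a": ["b"], "b": [], "c": []}
described = Frame(cells=["a", 7], ref_slots={0})  # metadata says cell 0 is a reference
undescribed = Frame(cells=["c", 99])              # holds "c", but no metadata describes it

print(collect(heap, [described, undescribed]))    # → {'a': ['b'], 'b': []}
```

Object "c" is only referenced from a cell the collector has no metadata for, so
it is treated as garbage even though the frame still uses it.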

Original comment by brien...@gmail.com on 22 Dec 2009 at 5:06

GoogleCodeExporter commented 8 years ago
The main problem here is that you're trying to avoid the managed/unmanaged
boundary in a case where you might have a garbage collection happening. This is
one of those areas where SlimGen "may" or "may not" work.

The primary problem is that the managed thunk that wraps an unmanaged invocation
does a lot of work which you would have to replicate ANYWAY for you to safely
call to unmanaged code with SlimGen (in the manner in which you're using it).

Part of that process is marking the thread as running in a managed or unmanaged
state, which is done through a series of calls into the CLR (which are
non-public).

The reason I suggest against QPC (not that there are many alternatives) for such
samplings is that QPC can be noisy. It uses a variety of timing mechanisms
(including CPU cycle counters, PCI bus timers, etc.) and not all of them are
steady and stable over extremely small spans of time. Over longer spans (such as
those used in games), that noise will tend to average out.
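
One way to read the last point: if each timestamp carries a bounded error, that
error is a large fraction of a very short span and a negligible fraction of a
long one. The simulation below is a Python sketch under made-up assumptions (a
uniform error of up to 5 ticks per reading, and the hypothetical function name
relative_error); it does not use measured QPC data.

```python
import random

def relative_error(span_ticks: int, noise_ticks: int,
                   samples: int = 1000, seed: int = 0) -> float:
    """Mean relative error of a measured span when both timestamps carry bounded noise."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    total = 0.0
    for _ in range(samples):
        start = rng.uniform(-noise_ticks, noise_ticks)
        end = span_ticks + rng.uniform(-noise_ticks, noise_ticks)
        measured = end - start
        total += abs(measured - span_ticks) / span_ticks
    return total / samples

# Assumed noise of up to 5 ticks per reading, for a short and a long span.
for span in (100, 100_000):
    print(f"span {span:>7} ticks: mean relative error {relative_error(span, 5):.5%}")
```

With the same noise, the relative error shrinks in proportion to the length of
the span being measured.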

Original comment by ryoohki@gmail.com on 22 Dec 2009 at 5:59

GoogleCodeExporter commented 8 years ago
It seems like nearly every process is multithreaded. Even if you don't
explicitly create threads, you still have to worry about libraries, finalizer
threads, thread pool threads for async APIs, etc.

So it seems like any usage of slimgen would have to be very careful not to mess
with the stack (e.g. pushing variables or allocating stack space for locals), or
else risk a very subtle bug caused by the EE not being able to find roots on the
stack. Is this correct?

Original comment by brien...@gmail.com on 28 Dec 2009 at 3:08

GoogleCodeExporter commented 8 years ago
Messing with the stack is fine. Heap allocations are where the issues arise, and
only when a compaction is required.

Original comment by ryoohki@gmail.com on 28 Dec 2009 at 9:10

GoogleCodeExporter commented 8 years ago
Stacks are per-thread, and relocation of them is infrequent (if ever). Heap
allocated objects are the only ones that actually pose a problem as far as
thread-related issues and the GC go. Even there, your best bet is to see how the
JITted code deals with it (which is to say, it doesn't do much really).

Basically: WinDbg it and see what it produces, then you'll know what you have to
do.

Original comment by ryoohki@gmail.com on 28 Dec 2009 at 9:13

GoogleCodeExporter commented 8 years ago
Here's the issue. There may be a heap allocated object that is solely referenced
via a local (stack) variable. In order for the GC to not prematurely collect
this object, it needs to be aware of the reference. If you mess with the stack,
I'm not sure it will be able to track it.

Original comment by brien...@gmail.com on 29 Dec 2009 at 7:00