ivoanjo / gvl-tracing

Get a timeline view of Global VM Lock usage in your Ruby app
MIT License
114 stars 7 forks source link

Provide thread name (and when possible gem name) on output #12

Closed benoittgt closed 1 year ago

benoittgt commented 1 year ago

gvl-tracing handles properly handles thread name for basic usage like:

  Thread.new do
    Thread.current.name = "thread name"
  end

But on more complicated examples with external libraries, it is not able to properly handles thread name. Jean had a good idea in https://github.com/ivoanjo/gvl-tracing/issues/9 with using Thread.list.

This commit does this:

This is a very naive approach and probably not super efficient. But this is a first try.

Screenshot with the new examples file:

Screenshot 2023-04-05 at 21 23 45

Screenshot of a gvl-tracing on a Rails app with some multihread:

Screenshot 2023-04-05 at 21 45 01

Opening as a draft because code need probably some rework.

ivoanjo commented 1 year ago

Hey there!

Big thanks for working on this 🎉🎉🎉

I was a bit slow to respond over Easter weekend but I'll make an effort to be more responsive to this PR from now on :) :)

I was doing a few experiments with the perfetto UI and I think there's a path to avoiding post-processing the output with the following insight: the perfetto UI doesn't mind where in the event sequence the thread_name appears.

It turns out the metadata event doesn't have to show up when we first see the thread (as I initially implemented). Furthermore, I just experimented a bit and it turns out that if we repeat the metadata events for a thread more than once, and perfetto only displays the last one.

Thus, I think we don't need to do the reparsing; instead, as part of stop, we could re-print the names of the threads to the trace file, e.g. by perhaps building up the thread names in Ruby as you're doing in the PR, and then passing it as a string that gets appended to the trace file before it gets closed. And for perfetto it seems like this is OK -- we don't even need to omit the duplicate names at the beginning.

What do you think? I'm suggesting this approach because it means that even if we run the tracing for a long time, the overhead when stopping would always be O(number of threads) which seems quite reasonable.


There is another downside with this approach of getting the thread names at the end, which I am totally ok with if we don't solve yet -- I didn't solve it in my first implementation either. I'm sharing it here more for discussion :)

The issue is because Ruby reuses native threads, it means that using only the native thread id to identify a thread and match it to a Ruby Thread object may not be enough -- two Ruby Thread objects can actually share the same native thread id.

This means that the name can be completely wrong, if we hit one of these cases :)

casperisfine commented 1 year ago

This means that the name can be completely wrong, if we hit one of these cases :)

Yeah, but it's hard to solve this. We'd need a callback on thread start to eagerly assign thread names.

Overall, code that spawn short lived threads are rare, so maybe it's an OK limitation?

ivoanjo commented 1 year ago

This means that the name can be completely wrong, if we hit one of these cases :)

Overall, code that spawn short lived threads are rare, so maybe it's an OK limitation?

Definitely it's OK to not solve it in this PR! I brought it up since it would be nice to fix at some point -- as it's a bit of a sharp edge that can confuse users -- and maybe the discussion here would shine a light on what we could do for that one :)

Yeah, but it's hard to solve this. We'd need a callback on thread start to eagerly assign thread names.

I've been wondering if we could approximate it during RUBY_INTERNAL_THREAD_EVENT_READY/RUBY_INTERNAL_THREAD_EVENT_RESUMED/RUBY_INTERNAL_THREAD_EVENT_SUSPENDED by checking if rb_thread_current() is still the last one we saw for this thread. Something like this?

casperisfine commented 1 year ago

Something like this?

The problem is that you can only generate the name when you hold the GVL. So you could detect that the name need to be generated again, but couldn't do it from these callbacks (except for RESUMED).

ivoanjo commented 1 year ago

The problem is that you can only generate the name when you hold the GVL. So you could detect that the name need to be generated again, but couldn't do it from these callbacks (except for RESUMED).

Right! I guess there's two parts of the problem:

The name could still come as a best-effort thing done later (as this PR does) -- worst case, we don't get a name.

benoittgt commented 1 year ago

Thus, I think we don't need to do the reparsing; instead, as part of stop, we could re-print the names of the threads to the trace file, e.g. by perhaps building up the thread names in Ruby as you're doing in the PR, and then passing it as a string that gets appended to the trace file before it gets closed. And for perfetto it seems like this is OK -- we don't even need to omit the duplicate names at the beginning.

That's interesting. I'm gonna have a look at what is possible with metadatas and perfetto UI.

benoittgt commented 1 year ago

@ivoanjo I had a quick look of your proposal and pushed a commit. It is much better. Good catch with perfetto trace format. The only issue I have is the thread_id provided by the C extension is different from the one I get in thread list. I didn't look a it yet.

Edit

Maybe going in this direction may be a good idea? It is working properly. What was the need of using serial instead of a native thread id?

--- a/ext/gvl_tracing_native_extension/gvl_tracing.c
+++ b/ext/gvl_tracing_native_extension/gvl_tracing.c
@@ -76,10 +76,11 @@ static inline void initialize_thread_id(void) {

 static inline void render_thread_metadata(void) {
   uint64_t native_thread_id = 0;
-  #ifdef HAVE_GETTID
-    native_thread_id = gettid();
-  #elif HAVE_PTHREAD_THREADID_NP
+
+  #ifdef HAVE_PTHREAD_THREADID_NP
     pthread_threadid_np(pthread_self(), &native_thread_id);
+  #elif HAVE_GETTID
+    native_thread_id = gettid();
   #else
     native_thread_id = current_thread_serial; // TODO: Better fallback for Windows?
   #endif
@@ -91,9 +92,8 @@ static inline void render_thread_metadata(void) {
   #endif

   fprintf(output_file,
-    "  {\"ph\": \"M\", \"pid\": %u, \"tid\": %u, \"name\": \"thread_name\", \"args\": {\"name\": \"%lu %s\"}},\n",
-    process_id, current_thread_serial, native_thread_id, native_thread_name_buffer
-  );
+    "  {\"ph\": \"M\", \"pid\": %u, \"tid\": %llu, \"name\": \"thread_name\", \"args\": {\"name\": \"%lu %s\"}},\n",
+    process_id, native_thread_id, native_thread_id, native_thread_name_buffer);
 }

 static VALUE tracing_start(VALUE _self, VALUE output_path) {
@@ -170,6 +170,15 @@ static void render_event(const char *event_name) {
   }

   unsigned int thread_id = current_thread_serial;
+  uint64_t native_thread_id = 0;
+
+  #ifdef HAVE_PTHREAD_THREADID_NP
+    pthread_threadid_np(pthread_self(), &native_thread_id);
+  #elif HAVE_GETTID
+    native_thread_id = gettid();
+  #else
+    native_thread_id = current_thread_serial; // TODO: Better fallback for Windows?
+  #endif

   // Each event is converted into two events in the output: one that signals the end of the previous event
   // (whatever it was), and one that signals the start of the actual event we're processing.
@@ -180,13 +189,13 @@ static void render_event(const char *event_name) {

   fprintf(output_file,
     // Finish previous duration
-    "  {\"ph\": \"E\", \"pid\": %u, \"tid\": %u, \"ts\": %f},\n" \
+    "  {\"ph\": \"E\", \"pid\": %llu, \"tid\": %u, \"ts\": %f},\n" \
     // Current event
-    "  {\"ph\": \"B\", \"pid\": %u, \"tid\": %u, \"ts\": %f, \"name\": \"%s\"},\n",
+    "  {\"ph\": \"B\", \"pid\": %llu, \"tid\": %u, \"ts\": %f, \"name\": \"%s\"},\n",
     // Args for first line
-    process_id, thread_id, now_microseconds,
+    process_id, native_thread_id , now_microseconds,
     // Args for second line
-    process_id, thread_id, now_microseconds, event_name
+    process_id, native_thread_id , now_microseconds, event_name
   );
 }
benoittgt commented 1 year ago

So yes, I think at this point it's perfectly reasonable to switch back to using the thread_id as the tid, instead of the sequential value.

I am wondering if my change properly address your suggestion.

I tested the last code change on macOS with success with example4.rb (we fallback on thread name via native_thread_name_buffer), thread_name_with_extension.rb and Rails app code.

Sorry for the late reply. I was on vacations. 😊

benoittgt commented 1 year ago

Thanks @ivoanjo for your review. Here is few screenshots of the new output. I have similar output on a debian 11 too.

I am not sure for the C changes. You will tell me. :)

Screenshot 2023-04-27 at 17 48 29 Screenshot 2023-04-27 at 17 48 24
ivoanjo commented 1 year ago

This looks great! I hope you don't take my long delay the wrong way -- I definitely I am very grateful for the contribution. It's just been an intense few weeks :)

And feel free to send more things this way :)

ivoanjo commented 1 year ago

@benoittgt I've just released this improvement as version 1.2.0 on rubygems :tada: