jlfwong / speedscope

🔬 A fast, interactive web-based viewer for performance profiles.
https://www.speedscope.app
MIT License
5.58k stars 247 forks source link

Use the raw_lines field in stackprof for more precise lines #483

Open dalehamel opened 1 month ago

dalehamel commented 1 month ago

Since https://github.com/tmm1/stackprof/pull/213 the actual source line within the method, not just the method start line, should be available.

From the existing simple.rb:

https://github.com/jlfwong/speedscope/blob/d69f3d0eec29d1d1aeb725b97de3a5209f85447a/sample/programs/ruby/simple.rb#L32-L34

On main, it just gives the method start line:

Screenshot 2024-10-10 at 10 34 19 PM

With this patch, we get the actual line number of the call:

Screenshot 2024-10-10 at 10 33 00 PM

I've added some rhudimentary tests, to cover both older stackprof files that are missing raw_lines, and newer ones that have it. I basically just walked "children[0]" manually until i found a node with a line. Since the file is checked in, i think this is fine.

EDIT: addressing frame collisions

As called out in https://github.com/jlfwong/speedscope/pull/483#issuecomment-2406515797, my first commit could lead to collisions due to frames that were not identical using the same key. This is addressed in 136dd990ba0447608ec366fd7db1f7dd14a6de5c

Looking at the following code:

https://github.com/jlfwong/speedscope/blob/d69f3d0eec29d1d1aeb725b97de3a5209f85447a/sample/programs/ruby/simple.rb#L3-L10

On main, we only see the frame info showing a calling from line 4:

Screenshot 2024-10-11 at 9 53 19 AM

But now, we have two different frames for a, one with line 5, and one with line 8:

Screenshot 2024-10-11 at 9 52 27 AM

In a more pathological example, I modified simple.rb to insert a sleep on line 16 to show how previously we would in fact lose information. Here is a graph on main:

Screenshot 2024-10-11 at 10 02 23 AM

And the other tree for the code path where it doesn't sleep:

Screenshot 2024-10-11 at 10 03 25 AM

Here it is with the patch:

Screenshot 2024-10-11 at 10 00 35 AM

And the other tree for the code path where it doesn't sleep:

Screenshot 2024-10-11 at 10 03 42 AM

We can see that with the patch, it correctly shows the call site of "sleep" and is able to distinguish these code paths, one for line 16 where sleep is actually called, and one for line 13 for the other code path and indeed another on line 14 for the other instance of b. Previously these would all have had the same value. The graph has more nodes, and we are able to have different instances of the "b" frame.

jlfwong commented 1 month ago

Hi @dalehamel! Thanks for the patch — happy to merge this after tests are fixed

It looks like the new test file wasn't checked in:

Summary of all failing tests
FAIL src/import/stackprof.test.ts
  ● reads correct line numbers › uses the most precise iseq line if raw_lines is available

    ENOENT: no such file or directory, open './sample/profiles/stackprof/wall_with_raw_lines.json'

      at Object.openSync (node:fs:596:3)
      at readFileSync (node:fs:464:35)
      at src/import/stackprof.test.ts:64:37
      at src/import/stackprof.test.ts:8:71
      at Object.<anonymous>.__awaiter (src/import/stackprof.test.ts:4:12)
      at Object.<anonymous> (src/import/stackprof.test.ts:63:80)
jlfwong commented 1 month ago

Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:

a;b
a;c
def a
  b
  c
end

If a() calls b() on line 7, and a() calls c() on line 8, what should be displayed here? Speedscope considers frames with the same key to be identical for display, and collapses them into a single entity

dalehamel commented 1 month ago

It looks like the new test file wasn't checked in:

Whoops, i amended the commit but forgot to push.

dalehamel commented 1 month ago

Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:

a;b
a;c
def a
  b
  c
end

If a() calls b() on line 7, and a() calls c() on line 8, what should be displayed here? Speedscope considers frames with the same key to be identical for display, and collapses them into a single entity

I'll need to look into this further and get back to you

dalehamel commented 1 month ago

One possible solution would be to include the line number as part of the key. Looks like it accepts a string, so a quick and dirty way to prevent the collisions would be to append the line number to the frame id, eg:

diff --git a/src/import/stackprof.ts b/src/import/stackprof.ts
index bd20c5d..e4cd7f5 100644
--- a/src/import/stackprof.ts
+++ b/src/import/stackprof.ts
@@ -39,8 +39,9 @@ export function importFromStackprof(stackprofProfile: StackprofProfile): Profile
       if (frameName == null) {
         frameName = '(unknown)'
       }
+      const key = id.toString(10) + "_" + lineNo.toString(10);
       const frame = {
-        key: id,
+        key: key,
         ...frames[id],
         line: lineNo,
         name: frameName,

This addresses some weirdness I was seeing in one of my profiles which I think was what you called out. This is one solution to ensure that they are keyed differently. I'll get into a bit more detail about this in a bit.

dalehamel commented 1 month ago

@jlfwong please take another look - i updated the description to address the collision issue you pointed out. I believe I have come up with a keying strategy that should resolve this.

Unfortunately it changes the key from a number to a string, as a general rule I prefer integers for performance reasons but in this case due to number only permitting 53 bits, there is no way to stuff all the info in without the possibility of truncation. So use a similar trick as in https://github.com/tmm1/stackprof/pull/213 to stuff the frame and line number together, but have to convert the result to a string.

If we were to update https://github.com/jlfwong/speedscope/blob/d69f3d0eec29d1d1aeb725b97de3a5209f85447a/src/lib/utils.ts#L46-L49 to permit BigInt in addition to string and number, we can avoid this conversion but I'm not sure if there might be unintended consequences to this. For now, since the keyed set accepts strings, I just use convert the resulting BigInt to its base 10 string representation.

dalehamel commented 1 month ago

@jlfwong - gentle nudge as it has been a week, do you have a preference on the key generation approaches? FWIW we've been using the string based approach in prod and it seems to be fine performance-wise.

dalehamel commented 4 weeks ago

@jlfwong last ping I'll give, when you are able to take a look let me know but I'm moving on to other work and running a forked version for now.

jlfwong commented 3 weeks ago

Hey, sorry I've been MIA

Only have a couple minutes right now, but I think I may not have communicated my concern here correctly.

Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:

a;b a;c def a b c end If a() calls b() on line 7, and a() calls c() on line 8, what should be displayed here? Speedscope considers frames with the same key to be identical for display, and collapses them into a single entity

When I was describing the problem here, my concern is that, IMO, we want these two "a" frames to be collapsed. If I understand this change correctly (which I might not be), then it would treat those two "a" frames as distinct by giving them different keys since their line numbers will be different if we use their invocation sites of b & c respectively rather than using the definition line of a in both cases.

This has minor annoyances for visual display, but creates bigger problems for aggregate statistics. For example, if this was the whole trace, then the sandwich view would read "50% a, 50% a" instead of "100% a" for total time.

Am I misunderstanding the nature of this change?

dalehamel commented 3 weeks ago

IMO, we want these two "a" frames to be collapsed

FWIW I rolled this out internally a couple of weeks ago, and the response has been positive - developers that I work with seem to prefer to be able to follow the execution more precisely by having the actual call sites in the graphical views.

then it would treat those two "a" frames as distinct by giving them different keys since their line numbers will be different if we use their invocation sites of b & c respectively rather than using the definition line of a in both cases.

Yes

This has minor annoyances for visual display, but creates bigger problems for aggregate statistics. For example, if this was the whole trace, then the sandwich view would read "50% a, 50% a" instead of "100% a" for total time.

Yes, this does happen. They are displayed with the same name, but then on clicking on them, you can see there are different line numbers. I agree this messes up the "top table" part of the sandwich view, as a frame that has multiple call sites can now be showed below a frame that only has one, but actually has a lower total overall weight. Perhaps the top table code could be amended to just use the frame names as the "key" when performing the aggregation in the table, but show the distinct call sites in the actual sandwich graphical view.

In the "pprof" format, it is possible to store both the method definition line as well as the actual call line. Perhaps a compromise would be to amend the data structures to be able to store both, and have an option to toggle exact line numbers, or line numbers of the method definition only.

jlfwong commented 10 hours ago

developers that I work with seem to prefer to be able to follow the execution more precisely by having the actual call sites in the graphical views.

Ah I see. I hadn't understood that this was an intentional behavioral change, not just an update to which number is displayed in the interface (i.e. this actually produces structurally different flamegraphs, not just flamegraphs with different line number labels).

I'm afraid that I'm uncomfortable merging this PR as-is then. The rest of speedscope assumes that the line number is the line number of the function definition, not the line number of the invocation. It's interesting that the developers you've been working with have appreciated the change, but I would personally be quite bothered by this (especially if it was inconsistent depending on what file format was imported).

The thing I could easily get on board with is for speedscope profiles to contain both the function definition line and the call site line, as you suggested. But that's a different change than this.

In that world, the function definition line lives, as it does today, in Frame. The call site line, which would be newly introduced, would live in CallTreeNode. Frames correspond to functions, and are reused at many places in flamegraphs. CallTreeNode instances on the other hand correspond 1:1 with flame graph nodes.

Depending on the circumstance, we would use one v.s. the other. The only view that seems to me like it should unambiguously use the call site line number is the back trace view that's shown when you click on a rectangle in the time ordered view

image