Open dalehamel opened 1 month ago
Hi @dalehamel! Thanks for the patch — happy to merge this after tests are fixed
It looks like the new test file wasn't checked in:
Summary of all failing tests
FAIL src/import/stackprof.test.ts
● reads correct line numbers › uses the most precise iseq line if raw_lines is available
ENOENT: no such file or directory, open './sample/profiles/stackprof/wall_with_raw_lines.json'
at Object.openSync (node:fs:596:3)
at readFileSync (node:fs:464:35)
at src/import/stackprof.test.ts:64:37
at src/import/stackprof.test.ts:8:71
at Object.<anonymous>.__awaiter (src/import/stackprof.test.ts:4:12)
at Object.<anonymous> (src/import/stackprof.test.ts:63:80)
Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:
a;b
a;c
def a
b
c
end
If a()
calls b()
on line 7, and a()
calls c()
on line 8, what should be displayed here? Speedscope considers frames with the same key
to be identical for display, and collapses them into a single entity
It looks like the new test file wasn't checked in:
Whoops, i amended the commit but forgot to push.
Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:
a;b a;c
def a b c end
If
a()
callsb()
on line 7, anda()
callsc()
on line 8, what should be displayed here? Speedscope considers frames with the samekey
to be identical for display, and collapses them into a single entity
I'll need to look into this further and get back to you
One possible solution would be to include the line number as part of the key. Looks like it accepts a string, so a quick and dirty way to prevent the collisions would be to append the line number to the frame id, eg:
diff --git a/src/import/stackprof.ts b/src/import/stackprof.ts
index bd20c5d..e4cd7f5 100644
--- a/src/import/stackprof.ts
+++ b/src/import/stackprof.ts
@@ -39,8 +39,9 @@ export function importFromStackprof(stackprofProfile: StackprofProfile): Profile
if (frameName == null) {
frameName = '(unknown)'
}
+ const key = id.toString(10) + "_" + lineNo.toString(10);
const frame = {
- key: id,
+ key: key,
...frames[id],
line: lineNo,
name: frameName,
This addresses some weirdness I was seeing in one of my profiles which I think was what you called out. This is one solution to ensure that they are keyed differently. I'll get into a bit more detail about this in a bit.
@jlfwong please take another look - i updated the description to address the collision issue you pointed out. I believe I have come up with a keying strategy that should resolve this.
Unfortunately it changes the key from a number to a string, as a general rule I prefer integers for performance reasons but in this case due to number
only permitting 53 bits, there is no way to stuff all the info in without the possibility of truncation. So use a similar trick as in https://github.com/tmm1/stackprof/pull/213 to stuff the frame and line number together, but have to convert the result to a string.
If we were to update https://github.com/jlfwong/speedscope/blob/d69f3d0eec29d1d1aeb725b97de3a5209f85447a/src/lib/utils.ts#L46-L49 to permit BigInt in addition to string and number, we can avoid this conversion but I'm not sure if there might be unintended consequences to this. For now, since the keyed set accepts strings, I just use convert the resulting BigInt to its base 10 string representation.
@jlfwong - gentle nudge as it has been a week, do you have a preference on the key generation approaches? FWIW we've been using the string based approach in prod and it seems to be fine performance-wise.
@jlfwong last ping I'll give, when you are able to take a look let me know but I'm moving on to other work and running a forked version for now.
Hey, sorry I've been MIA
Only have a couple minutes right now, but I think I may not have communicated my concern here correctly.
Oh hm, one consideration here: other profile imports intentionally use the line number of the method definition, not the line of the invocation. This prevent things from getting confusing when e.g. you have stacks that look like:
a;b a;c def a b c end If a() calls b() on line 7, and a() calls c() on line 8, what should be displayed here? Speedscope considers frames with the same key to be identical for display, and collapses them into a single entity
When I was describing the problem here, my concern is that, IMO, we want these two "a" frames to be collapsed. If I understand this change correctly (which I might not be), then it would treat those two "a" frames as distinct by giving them different keys since their line numbers will be different if we use their invocation sites of b & c respectively rather than using the definition line of a in both cases.
This has minor annoyances for visual display, but creates bigger problems for aggregate statistics. For example, if this was the whole trace, then the sandwich view would read "50% a, 50% a" instead of "100% a" for total time.
Am I misunderstanding the nature of this change?
IMO, we want these two "a" frames to be collapsed
FWIW I rolled this out internally a couple of weeks ago, and the response has been positive - developers that I work with seem to prefer to be able to follow the execution more precisely by having the actual call sites in the graphical views.
then it would treat those two "a" frames as distinct by giving them different keys since their line numbers will be different if we use their invocation sites of b & c respectively rather than using the definition line of a in both cases.
Yes
This has minor annoyances for visual display, but creates bigger problems for aggregate statistics. For example, if this was the whole trace, then the sandwich view would read "50% a, 50% a" instead of "100% a" for total time.
Yes, this does happen. They are displayed with the same name, but then on clicking on them, you can see there are different line numbers. I agree this messes up the "top table" part of the sandwich view, as a frame that has multiple call sites can now be showed below a frame that only has one, but actually has a lower total overall weight. Perhaps the top table code could be amended to just use the frame names as the "key" when performing the aggregation in the table, but show the distinct call sites in the actual sandwich graphical view.
In the "pprof" format, it is possible to store both the method definition line as well as the actual call line. Perhaps a compromise would be to amend the data structures to be able to store both, and have an option to toggle exact line numbers, or line numbers of the method definition only.
developers that I work with seem to prefer to be able to follow the execution more precisely by having the actual call sites in the graphical views.
Ah I see. I hadn't understood that this was an intentional behavioral change, not just an update to which number is displayed in the interface (i.e. this actually produces structurally different flamegraphs, not just flamegraphs with different line number labels).
I'm afraid that I'm uncomfortable merging this PR as-is then. The rest of speedscope assumes that the line number is the line number of the function definition, not the line number of the invocation. It's interesting that the developers you've been working with have appreciated the change, but I would personally be quite bothered by this (especially if it was inconsistent depending on what file format was imported).
The thing I could easily get on board with is for speedscope profiles to contain both the function definition line and the call site line, as you suggested. But that's a different change than this.
In that world, the function definition line lives, as it does today, in Frame
. The call site line, which would be newly introduced, would live in CallTreeNode
. Frames correspond to functions, and are reused at many places in flamegraphs. CallTreeNode
instances on the other hand correspond 1:1 with flame graph nodes.
Depending on the circumstance, we would use one v.s. the other. The only view that seems to me like it should unambiguously use the call site line number is the back trace view that's shown when you click on a rectangle in the time ordered view
Since https://github.com/tmm1/stackprof/pull/213 the actual source line within the method, not just the method start line, should be available.
From the existing simple.rb:
https://github.com/jlfwong/speedscope/blob/d69f3d0eec29d1d1aeb725b97de3a5209f85447a/sample/programs/ruby/simple.rb#L32-L34
On main, it just gives the method start line:
With this patch, we get the actual line number of the call:
I've added some rhudimentary tests, to cover both older stackprof files that are missing
raw_lines
, and newer ones that have it. I basically just walked "children[0]" manually until i found a node with a line. Since the file is checked in, i think this is fine.EDIT: addressing frame collisions
As called out in https://github.com/jlfwong/speedscope/pull/483#issuecomment-2406515797, my first commit could lead to collisions due to frames that were not identical using the same key. This is addressed in 136dd990ba0447608ec366fd7db1f7dd14a6de5c
Looking at the following code:
https://github.com/jlfwong/speedscope/blob/d69f3d0eec29d1d1aeb725b97de3a5209f85447a/sample/programs/ruby/simple.rb#L3-L10
On main, we only see the frame info showing
a
calling from line 4:But now, we have two different frames for
a
, one with line 5, and one with line 8:In a more pathological example, I modified
simple.rb
to insert a sleep on line 16 to show how previously we would in fact lose information. Here is a graph on main:And the other tree for the code path where it doesn't sleep:
Here it is with the patch:
And the other tree for the code path where it doesn't sleep:
We can see that with the patch, it correctly shows the call site of "sleep" and is able to distinguish these code paths, one for line 16 where sleep is actually called, and one for line 13 for the other code path and indeed another on line 14 for the other instance of b. Previously these would all have had the same value. The graph has more nodes, and we are able to have different instances of the "b" frame.