jlfwong / speedscope

🔬 A fast, interactive web-based viewer for performance profiles.
https://www.speedscope.app
MIT License
5.47k stars 238 forks source link

Speedscope doesn't match `chrome://tracing`: events never end #345

Open gjoseph92 opened 3 years ago

gjoseph92 commented 3 years ago

This file in Chrome trace format looks very different in speedscope versus chrome://tracing:

Screen Shot 2021-05-12 at 5 20 26 PM

Screen Shot 2021-05-12 at 5 20 59 PM

In speedscope, you can see from the minimap that the events don't seem to end properly—every event stretches all the way to the end of the whole trace.

I'm not that familiar with the file format, so I'm not sure if it's a malformed trace that Chrome just happens to be okay with, or if speedscope is actually doing something wrong here.

I also see lots of errors like Tried to end "transition_memory_released [...]" when "time.time" was on the top of the stack. Doing nothing instead. in the console.

FWIW, these traces were generated by viztracer. viztracer actually has an issue where the files it generates break speedscope entirely (😿 Something went wrong), and the same sorts of "tried to end x when y was on top of the stack" errors also show up. I noticed the file had lots of events with negative durations ("dur": -0.02), so I just wrote a script to drop all of those. That produced the file I linked above, which at least loads into speedscope (albeit with errors in the console), but looks very wrong. Note that chrome://tracing is able to load the file with negative durations just fine, and it looks indistinguishable (as far as I can tell) from my filtered one. Here is the original file with negative durations as well: profile-original.json.

jlfwong commented 3 years ago

Hi @gjoseph92!

I'm not that familiar with the file format, so I'm not sure if it's a malformed trace that Chrome just happens to be okay with, or if speedscope is actually doing something wrong here.

It's probably a malformed trace that chrome is just eating anyway, but I won't know without digging into the profile specifically.

A thing that's really tricky about the chrome tracing format is that it both requires supporting begin/end events at the same timestamp and requires that events can be provided out-of-order, which leads to all sorts of ambiguous behavior.

If you did want to dig in yourself, the relevant code is here: https://github.com/jlfwong/speedscope/blob/main/src/import/trace-event.ts

Thanks for the thorough report -- when I have some free time, I'll try to dig in. No promises on timeline though

gjoseph92 commented 3 years ago

Thanks for the quick reply @jlfwong! Since this file is all X events, I was hoping it wouldn't run into as many edge cases. But I'll take a look at that import code if I get the chance, thanks for the link.

jlfwong commented 3 years ago

Interesting! If it's all "X" events that is more surprising. It's still possible to have invalid configurations of "X" events (including negative durations like you mentioned) because you can have invalid nesting. For example having one that starts at t=1 w/ dur=10, and another that has t=2 dur=10. In that case, the resulting event trace isn't a valid flamegraph.

Though it's also totally possible there's an unambiguously correct interpretation and this is just a bug.

If you did want to help accelerate investigating, the best thing is to remove as many events as you can while still showing obviously different results in chrome://tracing and speedscope

gjoseph92 commented 3 years ago

Here's a very reduced version that seems to still show the issue:

As before, the no-negative-dur one has events with negative durations dropped.

Chrome: Screen Shot 2021-05-12 at 6 51 34 PM

Speedscope without negative durations (notice how there's only 1 __lt__ at level 2 instead of 2, and builtins.isinstance runs the whole duration): Screen Shot 2021-05-12 at 6 51 35 PM

Speedscope with negative durations (looks "right-heavy" style like in the original issue): Screen Shot 2021-05-12 at 7 00 15 PM

jlfwong commented 3 years ago

Perfect! Thanks for during the reduction work

gaogaotiantian commented 3 years ago

hi @gjoseph92 , is the trace generated on Windows or Linux? This is a bug for VizTracer. VizTracer created artificial time gap when the timestamp on the system does not have enough granularity, but it should not create negative duration blocks. Could you provide more info(and probably create an issue in VizTracer) about this? Thanks.

gjoseph92 commented 3 years ago

Hey @gaogaotiantian! I was going to wait to tag you until we'd confirmed it was actually an issue with the files and not speedscope, but I'm glad you noticed this and thanks for the info. These traces are from macOS. I'm happy to open an issue about the negative durations.

gaogaotiantian commented 3 years ago

@gjoseph92 chrome trace viewer may did some work behind the scene to make the visualization more reasonable(I know perfetto is much more strict), but negative duration is definitely a bug.

marcizhu commented 2 years ago

I'm also having this issue both when my traces are events ("ph":"X") and when they are composed of begin and end events independently. I see tons of warnings on the JavaScript console, mainly the one described in the first comment and Frame "<name>" was still open at the end of profile. Closing automatically..

EDIT: Doing a closer inspection of the log messages I saw many stating Tried to end "(unnamed)" but "<function name>" was on top of the stack. Doing nothing instead. I think this issue is because the end events do not contain the function name which, according to the trace event format on Duration events only fields pid, tid, ph and ts are required. An end event should pop the event currently on top of the stack, regardless of its name.

EDIT 2: I changed my trace format to output the function name on end events and this seems to have fixed the issue.

BafS commented 9 months ago

I the same issue. Every event are ph=X and I can open them in google debug tool or brave://tracing/ or https://ui.perfetto.dev/. I checked and I don't have negative time.