mstange / samply

Command-line sampling profiler for macOS and Linux
Apache License 2.0
2.04k stars 49 forks source link

Support marker file paths with thread ID #143

Closed vvuk closed 3 months ago

vvuk commented 3 months ago

Adds support for marker files of the format marker-pid-tid.txt so that markers can be assigned the correct thread, if provided. Non-tid files still work fine and go to the main thread as before.

mstange commented 3 months ago

On Linux we use the thread that mmaps the file. On macOS I was planning to just get the tid in the hooked open / fopen function and put the markers there. Would that work for your use case?

vvuk commented 3 months ago

On Linux we use the thread that mmaps the file. On macOS I was planning to just get the tid in the hooked open / fopen function and put the markers there. Would that work for your use case?

Hmm, yeah, that would work. But threads would each still need to open a different filename anyway, and this approach seems more generic (i.e. it works on any platform, even if you have no way of obtaining the thread id of the open call). No strong opinion though, on Windows I expect to emit markers via ETW anyway, though I may implement marker file support anyway because it's the easiest cross-platform thing for someone to integrate into their own app.

Also wow is the marker format verbose. We spit out gigabytes of data (we have a lot of markers, many with very long names!) in 10 seconds. I was thinking of either extending it or defining a new format that lets marker names/info be defined and then referenced by ID (plus support for categories and other things). Did you have any plans in that direction?

mstange commented 3 months ago

Oh absolutely, the marker file format is the dumbest and simplest format that was solving my problem at the time. I want to remove support for it as soon as we have something better.

The super handwavy plan for markers / traces is as follows:

mstange commented 3 months ago

But threads would each still need to open a different filename anyway

That's true. Until now I've only had a single thread per process that wanted markers.

though I may implement marker file support anyway because it's the easiest cross-platform thing for someone to integrate into their own app.

Also true. But on Windows I'm not sure how to get the path to the marker file. I was passing a --marker-file argument manually to etw-gecko for way too long until we finally switched to using ETW trace events.

Anyway, I'm happy with the "put-thread-id-in-filename" approach. We can still do what I suggested later for cases where the tid isn't part of the filename - your approach is nicely composable with what I suggested. So I'm happy to merge this PR once the failures are resolved.

vvuk commented 3 months ago

Also true. But on Windows I'm not sure how to get the path to the marker file

My super dumb approach to this was going to be to just look for them in $TEMP during processing of the etl file. In theory pid reuse could cause problems, but the chances of pid+tid pair reuse seem really low.

vvuk commented 3 months ago

... also, is there a Discord/Matrix/something channel for samply? :)

mstange commented 3 months ago

... also, is there a Discord/Matrix/something channel for samply? :)

https://chat.mozilla.org/#/room/#profiler:mozilla.org is currently the closest equivalent to a samply channel.