google / pprof

pprof is a tool for visualization and analysis of profiling data
Apache License 2.0
7.98k stars 607 forks source link

Option to select source files directory #262

Closed mkevac closed 6 years ago

mkevac commented 6 years ago

What version of pprof are you using?

tip from https://github.com/google/pprof

What operating system and processor architecture are you using?

Linux 64bit

I would like to have a way to tell pprof where it can find source files.

Imagine I have some service running on the server and I have a binary that is running there.

I can use pprof like this:

$ pprof -http=:8080 /tmp/lakafka-1.22.0-278 /tmp/prof

Where /tmp/lakafka-1.22.0-278 is a binary built on a build server and /tmp/prof is a profile collected via wget from http://.../debug/pprof/profile.

If I then go to View -> Source I don't see any source code. pprof is looking for the source code in the path from build server, e.g. /home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go

But I am not on a build server. I have the same source code at, lets say, /home/marko/lakafka.

I would like to tell pprof to look for the source code at /home/marko/lakafka.

Similar to what dir command does in GDB.

Thank you!

rauls5382 commented 6 years ago

You can use pprof -source_path=/home/marko/lakafka ... to specify the path for your source tree. I see that pprof -help doesn't list this. I'll send a commit to include it.

rauls5382 commented 6 years ago

Actually, it is listed in pprof -help, so no need for any changes.

aalexand commented 6 years ago

I think -source_path currently does not help to locate source files that have their path recorded as absolute path in the debug information which seems to be the case here. See https://github.com/google/pprof/blob/e82ee9addc1b6c8e1d667ed6de0194241e1e03b5/internal/report/source.go#L572 where there seems to be a short-circuit for the absolute path case.

I also find the logic of handling the search paths here a bit inconvenient: it attempts to append the file path from the debug information to each non-empty path prefix of each search path. So if one specifies -source_path=/home/aalexand/src:/usr/local/share/company/src and the file being searched is foo/bar/baz.c, the following paths will be tried:

/home/aalexand/src/foo/bar/baz.c
/home/aalexand/foo/bar/baz.c
/home/foo/bar/baz.c
/foo/bar/baz.c
/usr/local/share/company/src/foo/bar/baz.c
/usr/local/share/company/foo/bar/baz.c
/usr/local/share/foo/bar/baz.c
/usr/local/foo/bar/baz.c
/usr/foo/bar/baz.c
/foo/bar/baz.c

@rauls5382 I am curious what are the use cases for this search strategy? Generally, looking up the content in the parent directories of the search paths the user specified is not a very good thing as some of those parent directories could be network automount directories and looking up random paths in those is super-slow.

What I think I would prefer as a user instead, is trying every possible suffix of the path being searched, breadth-first. So for my example, the following paths would be tried:

/home/aalexand/src/foo/bar/baz.c
/usr/local/share/company/src/foo/bar/baz.c
/home/aalexand/src/bar/baz.c
/usr/local/share/company/src/bar/baz.c
/home/aalexand/src/baz.c
/usr/local/share/company/src/baz.c

That avoids the issue with the lookup of arbitrary paths in parent directories, Also, with that, I'd also remove the check for whether the path is absolute. This would make the reported case supported as the user could specify now -search_path=/home/marko/lakafka/gopath and the search for file /home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go would work as

/home/marko/lakafka/gopath/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go
/home/marko/lakafka/gopath/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go
/home/marko/lakafka/gopath/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go
/home/marko/lakafka/gopath/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go
/home/marko/lakafka/gopath/56cbaf90675b10ff/_gopath/src/badoo/lakafka/main.go
/home/marko/lakafka/gopath/_gopath/src/badoo/lakafka/main.go
/home/marko/lakafka/gopath/src/badoo/lakafka/main.go  <-- FOUND
aalexand commented 6 years ago

Alternatively, we could make the behavior of the -search_path consistent with GDB behavior but for relative path example I had above that would be only

/home/aalexand/src/baz.c
/usr/local/share/company/src/baz.c

which I think is inconvenient which is why GDB also has the option to configure substitution rules, see the same link. The solution I suggested may be a simpler compromise.

rauls5382 commented 6 years ago

The idea behind the current search is to allow the user to specify a path inside their copy of the source tree and have things work, without having to finely line up mount points. This works pretty well with the default ($CWD), so if you're standing anywhere inside a copy of the source tree things just work.

I agree that we should fix the handling of absolute paths. Perhaps adding an option for some simple substitution?

About trimming profile paths from the left, I would worry about there being multiple files in the tree with the same set of trailing directories. Say you have a tree with:

util/config.h cool-library/util/config.h

I would want to avoid resolving cool-library/util/config.h to util/config.h

aalexand commented 6 years ago

Allowing to specify an arbitrary path within the source root of interest doesn't sound too convincing of a heuristic to me unless it enables some use case rather than having just a utility function. And in some cases it could result in unusual behavior - e.g. if there is a source file with relative path like dev/myfile.c and any search path is specified, the code will end up looking files up in the /dev directory which is somewhat unexpected to me.

If there is a tree /src/util/config.h and /src/cool-library/util/config.h and relative path cool-library/util/config.h is searched, the right file would be found as longer paths are tried first. Wrong file could be found if there is only /src/util/config.h on disk but not /src/cool-library/util/config.h.

MikeSpreitzer commented 6 years ago

I also have the problem of running pprof on a machine that is not the build machine, and having absolute paths in the profile. Here is an example stack frame from a profile I am studying:

0xe5543e k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage.(*cacheWatcher).process+0x36e /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/storage/cacher.go:947

So far, I have been able to hack the machine running pprof so that the build paths exist on my pprof machine --- but I do not enjoy doing this. Would it make sense to introduce an option to simply ignore the absolute paths and do a lookup based on the source_path and that other identifier?

robfig commented 6 years ago

This is an issue for me too. Our Go binaries are produced on a CI server, and I often take profiles from production but there's no way afaik to get the Source view.

Maintainers (@rauls5382 ?) Is there an acceptable functional design, such that I could contribute a Pull Request?

aalexand commented 6 years ago

Apologies about the radio silence. Yeah, time to figure this one out.

My preference is still what I suggested in https://github.com/google/pprof/issues/262#issuecomment-345001686 since it addresses every use case I can think of with just a slight tweak of the current behavior of -search_path and does not introduce new flags. @rauls5382 Objections to that?

rauls5382 commented 6 years ago

The use case that I want to make sure is preserved is when the user's cwd is somewhere inside their local build tree and the profile has relative paths. Today that 'just works' without having to cd to the root of the local tree or specifying the source_path option. I think this proposal will break that flow.

I think the fundamental problem is that we do not know how much of the remote path is irrelevant an should be stripped. Could we perhaps implement some heuristics and provide an option to override?

Say that if the user specifies source_tree=/my/tree/lakafka and we see remote paths that include /.../lakafka/... we could automatically strip the remote paths to the left.

aalexand commented 6 years ago

I see -- I agree breaking the current behavior is not a good idea.

The proposed heuristic might be not reliable if the anchor directory name is something ambiguous like "src".

Another solution is introduce a new option like you suggested before. Something like -rewrite_path=/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka=/home/marko/lakafka (or -rewrite_path=/home/teamcitycpp/agent09/work/56cbaf90675b10ff/_gopath/src/badoo/lakafka=lakafka to turn the absolute path into relative and have default $CWD-based search path do the rest when running the pprof command from the source directory).

This could also have some limited support for globs to allow something like -rewrite_path=/*/_gopath/src/badoo/lakafka=/home/marko/lakafka to make the rewrite option independent of the random components like in this example.

rauls5382 commented 6 years ago

SGTM. I agree that a heuristic can be defeated but we would have an option override in the (hopefully infrequent) corner cases.

How about if instead of a full rewrite_path option we have a --remote_source_path that says what we should strip from absolute paths to turn them into relative paths, and then let source_path work as it does today. Have remote_source_path default to a good guess based on the source_path.

My concern with rewrite_path is how it interacts with source_path. I'd rather have something more orthogonal.

aalexand commented 6 years ago

Despite the radio silence, I do have a change for this, will send it soon.

fmstephe commented 6 years ago

@aalexand If you need any help, or testing or code review give me a bell. I am very motivated to get this change. We are running a fork of pprof right now to be able to see source built on other machines, and I would love to be able to use the mainline version.

aalexand commented 6 years ago

@fmstephe FYI #366

ilya-korotya commented 5 years ago

@aalexand, why I got this error? I run:

go tool pprof -source_path=/home/korotya-ilya/go/src/github.com/ilya-korotya/highloadcup_2018 highloadcup_2018 profile.out

But when I run list command in pprof, I get error message:

ROUTINE ======================== github.com/ilya-korotya/highloadcup_2018/postgres.changeSymbol in /go/src/github.com/ilya-korotya/highloadcup_2018/postgres/init.go
         0      640ms (flat, cum) 36.99% of Total
 Error: open /go/src/github.com/ilya-korotya/highloadcup_2018/postgres/init.go: no such file or directory

Why source_path flag don't help change path to source files? Could you help me with this?

aalexand commented 5 years ago

@ilya-korotya Please file a separate report with instructions on how to reproduce.