jvm-profiling-tools / perf-map-agent

A java agent to generate method mappings to use with the linux `perf` tool
GNU General Public License v2.0

Document or improve scripts to make perf-java-flames work with docker #50

Open alicegoldfuss opened 7 years ago

alicegoldfuss commented 7 years ago

I'm trying to create a Java process FlameGraph with perf-java-flames. It seems to run successfully, but I can't find the resulting svg file.

$ ./perf-java-flames 161991 -F 99 -a -g -- sleep 30
Recording events for 15 seconds (adapt by setting PERF_RECORD_SECONDS)
Warning:
PID/TID switch overriding SYSTEM
$

CentOS 7 3.10.0-327.36.3.el7.x86_64 cmake version 2.8.12.2

Up-to-date versions of perf-map-agent and the FlameGraph repo.

alicegoldfuss commented 7 years ago

Looks like it's only creating the .data file and not .stacks or .collapsed

perf-map-agent]# ls -la /tmp/ | grep 161991
-rw-------  1 root  root    1919352 Jan 24 19:40 perf-161991.data

jrudolph commented 7 years ago

Can you try without -- sleep 30? perf-java-record-stack already does -- sleep $PERF_RECORD_SECONDS.
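
A minimal sketch of why the extra `-- sleep 30` is redundant, assuming the wrapper builds its perf command roughly as described above. The helper name `perf_record_cmd` and the `-F 99 -g` flags are illustrative assumptions, not the script's actual code; the 15-second default is taken from the output earlier in the thread.

```shell
# Hypothetical helper (not part of perf-map-agent): prints the kind of perf
# command a wrapper would build, so the effect of PERF_RECORD_SECONDS is visible.
perf_record_cmd() {
  local pid="$1"
  # Fall back to 15 seconds, the default reported in the script output above.
  local seconds="${PERF_RECORD_SECONDS:-15}"
  # -F (sample frequency), -g (call graphs) and -p (target PID) are standard
  # perf record options; the wrapper's real flag set may differ.
  echo "perf record -F 99 -g -p ${pid} -- sleep ${seconds}"
}
```

With this shape, the recording length is changed through the environment, for example `PERF_RECORD_SECONDS=30 ./bin/perf-java-flames 161991`, rather than by appending a second `-- sleep` to the command line.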

jrudolph commented 7 years ago

I see that this is not particularly well documented...

You could also try enabling set -x in one or several of the scripts to see where it exits. (Shell scripting is not my expertise so any help is appreciated.)

alicegoldfuss commented 7 years ago

I'm getting the same results without sleep 30 and even a vanilla run like

$ ./bin/perf-java-flames 161991

Where is the resulting svg file supposed to turn up?

jrudolph commented 7 years ago

In the directory the script is run from, and the script should print a line with the file name.

alicegoldfuss commented 7 years ago

Yeah it's definitely not showing up there. I'm going to try to create a Java FlameGraph with the manual steps.

nitsanw commented 7 years ago

To eliminate some suspects:

  1. Does the perf-java-top script work for you?
  2. Which version of Java are you using?
  3. Is the Java process using the -XX:+PreserveFramePointer? This would require an OpenJDK/Oracle post 8u60 release.
  4. Can you generate normal perf flame-graphs on your setup? Even without perf-map-agent generating the map file you should be able to at least see the native portion of the JVM process, so getting nothing at all suggests some issue in the perf interaction. Or perhaps some incompatibility with the scripts, though I've used them on CentOS 7 before and they "Just Worked"...

alicegoldfuss commented 7 years ago

  1. It fails with bash: sudo: java: command not found even when run as root with java in the PATH
  2. Java is 1.8.0_102-b14
  3. Yes it's using this option.
  4. Yes I can create perf FlameGraphs using perf and the FlameGraph repo, just not with any of your tools.
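
For question 3, a rough way to confirm the flag on a running JVM is to look at its command line under /proc. This is a heuristic sketch with a hypothetical function name: it only sees flags passed directly on the command line, so options supplied some other way (for example through an environment variable) would be missed.

```shell
# Heuristic: report whether a JVM command line carries -XX:+PreserveFramePointer.
# The argument is a NUL-separated cmdline file, normally /proc/<pid>/cmdline.
has_preserve_frame_pointer() {
  local cmdline_file="$1"
  # /proc/<pid>/cmdline separates arguments with NUL bytes; turn them into
  # lines and look for an exact, whole-line match of the flag.
  tr '\000' '\n' < "$cmdline_file" | grep -qxF -- '-XX:+PreserveFramePointer'
}
```

Usage would look like `has_preserve_frame_pointer /proc/161991/cmdline && echo "flag present"`.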

nitsanw commented 7 years ago

Thanks! I managed to reproduce this issue locally and have a fix. I'll send a PR in a second, but it's very minor so if you can't wait you can go ahead and fix locally by applying the following:

diff --git a/bin/create-java-perf-map.sh b/bin/create-java-perf-map.sh
index 52ee75d..b297067 100755
--- a/bin/create-java-perf-map.sh
+++ b/bin/create-java-perf-map.sh
@@ -24,5 +24,5 @@ fi
 [ -d "$JAVA_HOME" ] || (echo "JAVA_HOME directory at '$JAVA_HOME' does not exist." && false)

 sudo rm $PERF_MAP_FILE -f
-(cd $PERF_MAP_DIR/out && sudo -u \#$TARGET_UID java -cp $ATTACH_JAR_PATH:$JAVA_HOME/lib/tools.jar net.virtualvoid.perf.AttachOnce $PID "$OPTIONS")
+(cd $PERF_MAP_DIR/out && sudo -u \#$TARGET_UID $JAVA_HOME/bin/java -cp $ATTACH_JAR_PATH:$JAVA_HOME/lib/tools.jar net.virtualvoid.perf.AttachOnce $PID "$OPTIONS")
 sudo chown root:root $PERF_MAP_FILE

Which just carries the JAVA_HOME through to the sudo command
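
The idea behind the patch can be sketched as a small lookup: prefer the binary under JAVA_HOME and only fall back to a PATH search, since sudo is commonly configured to reset PATH. The function name `resolve_java` is hypothetical and is not part of the scripts.

```shell
# Hypothetical helper: pick an absolute java path that survives a PATH reset.
resolve_java() {
  if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
    # Absolute path, so it does not depend on the PATH seen under sudo.
    echo "$JAVA_HOME/bin/java"
  else
    # Fall back to whatever the current PATH resolves to (may print nothing).
    command -v java
  fi
}
```

The resolved path could then be passed to sudo in place of the bare `java` command.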

nitsanw commented 7 years ago

See PR #51

nitsanw commented 7 years ago

Please let me know if the fix helps with your ultimate goal, which is to get flame graphs. Also note that if you are aiming to collect machine-wide stats for many Java processes, @brendangregg has the jmaps script, which creates a map file for every Java process and can be used as part of producing a machine-wide profile: https://github.com/brendangregg/FlameGraph/blob/3da963a74a686e2caea489ba637f6afdb6d6658a/jmaps

alicegoldfuss commented 7 years ago

I suspect this fix will still fail for me, due to a bug in Java that requires me to dump the symbols as the user of the running Java process. But I will let you know!

Also thanks for the jmaps link, but there's only one Java process on this machine.

alicegoldfuss commented 7 years ago

I added the fix but it still fails, even when running as root. New error though:

# ./bin/perf-java-top 161991
sudo: unable to execute /home/alice/jdk1.8.0_102/bin/java: Permission denied
# ls -la /home/alice/jdk1.8.0_102/bin/java
-rwxr-xr-x 1 alice alice 7734 Jan 24 00:33 /home/alice/jdk1.8.0_102/bin/java

I get the same error when running as alice. And yes I can run java directly by calling that path.

nitsanw commented 7 years ago

OK... Not seen that one before. For what it's worth, my java executable has the exact same permissions. Running the script as root works for me if I set up the JAVA_HOME environment variable.

We can work out an alternative, I think. The permissions game in the scripts is around 2 files, the map file and the perf.data file:

Using the above I've set up boxes where users are allowed to perf profile their own Java processes with slightly modified scripts, essentially removing the sudo prefix everywhere and the file ownership manipulation. I've not attempted to merge these efforts back. Reflecting on this, perhaps the issue you are seeing is because the user whose process you are trying to profile does not have permission to run your Java installation? Maybe pointing JAVA_HOME at an installation available to all users will solve the issue?
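
One way to test that hypothesis is to walk up from the java binary and list any parent directory that other users cannot traverse. This is a sketch only: it assumes GNU `stat`, the function name `blocked_dirs` is hypothetical, and it looks solely at the "other" execute bit, ignoring ownership, group membership and ACLs.

```shell
# Hypothetical helper: print each ancestor directory of a path whose mode
# lacks the execute (search) bit for "other" users.
blocked_dirs() {
  local dir mode
  dir=$(dirname -- "$1")
  while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
    # GNU stat prints a mode string such as drwx------; the 10th character
    # is the execute bit for "other" (x, or t when the sticky bit is also set).
    mode=$(stat -c %A -- "$dir")
    case "$mode" in
      ?????????[xt]*) ;;               # traversable by other users
      *) echo "$dir ($mode)" ;;        # would block a different user
    esac
    dir=$(dirname -- "$dir")
  done
}
```

Running `blocked_dirs /home/alice/jdk1.8.0_102/bin/java` would print nothing if every parent directory is open to other users.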

brendangregg commented 7 years ago

Some security enforcement preventing alice from executing things? like seccomp?

FWIW, my jmaps tool also works around the perf issue of needing the /tmp/perf*map files as owned by root.

alicegoldfuss commented 7 years ago

Apologies for going dark. I've been digging into this issue with the manual commands.

The issue comes down to containers and namespacing. The Java process I'm trying to profile is running inside a container. The process is owned by a user inside the container, but only has a UID exposed to the host. Even spoofing a user with that UID on the host doesn't work when trying to dump symbols. And I can't do the profiling inside the container, because perf isn't installed and the version of Ubuntu running inside the container is too new for the underlying host kernel to have a supported perf package.

My planned workaround (which I haven't verified, but I believe will work) is:

  1. Drop the symbols from inside the container.
  2. Get the resulting perf-pid.map onto the underlying host via a mounted volume.
  3. Change the perf-pid.map filename to match the Java process PID as seen by the host (and chown to root).
  4. Run the perf and FlameGraph scripts on the host, using the renamed perf-pid.map file.
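
Step 3 of the plan above could be sketched as follows. The function name, the argument order and the optional destination directory are assumptions for illustration; the `/tmp/perf-<pid>.map` naming and the root ownership come from the discussion in this thread.

```shell
# Hypothetical helper: copy a map file dumped inside the container to the
# name perf expects for the host-side PID, and hand it to root if possible.
install_perf_map() {
  local src="$1" host_pid="$2" dest_dir="${3:-/tmp}"
  local dest="${dest_dir}/perf-${host_pid}.map"
  cp -- "$src" "$dest" || return 1
  # Only root can give the file to root; skip the chown otherwise.
  if [ "$(id -u)" -eq 0 ]; then
    chown root:root -- "$dest" || return 1
  fi
  echo "$dest"
}
```

For example, `install_perf_map /mnt/shared/perf-1.map 161991` would place the file at `/tmp/perf-161991.map`, where `/mnt/shared` stands in for whatever volume is mounted from the container.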

I think this will give me what I want.

nitsanw commented 7 years ago

@alicegoldfuss Thanks for sharing your use case in more detail. I bow to your Linux Fu powers. It sounds like you are on your way to cracking it; when you do, please share the details. It would perhaps help to add this to the wiki. I have not looked much into it, but this project by @chbatey aims to demo a solution to what seems like a similar challenge: https://github.com/chbatey/docker-jvm-flamegraphs

nitsanw commented 7 years ago

And the relevant blog post to go with the repo: http://batey.info/docker-jvm-flamegraphs.html

alicegoldfuss commented 7 years ago

Ah, looks like this person has come to the same conclusion as me! That's comforting :)

nitsanw commented 7 years ago

@alicegoldfuss if nothing else I at least hope I've introduced you to the right person :-)

brendangregg commented 7 years ago

Ah, right, containers and perf. I've been meaning to post a blog post too -- we've all probably been working on the same problem. :)

Christopher's post is good, but he needs to let the JVM warm up a bit more -- too many "Interpreter" frames -- they haven't hit CompileThreshold yet.
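
A rough way to quantify "too many Interpreter frames" is to measure what share of samples in a collapsed-stacks file contains an "Interpreter" frame. This sketch assumes the folded format where each line is a semicolon-separated stack followed by a sample count; the function name is hypothetical.

```shell
# Hypothetical helper: percentage of samples whose stack mentions "Interpreter".
interpreter_share() {
  awk '
    {
      n = $NF                    # last field is the sample count
      total += n
      if (index($0, "Interpreter")) hit += n
    }
    END {
      if (total > 0) printf "%.1f\n", 100 * hit / total
      else print "0.0"
    }
  ' "$1"
}
```

A share that keeps falling across successive recordings would suggest the JVM is still warming up.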

jrudolph commented 7 years ago

Ah, this is about containers. I also tried to get it working, but only half-heartedly. I would also be interested in getting this to work. Thanks for having the discussion here and the extra links, @alicegoldfuss, @nitsanw, and @brendangregg.

alicegoldfuss commented 7 years ago

My workaround worked!

I'm going to dance to something and then document what I did.

alicegoldfuss commented 7 years ago

Turned it into a blog post. Thanks everyone for your help: http://blog.alicegoldfuss.com/making-flamegraphs-with-containerized-java/

jrudolph commented 7 years ago

Thanks a lot, @alicegoldfuss for documenting your findings!

bobrik commented 6 years ago

I added transparent support for containers in jmaps: https://github.com/brendangregg/FlameGraph/pull/171.

bobrik commented 6 years ago

To add to the "Why?" section of the blog post by @alicegoldfuss, the reason seems to be this:

    // "/tmp" is used as a global well-known location for the files
    // .java_pid<pid>. and .attach_pid<pid>. It is important that this
    // location is the same for all processes, otherwise the tools
    // will not be able to find all Hotspot processes.
    // Any changes to this needs to be synchronized with HotSpot.

goldshtn commented 6 years ago

It’s a little unfortunate that this solution is Docker-specific (relies on docker exec). Perhaps we can make it more general by using nsenter?
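
A minimal sketch of the nsenter idea, assuming that entering the target's mount and PID namespaces is enough for the command being run. The wrapper name and the `NSENTER` override (useful for a dry run) are hypothetical; `--target`, `--mount` and `--pid` are standard nsenter options, and running it for real typically requires root.

```shell
# Hypothetical wrapper: run a command inside the mount and PID namespaces
# of the process with the given host PID, without going through docker exec.
run_in_namespaces() {
  local host_pid="$1"
  shift
  # NSENTER can be set to "echo" to print the arguments instead of running them.
  "${NSENTER:-nsenter}" --target "$host_pid" --mount --pid -- "$@"
}
```

For instance, `run_in_namespaces 161991 ls /tmp` would list the container's own /tmp, which is where the attach files quoted above are expected to live.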

bobrik commented 6 years ago

@goldshtn I replied to your comment in the PR. Let's have PR related comments there.

jrudolph commented 5 years ago

I managed to run the attach script from the host namespace. I haven't properly integrated that, as it needs some hacking of internals from tools.jar because, as you say above, the attach mechanism relies on well-known paths shared between the attaching JVM and the target JVM. Right now it will only work if the target process has PID 1 in the container.

See jvm-profiling-tools/perf-map-agent/compare/jr/attach-to-container-from-host
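
To check whether a target process falls under the PID 1 limitation, the in-container PID can be read from the `NSpid` line of `/proc/<pid>/status`. This is a sketch with a hypothetical function name; the `NSpid` field is only available on Linux 4.1 and later, so it would not exist on the 3.10 kernel mentioned at the top of this thread.

```shell
# Hypothetical helper: print the PID a process has inside its innermost
# PID namespace, given a status file (normally /proc/<host-pid>/status).
container_pid() {
  # NSpid lists the PID in each nested namespace, outermost first, so the
  # last field is the PID as seen from inside the container.
  awk '$1 == "NSpid:" { print $NF }' "$1"
}
```

If `container_pid /proc/161991/status` prints 1, the process is PID 1 in its container.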