jvm-profiling-tools / honest-profiler

A sampling JVM profiler without the safepoint sample bias
https://github.com/RichardWarburton/honest-profiler/wiki
MIT License
1.25k stars 146 forks source link

Store additional information in agent's log #168

Closed ikavalio closed 7 years ago

ikavalio commented 7 years ago

Hi,

This PR extends functionality of agent's log writer to store time stamp (up to nanoseconds where supported) of each trace and threads' metadata within log file. Unfortunately the change is not backward compatible, so that it wouldn't be possible to read old files in a new version of honest-profiler (each TRACE_START entry contains 2 additional longs). Currently thread meta consists only of thread's name, which is sufficient to annotate threads in Tree View of FX application:

screenshot_25

If meta information can't be obtained for given thread, tid will be replaced with 'unknown id' and name with 'Unknown' (thread for which (int64_t)env_id is used instead of real tid) in FX app.

ConsoleLogDumpApplication was also fixed to include thread's metadata and time stamps to the output.

nitsanw commented 7 years ago

This is a great improvement to HP, thanks! It needs testing, but the code LGTM on a skim review.

RichardWarburton commented 7 years ago

I think the async-safety question is my only blocker to merging tbh.

ikavalio commented 7 years ago

@RichardWarburton, @nitsanw

Thanks a lot for reviewing my code! I like the idea with TRACE_WITH_TIME suggested by Nitsan, so I should probably add this to current PR.

Regarding clock_get_time... What I've found so far (obviously there are no statements in available mach docs mentioning that function is reentrant or async-safe):

1) clock_get_time is used within nanosleep function:

if (remaining_time != NULL) {
    /* once we add requested_time, this will be the completion time */
        kret = clock_get_time(clock_port, &completion);
        if (kret != KERN_SUCCESS) {
...

2) nanosleep is not guaranteed to be async-safe, but sleep is. However sleep looks like a wrapper for nanosleep:

if (_nanosleep(&time_to_sleep, &time_remaining) != -1)
        return (0);

Do not pay much attention to underscores, believe me that they are pointing to correct functions. I'm still not 100% confident that function is async-safe but it looks like it is.

However I don't have any clues if host_get_clock_service and mach_port_deallocate can be called in async context. I'll try to experiment with moving them out of the function and calling only once at startup/shutdown.

ikavalio commented 7 years ago

I've refactored clock usage for OS X and added test to actually check that it works in simplest cases (not async). Only mach's clock_get_time will be called in async context, so we just have to make sure it's indeed async safe. Special entry was added to save traces with timestamps to preserve compatibility with older files.

I've also added travis ci config to test native agent on os x (Apple LLVM version 7.0.0 (clang-700.0.72)) and linux (clang 3.5, gcc 4.8), so it might help to detect harmful changes in future and can be easily extended to run mvn test.

I also believe that I've accidentally fixed #146.

RichardWarburton commented 7 years ago

Hi @ikavalio , I've merged this PR, but I had to do it manually since there was another PR I merged just now. Things seem to build ok on Linux after the merge. Can you retest and confirm that they work ok on Mac OS, since I don't have access to a Mac OS machine.

ikavalio commented 7 years ago

Hi @RichardWarburton, thanks! It works for my El Capitan with homebrew.