namhyung / uftrace

Function graph tracer for C/C++/Rust/Python
https://uftrace.github.io/slide/
GNU General Public License v2.0
3.04k stars 473 forks source link

uftrace for chrome #642

Open honggyukim opened 5 years ago

honggyukim commented 5 years ago
$ cat .gclient
solutions = [
  {
    "url": "https://chromium.googlesource.com/chromium/src.git",
    "managed": False,
    "name": "src",
    "custom_deps": {},
  },
]

$ git clone https://chromium.googlesource.com/chromium/src.git
$ cd src

The applies the below change to enable -pg -g build.

diff --git a/build/config/compiler/BUILD.gn b/build/config/compiler/BUILD.gn
index 6b4eb1f7a08b..02ef10cc1b88 100644
--- a/build/config/compiler/BUILD.gn
+++ b/build/config/compiler/BUILD.gn
@@ -202,9 +202,9 @@ config("default_include_dirs") {

 config("compiler") {
   asmflags = []
-  cflags = []
-  cflags_c = []
-  cflags_cc = []
+  cflags = [ "-g", "-pg", "-fno-omit-frame-pointer" ]
+  cflags_c = [ "-g", "-pg", "-fno-omit-frame-pointer" ]
+  cflags_cc = [ "-g", "-pg", "-fno-omit-frame-pointer" ]
   cflags_objc = []
   cflags_objcc = []
   ldflags = []
$ gn gen out/Release.pg.g "--args=enable_nacl=false is_component_build=false use_lld=false is_debug=false use_jumbo_build=true"
$ ninja -C out/Release.pg.g chrome

I fixed some compilation errors because of some build scripts that have fomit-frame-pointer.

$ ninja -C out/Release.pg.g chrome
        ...
clang++: error: invalid argument '-fomit-frame-pointer' not allowed with '-pg'
        ...

After the build is done, I tried to run chrome with uftrace.

$ uftrace record --no-libcall -t 1ms -a ./chrome --no-sandbox

It was successful to open a tab and go to https://www.youtube.com without a problem although it's very slow. But after the testing, the process is crashed when I close the tab to finish. The log is as follows:

$ uftrace record --no-libcall -t 1ms -a ./chrome --no-sandbox
[18364:3158:0114/182020.004406:ERROR:bus.cc(394)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")

(chrome:18364): LIBDBUSMENU-GLIB-WARNING **: Unable to get session bus: Unknown or unsupported transport 'disabled' for address 'disabled:'
[3127:3127:0114/182152.980003:ERROR:gl_surface_glx.cc(429)] GLX 1.3 or later is required.
[3127:3127:0114/182152.980268:ERROR:gl_initializer_x11.cc(147)] GLSurfaceGLX::InitializeOneOff failed.
[3127:3127:0114/182153.005095:ERROR:viz_main_impl.cc(184)] Exiting GPU process due to errors during initialization
[18364:18364:0114/182325.058495:ERROR:pending_bookmark_app_manager.cc(111)] Not implemented reached in virtual void extensions::PendingBookmarkAppManager::UninstallApps(std::vector<GURL>, const web_app::PendingAppManager::UninstallCallback &)
[3223:3223:0114/182357.513507:ERROR:remote_pal_observer.cc(47)] Failed to create dispatcher because MainRenderFrame is NULL
[3326:3326:0114/182415.296398:ERROR:remote_pal_observer.cc(47)] Failed to create dispatcher because MainRenderFrame is NULL
[3353:3353:0114/182606.863770:ERROR:remote_pal_observer.cc(47)] Failed to create dispatcher because MainRenderFrame is NULL
chrome: /mnt/data1/home3/honggyu/work/uftrace/git/uftrace/libmcount/mcount.c:1005: mcount_exit: Assertion `mtdp != NULL' failed.
Received signal 6
Received signal 11 SEGV_MAPERR 000000000001
child terminated by signal: 11: Segmentation fault

The replay output looks good anyway.

honggyukim commented 5 years ago

It looks schedule event still shows regardless of time-filter setting in some cases.

$ uftrace record --no-libcall -t 1ms -a ./chrome --no-sandbox
$ uftrace replay
        ...
            [  3108] | base::_GLOBAL__N_1::ThreadFunc(0x126c3b628270) {
  57.138 us [ 18364] |                   /* linux:sched-in */
            [  3108] |   base::internal::SetCurrentThreadPriorityForPlatform(kLowPriority) {
            [  3108] |     base::FilePath::FilePath(0x7fee79906a10, &base::_GLOBAL__N_1::kCgroupDirectory) {
            [  3108] |       base::internal::CopyToString(0x7fee799069d8, 0x7fee79906a10) {
            [  3108] |         std::__1::basic_string::assign(0x7fee79906a10, "/sys/fs/cgroup", 14) {
            [ 18364] |                   /* linux:sched-out */
            [  3107] |           /* linux:sched-out */
            [  3108] |           /* linux:sched-out */
  36.845 us [ 18364] |                   /* linux:sched-in */
            [ 18364] |                   /* linux:sched-out */
  53.473 us [  3108] |           /* linux:sched-in */
  87.489 us [  3107] |           /* linux:sched-in */
            [  3108] |           /* linux:sched-out */
  88.314 us [ 18364] |                   /* linux:sched-in */
   8.227 us [  3107] |           /* linux:schedule */
 114.259 us [  3108] |           /* linux:sched-in */
            [  3107] |           /* linux:sched-out */
            [ 18364] |                   /* linux:sched-out */
            [  3108] |           /* linux:sched-out */
  14.201 us [ 18364] |                   /* linux:sched-in */
  16.432 us [  3107] |           /* linux:sched-in */
  19.423 us [  3108] |           /* linux:sched-in */
   1.455 ms [ 18364] |                 } /* content::BrowserMainLoop::PostMainMessageLoopStart */
            [  3108] |           /* linux:sched-out */
            [  3107] |           /* linux:sched-out */
  48.920 us [  3108] |           /* linux:sched-in */
            [ 18364] |                 content::BrowserMainLoop::CreateStartupTasks(0x126bf4bad540) {
            [ 18364] |                   /* linux:sched-out */
            [  3108] |           /* linux:sched-out */
 118.196 us [ 18364] |                   /* linux:sched-in */
            [  3109] | base::_GLOBAL__N_1::ThreadFunc(0x126c3b628240) {
 187.132 us [  3107] |           /* linux:sched-in */
            [  3109] |   base::internal::SetCurrentThreadPriorityForPlatform(kLowPriority) {
 130.060 us [  3108] |           /* linux:sched-in */
            [  3109] |     base::FilePath::FilePath(0x7fee790c5a10, &base::_GLOBAL__N_1::kCgroupDirectory) {
            [  3109] |       base::internal::CopyToString(0x7fee790c59d8, 0x7fee790c5a10) {
            [  3109] |         std::__1::basic_string::assign(0x7fee790c5a10, "/sys/fs/cgroup", 14) {
            [ 18364] |                   content::StartupTaskRunner::RunAllTasksNow(0x126c3b5906c0) {
            [ 18364] |                     base::internal::Invoker::Run() {
            [ 18364] |                       content::BrowserMainLoop::PreCreateThreads(0x126bf4bad540) {
            [ 18364] |                         ChromeBrowserMainParts::PreCreateThreads(0x126c1d360e10) {
            [ 18364] |                           ChromeBrowserMainParts::PreCreateThreadsImpl(0x126c1d360e10) {
            [ 18364] |                             BrowserProcessImpl::Init(0x126c3b402400) {
            [ 18364] |                               EnsureExtensionsClientInitialized() {
            [ 18364] |                                 extensions::ExtensionsClient::Set(&EnsureExtensionsClientInitialized::extensions_client) {
   1.842 ms [  3107] |         } = 0x7fee7a147a10; /* std::__1::basic_string::assign */
   1.845 ms [  3107] |       } /* base::internal::CopyToString */
   1.848 ms [  3107] |     } /* base::FilePath::FilePath */
        ...
honggyukim commented 5 years ago

It seems that chrome is being recorded without a problem so far. I don't see segfault even when it's closed. I will make a report if any record is failed.

honggyukim commented 5 years ago

It worked even without --no-libcall.

honggyukim commented 5 years ago

Hmm.. It gets crashed after running for a long time with two tabs - one for youtube media play and the other for cnn site navigation.

$ uftrace record -t 10us --no-event ./chrome --no-sandbox
        ...
[4259:4259:0116/213317.097391:ERROR:remote_pal_observer.cc(47)] Failed to create dispatcher because MainRenderFrame is NULL
[3917:3917:0116/213317.553037:ERROR:render_media_log.cc(30)] MediaEvent: PIPELINE_ERROR DEMUXER_ERROR_NO_SUPPORTED_STREAMS
[3917:3917:0116/213317.554039:ERROR:render_media_log.cc(30)] MediaEvent: PIPELINE_ERROR DEMUXER_ERROR_NO_SUPPORTED_STREAMS
[4278:4278:0116/213320.972657:ERROR:remote_pal_observer.cc(47)] Failed to create dispatcher because MainRenderFrame is NULL
chrome: /home/honggyu/uftrace/libmcount/plthook.c:830: plthook_exit: Assertion `!check_thread_data(mtdp)' failed.
Received signal 6
#0 0x55cd922a2ce6 base::debug::StackTrace::StackTrace()
#1 0x55cd922a2821 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#2 0x7f77c48b9390 <unknown>
#3 0x7f77be985428 gsignal
#4 0x7f77be98702a abort
#5 0x7f77be97dbd7 <unknown>
#6 0x7f77be97dc82 __assert_fail
#7 0x7f77c4cd4341 plthook_exit
#8 0x7f77c4ceb709 plthook_return
#9 0x7f77c4ceb6ee plt_hooker
#10 0x7f775d7f8920 <unknown>
  r8: fefefefefefefeff  r9: 0000000000000001 r10: 0000000000000008 r11: 0000000000000206
 r12: 000000000000033e r13: 00007f77c4cedde8 r14: 0000000000001000 r15: 00007f775d7f8924
  di: 0000000000000b96  si: 00000000000010d9  bp: 00007f77c4cedbef  bx: 00007f77c5121000
  dx: 0000000000000006  ax: 0000000000000000  cx: 00007f77be985428  sp: 00007f775d7f86b8
  ip: 00007f77be985428 efl: 0000000000000206 cgf: 002b000000000033 erf: 0000000000000000
 trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]
Calling _exit(1). Core file will not be generated.
honggyukim commented 5 years ago

Hmm.. It seems that it doesn't follow internally created renderer processes. https://www.chromium.org/developers/design-documents/multi-process-architecture

It only shows browser process and its child threads.

honggyukim commented 5 years ago

It seems that tid 1233 is a renderer process, but doesn't show anything after execvp called.

$ uftrace replay --tid 1233
# DURATION     TID     FUNCTION
            [  1233] | } /* fork */
  10.397 us [  1233] | open64();
            [  1233] | base::CloseSuperfluousFds() {
  15.313 us [  1233] |   open64();
  37.999 us [  1233] |   syscall();
  28.424 us [  1233] |   syscall();
  35.003 us [  1233] |   syscall();
  30.188 us [  1233] |   syscall();
  33.440 us [  1233] |   syscall();
  34.613 us [  1233] |   syscall();
  40.258 us [  1233] |   syscall();
  70.060 us [  1233] |   syscall();
  20.304 us [  1233] |   close();
  27.102 us [  1233] |   syscall();
  30.122 us [  1233] |   syscall();
  25.307 us [  1233] |   syscall();
  29.560 us [  1233] |   syscall();
 722.982 us [  1233] | } /* base::CloseSuperfluousFds */
            [  1233] | execvp() {

And there are many tasks that have empty trace record.

honggyukim commented 5 years ago

Hmm.. tested again, but it seems that renderer processes don't have any record. They don't even have their tid record in task list.

namhyung commented 5 years ago

It's possible to miss some if it removes/changes environment variables like LD_PRELOAD. But in this case it seems due to it closed (pipe) fd used by uftrace.

honggyukim commented 5 years ago

Ah.. chrome uses zygote model so it doesn’t do fork-and-exec. Instead, it may run fork-and-dlopen for required libraries. In this case, it’s really needed to close file desciptors from its parent process.

honggyukim commented 5 years ago

Is there any way to avoid closing fd for uftrace pipe?

honggyukim commented 5 years ago

We may have to hook close() call to check if it’s closing uftrace fd although it will add overhead for every close calls.

honggyukim commented 5 years ago

It works fine tracing chrome binary even with --auto-args.

honggyukim commented 5 years ago

It keeps crashes at the end of program termination process by showing below message.

... uftrace/libmcount/plthook.c:830: plthook_exit: Assertion `!check_thread_data(mtdp)' failed.

Can we avoid this?

honggyukim commented 5 years ago

This can be easily tested with headless chrome as follows:

$ uftrace record -t 1ms ./chrome --no-sandbox --headless
honggyukim commented 5 years ago

Hmm... chrome gets crashed in the following way:

  1. launch it with uftrace -a -t 1ms ./chrome --no-sandbox --sampling-heap-profiler
  2. go to www.cnn.com
  3. press F12 to open devtools panel

It was working fine under the step 2, but gets crashed at step 3. image

$ uftrace -a -t 1ms ./chrome --no-sandbox --sampling-heap-profiler
        ...
[6742:6742:0122/171806.574532:ERROR:CONSOLE(4599)] "Connection is closed, can't dispatch pending Debugger.setBlackboxPatterns", source: chrome-[13/198]
://devtools/bundled/shell.js (4599)
Received signal 11 SEGV_ACCERR 7fb1e6880030
#0 0x7fb1e6899ae7 backtrace
#1 0x5560e0bf1236 base::debug::StackTrace::StackTrace()
#2 0x5560e0bf0d71 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7fb1e647f390 <unknown>
#4 0x5560dff830e5 v8::internal::Sweeper::EnsurePageIsIterable()
#5 0x5560e03201d2 v8::internal::Heap::GcSafeFindCodeForInnerPointer()
#6 0x5560dff22c74 v8::internal::InnerPointerToCodeCache::GetCacheEntry()
#7 0x5560dff21f54 v8::internal::ExitFrame::Iterate()
#8 0x5560dffde2a9 v8::internal::Isolate::Iterate()
#9 0x5560e031b328 v8::internal::Heap::IterateStrongRoots()
#10 0x5560e0315865 v8::internal::Heap::Scavenge()
#11 0x5560e0312398 v8::internal::Heap::PerformGarbageCollection()
#12 0x5560e0310861 v8::internal::Heap::CollectGarbage()
#13 0x5560e031c946 v8::internal::Heap::AllocateRawWithLightRetry()
#14 0x5560e031c9c0 v8::internal::Heap::AllocateRawWithRetryOrFail()
#15 0x5560dff3679f v8::internal::Factory::NewRawOneByteString()
#16 0x5560dff37da7 v8::internal::Factory::NewConsString()
#17 0x5560dfc5e11a v8::internal::AstValueFactory::Internalize()
#18 0x5560dfce8147 v8::internal::(anonymous namespace)::FinalizeTopLevel()
#19 0x5560dfce5a85 v8::internal::(anonymous namespace)::CompileToplevel()
#20 0x5560dfce6923 v8::internal::Compiler::GetSharedFunctionInfoForScript()
#21 0x5560e02a3179 v8::ScriptCompiler::CompileUnboundInternal()
#22 0x5560e02a35ba v8::ScriptCompiler::Compile()
#23 0x5560e49d6e89 blink::(anonymous namespace)::CompileScriptInternal()
#24 0x5560e49bdd30 blink::V8ScriptRunner::CompileScript()
#25 0x5560e49bcbbc blink::ScriptController::ExecuteScriptAndReturnValue()
#26 0x5560e49bf423 blink::ScriptController::EvaluateScriptInMainWorld()
#27 0x5560e49bf65f blink::ScriptController::ExecuteScriptInMainWorld()
#28 0x5560e39e9761 blink::PendingScript::ExecuteScriptBlockInternal()
#29 0x5560e39e9620 blink::PendingScript::ExecuteScriptBlock()
#30 0x5560e39e2ef8 blink::(anonymous namespace)::DoExecuteScript()
#31 0x5560e39e2e6c blink::HTMLParserScriptRunner::ExecutePendingParserBlockingScriptAndDispatchEvent()
#32 0x5560e39e4031 blink::HTMLParserScriptRunner::ExecuteParsingBlockingScripts()
#33 0x5560e3603627 blink::HTMLDocumentParser::CanTakeNextToken()
#34 0x5560e3603086 blink::HTMLDocumentParser::PumpTokenizer()
#35 0x5560e3606c3f blink::HTMLDocumentParser::insert()
#36 0x5560e323e16f blink::Document::write()
#37 0x5560e323e5a1 blink::Document::write()
#38 0x5560e4bd98ad blink::DocumentV8Internal::write1Method()
#39 0x5560e4a8c1f1 blink::V8Document::writeMethodCallback()
#40 0x5560dfcc5865 v8::internal::FunctionCallbackArguments::Call()
#41 0x5560dfc8ad0c v8::internal::(anonymous namespace)::HandleApiCallHelper<>()
#42 0x5560dfc8a3ac v8::internal::Builtin_Impl_HandleApiCall()
#43 0x5560e04cbace <unknown>
  r8: 0000000000000000  r9: 0000000000000001 r10: 00007ffe28962a50 r11: 0032cf8f8b73fc98
 r12: 0000000000000000 r13: 00007ffe289655b8 r14: 00007fb1e6880000 r15: 00005561355e89d0
  di: 00005561355e89d0  si: 00007fb1e6880000  bp: 00007ffe28962b70  bx: 00007fb1e6880000
  dx: 00007ffe289663a8  ax: 00005561355c77a0  cx: 0000000000000000  sp: 00007ffe28962b50
  ip: 00005560dff830e5 efl: 0000000000010206 cgf: 002b000000000033 erf: 0000000000000004
 trp: 000000000000000e msk: 0000000000000000 cr2: 00007fb1e6880030
[end of stack trace]
Calling _exit(1). Core file will not be generated.

It shows SEGV_ACCERR in the log message.

honggyukim commented 5 years ago

The renderer process can be traced after 15e5bef committed. But I had another crash as follows:

$ uftrace record -t 1ms --no-libcall ./chrome --no-sandbox
[29558:27583:0129/194802.425288:ERROR:bus.cc(394)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
        ...
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM default
chrome: /home/honggyu/work/uftrace/libmcount/mcount.c:1257: mcount_exit: Assertion `mtdp != NULL' failed.
Received signal 6
Received signal 11 SEGV_MAPERR 000000000001
...

Just leaving a note for later reference.

honggyukim commented 5 years ago

It seems to be fixed with #754 as follows:

$ uftrace record --no-libcall -t 1ms -a ./chrome --no-sandbox --headless https://www.google.com
[0512/190541.152765:WARNING:spdy_session.cc(3180)] Received HEADERS for invalid stream 1
[0512/190541.190723:WARNING:spdy_session.cc(3180)] Received HEADERS for invalid stream 17
namhyung commented 5 years ago

I think it's only related when the binary is built without -pg or so.