tmm1 / stackprof

a sampling call-stack profiler for ruby 2.2+
MIT License
2.08k stars 128 forks source link

Segmentation fault when running Stackprof #81

Open volkanunsal opened 7 years ago

volkanunsal commented 7 years ago

It's my first time using Stackprof. I tried running my Rails app in a Rake task and benchmarking the results and encountered a segmentation error or something. This is what I see in my console.

[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

Crash report is attached.

ruby_2017-07-25-173816_Aeronaut-2.txt

tmm1 commented 7 years ago

Backtrace from crash report:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib              0x00007fff8cb4ff06 __pthread_kill + 10
1   libsystem_pthread.dylib             0x00007fff916c34ec pthread_kill + 90
2   libsystem_c.dylib                   0x00007fff8af966df abort + 129
3   libruby.2.4.1.dylib                 0x0000000104372779 die + 9
4   libruby.2.4.1.dylib                 0x0000000104372768 rb_bug + 600
5   libruby.2.4.1.dylib                 0x00000001044dad32 rb_profile_frame_full_label + 370 (ruby.h:313)
6   stackprof.bundle                    0x00000001085918ea frame_i + 58 (stackprof.c:191)
7   libruby.2.4.1.dylib                 0x00000001044617eb st_general_foreach + 139 (st.c:1465)
8   stackprof.bundle                    0x000000010859147b stackprof_results + 347 (stackprof.c:244)
volkanunsal commented 7 years ago

Sorry I'm a little slow. Is this caused by a crash in the libsystem_kernel.dylib? What causes that? Low memory?

tmm1 commented 7 years ago

No the crash seems to from stackprof.c:191 where it calls rb_profile_frame_full_label. For some reason the ruby VM is crashing when that function is invoked. I'm not sure why.. it could be a bug in stackprof or a bug in ruby.

volkanunsal commented 7 years ago

Oh I see. I tried this with Ruby 2.3.4 also and got the same error.

volkanunsal commented 7 years ago

Here is the rake script I used.

https://gist.github.com/volkanunsal/1247d12d11a324fc487542060431d7cd

tmm1 commented 7 years ago

You should probably put StackProf around Rack::Server.start instead of the entire rake task

volkanunsal commented 7 years ago

You mean something like this?

Thread.new do
  StackProf.run(mode: :wall, out: file) do
    Rack::Server.start(
      app: DERAILED_APP,
      Port: port,
      environment: 'none',
      server: server
    )
  end
end
sleep 1

def call_app(path = get_path)
  cmd = "curl 'http://localhost:#{@port}#{path}' -g -s --fail 2>&1"
  response = `#{cmd}`
  unless $? && $?.success?
    raise "Bad request to #{cmd.inspect} Response:\n#{ response.inspect }"
  end
end

TEST_COUNT.times do
  call_app
end

cmd = "stackprof #{file}"
puts `#{cmd}`

Am I doing this right?

The problem with that is it doesn't output the test results to file.

tmm1 commented 7 years ago

You would need to shut down the rack server at the end of the benchmark

volkanunsal commented 7 years ago

Ohhhhhhh

volkanunsal commented 7 years ago

Sorry to bother your again. I tried your suggestion to shutdown the server and I still can't get the output file to show up. Here is my code:


server = 'webrick'
rack_server = Rack::Server.new(
  app: DERAILED_APP,
  Port: port,
  environment: 'none',
  server: server
)

file = 'tmp/foo.dump'
thread = Thread.new do
  StackProf.run(mode: :wall, out: file) do
    rack_server.start
  end
end
sleep 1

TEST_COUNT.times do
  call_app
end

puts StackProf.running? # true
rack_server.server.shutdown
puts thread.status # sleep
sleep 1
puts StackProf.running? # false
puts thread.status # nil
puts File.exist?(file) # false
tmm1 commented 7 years ago

Hmm, not sure why that didn't work.

You can try instead to wrap StackProf.run around your TEST_COUNT.times loop. stackprof will always profile all threads, so it shouldn't matter where it is run.

volkanunsal commented 7 years ago

I tried that. This time it's giving me this error

rake aborted!
NoMethodError: undefined method `flush' for nil:NilClass
...perf.rake:152:in `run'
volkanunsal commented 7 years ago

I think that was happening because I was using file = Rails.root.join(path) earlier. When I used a string path, it started working... :)

volkanunsal commented 7 years ago

Ah, it crashed again. It runs for only 10 calls or so before exploding.

benbuckman commented 5 years ago

Has anyone been able to debug or fix this? I'm observing the same segfault behavior on Ruby 2.6.3.

benbuckman commented 5 years ago

Anecdotally, it only seems to crash when interval is less than 100 μs.

benbuckman commented 4 years ago

Actually, the behavior I was seeing earlier was probably not a segfault – it was probably the program hanging – described in https://github.com/tmm1/stackprof/issues/123 – but because the server was behind a proxy, the proxy interpreted that as the server dying, which I mis-interpreted as a crash.

FWIW, while we were pondering this segfault issue (with the prior understanding that this was occurring), our hypothesis was that this was caused by signal races, specifically a signal handler called inside another signal handler, if the sampling rate is too high for the CPU to schedule them at the correct intervals, (exacerbated by the CPU being under load). So if anyone is actually seeing segfaults and wants to debug, you can consider that hypothesis to explore.