dissect dead node service core dumps with mdb via a SmartOS VM
SmartOS is a Unix-like operating system whose ancestry represents a strong investment in low-level introspection tools (such as dtrace).
One such tool is mdb, a high quality modular debugger which ships with SmartOS - it can be used to inspect execution context from the kernel to application layers.
Some rather clever people wrote a debugging module called mdb_v8
that allows introspection of node core dumps from a high level (e.g.
inspecting closure scope) to a low level (e.g. memory addresses).
It turns out that mdb can analyse Linux core files as well as SmartOS core files. We just have to provide the core file and the node binary that was running when the core dump was generated.
autopsy installs a SmartOS VM and then acts as a stdio proxy to mdb.
For using mdb see the mdb reference docs
Install autopsy from npm:
npm install -g autopsy
Once finished the following executables will be available
Next, set up the VM
autopsy setup
This will install autopsy on the system, download SmartOS virtual machine assets and set up a SmartOS VM in VirtualBox.
Assets for the VM are ~150 MB and are downloaded from S3.
If setup is interrupted for any reason (including network failure during assets download), simply try again. Partial downloads will be resumed.
Before we can do an autopsy the VM needs to be running.
Simply run
autopsy start
Autopsy takes a snapshot of the initial VM state on first run to optimize subsequent boots, so the first autopsy-start will be the longest.
The VM runs SmartOS completely in RAM (there are no zones). This means VM state is immutable.
The autopsy command takes the following args:
autopsy [node-binary] core-file
On OS X the node binary is required. On Linux, if it is not supplied, the currently installed node binary will be used.
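The argument rules above can be sketched as a small shell function. This is only an illustration of the documented behaviour, not autopsy's actual implementation: the function name `resolve_autopsy_args` is hypothetical, and using `command -v node` to find "the currently installed node binary" is an assumption.

```shell
# Hypothetical helper mirroring the documented argument rules.
# It prints "<node-binary> <core-file>" or fails with a usage message.
resolve_autopsy_args() {
  os=$(uname -s)
  if [ "$#" -eq 2 ]; then
    # Both arguments supplied: works on OS X and Linux alike.
    node_bin=$1
    core_file=$2
  elif [ "$#" -eq 1 ] && [ "$os" = "Linux" ]; then
    # Linux only: fall back to the node binary found on PATH (assumption).
    node_bin=$(command -v node) || {
      echo "no node binary found on PATH" >&2
      return 1
    }
    core_file=$1
  else
    echo "usage: autopsy [node-binary] core-file" >&2
    return 1
  fi
  printf '%s %s\n' "$node_bin" "$core_file"
}

# Example: both arguments given explicitly.
resolve_autopsy_args example/node example/core
```

Passing the binary explicitly is the safer habit on any platform, since the binary must be the one that was running when the core dump was generated.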
When this command is run the following occurs
::load v8
to get the v8-related debugging commands.
For using mdb see the mdb reference docs
When we're done we may wish to free memory by stopping the VM with
autopsy stop
The example folder has a core and node file that were generated by the die.js file.
You can try out autopsy with these two files (on OS X and Linux). From the same folder as this readme, run
autopsy example/node example/core
Once the mdb console appears you can try
> ::jsstack
For starters, and then if you want to get fancy
> ::findjsobjects -p myproperty
137289672551
> 137289672551::jsprint
EC2 (and other VPS-type solutions) run "machines" in virtualized containers. It's very tricky to make a virtual machine run on a virtual machine, and even where it is possible there is either an insufferable performance cost or certain low-level features must be enabled, which risks introducing security issues. That aside, copying node and a core file and then using mdb, all in a RAM-only VM, is memory intensive - not something we want to do (or maybe even can do) on a production server.
But autopsy provides a way to do seamless postmortems on an EC2 server or any kind of Linux VM - by setting up an SSH tunnel back through the local machine and into the SmartOS VM running locally.
This can be achieved in a few easy steps
Open the SSH connection through autopsy, like so: autopsy ssh -i myKey.pem user@example.com
Simply use whatever ssh flags you normally would, and autopsy will additionally set up the tunnelling (for the curious, we inject the -R flag with the port of the VM mapped through to the same port on the server).
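To make the injected flag concrete, here is a rough sketch of what the equivalent plain ssh invocation might look like. It assumes the VM's SSH port is the 2222 mentioned in the port notes of this readme and that the forward targets `localhost`. The helper name `build_reverse_tunnel_flag` is hypothetical, and the exact flag autopsy builds may differ.

```shell
# Hypothetical helper: builds an ssh remote-forward flag that maps a port on
# the server back to the same port on the local machine.
# ssh syntax: -R [bind_address:]port:host:hostport
build_reverse_tunnel_flag() {
  port=${1:-2222}   # assumed default: the VM's SSH port on the host
  printf '%s\n' "-R ${port}:localhost:${port}"
}

# Print (without running) the roughly equivalent plain ssh command.
echo "ssh $(build_reverse_tunnel_flag 2222) -i myKey.pem user@example.com"
```

With a forward like this in place, a connection to port 2222 on the server is carried back over the SSH session to port 2222 on the local machine, where the VM is listening.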
In production, if we run our node processes with --abort-on-uncaught-exception we will always get a core dump when a process crashes (that is, as long as our Linux environment is set up correctly).
You can also manually generate a core file using process.abort().
Finally, a core file can also be obtained by attaching gdb to a running process and executing generate-core-file.
If you're using an Ubuntu server (and probably Debian and similar) you may have apport installed - this intercepts core files, so we need to get rid of it
sudo apt-get purge apport
Next you need to make sure that Linux is configured to allocate space for the core file, like so
ulimit -c unlimited
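A quick way to check both conditions on a Linux server is sketched below. It is not part of autopsy and the function names are hypothetical. It relies on two standard Linux behaviours: `ulimit -c` reports the core size limit for the current shell, and a `/proc/sys/kernel/core_pattern` value starting with `|` means core dumps are piped to a helper program (which is how apport intercepts them).

```shell
# Succeeds when a core_pattern value pipes core dumps to a helper program.
core_pattern_is_piped() {
  case $1 in
    '|'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical preflight check: reports the core dump settings of this shell.
check_core_dump_setup() {
  limit=$(ulimit -c)
  if [ "$limit" = "0" ]; then
    echo "core size limit is 0: run 'ulimit -c unlimited' first"
  else
    echo "core size limit: $limit"
  fi
  if [ -r /proc/sys/kernel/core_pattern ]; then
    pattern=$(cat /proc/sys/kernel/core_pattern)
    if core_pattern_is_piped "$pattern"; then
      echo "core dumps are piped to: ${pattern#|}"
    else
      echo "core files are written using pattern: $pattern"
    fi
  fi
}

check_core_dump_setup
```

Note that `ulimit -c unlimited` only affects the current shell and the processes started from it, so it needs to be run in the same session (or startup script) that launches the node process.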
The VM currently maps host port 2222 to its port 22 (ssh). At the moment this is non-configurable, so to use autopsy, port 2222 needs to be free on the host system.
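Before starting the VM it can help to confirm that nothing else is listening on port 2222. The sketch below assumes a Linux host with the `ss` tool available; on OS X a tool such as `lsof` would have to be substituted. The function name `port_is_listening` is hypothetical.

```shell
# Reads a socket listing (such as the output of `ss -ltn`) on stdin and
# succeeds when some address in it ends with ":<port>".
port_is_listening() {
  awk -v p="$1" '
    { for (i = 1; i <= NF; i++) if ($i ~ (":" p "$")) found = 1 }
    END { exit !found }
  '
}

# Only run the live check when ss is installed (assumption: Linux host).
if command -v ss >/dev/null 2>&1; then
  if ss -ltn | port_is_listening 2222; then
    echo "port 2222 is already in use"
  else
    echo "port 2222 looks free"
  fi
fi
```

The function takes the listing on stdin rather than calling `ss` itself, so the matching logic can be tried against any saved listing.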
Currently there's no command for removing the VM. Follow these steps, in order:
Remove the assets folder from the autopsy module folder: rm -r $(npm get prefix)/lib/node_modules/autopsy/assets
Remove the VM files from the VirtualBox VMs folder (~/VirtualBox\ VMs)
We recommend installing globally, since there can (currently) only be one SmartOS VM.
If the smartos.iso file or any parent folder is moved or renamed, the VM will fail to start because VirtualBox won't be able to locate the ISO. In this case you would need to manually update VirtualBox with the new paths.
For troubleshooting (or the curious), debugging can be turned on like so
DEBUG=autopsy:* <cmd>
At present the following commands have debug output