google / grr

GRR Rapid Response: remote live forensics for incident response
https://grr-doc.readthedocs.io/
Apache License 2.0

Problem making memory capture work #484

Closed: vegardvaage closed this issue 7 years ago

vegardvaage commented 7 years ago

Hi! I've probably missed something important, but I'm trying to figure out why I can't make memory capture work. I've set up a brand new Ubuntu 16.04 install of GRR using the Ubuntu install script, and deployed a client to the same machine. Other flows work well, but whenever I try capturing memory I get the following error in the web console:

Traceback (most recent call last):
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr/lib/flow_runner.py", line 613, in RunStateMethod
    responses=responses)
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr/lib/flow.py", line 353, in Decorated
    res = f(*args[:f.func_code.co_argcount])
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr/lib/flows/general/memory.py", line 93, in CheckAnalyzeClientMemory
    raise flow.FlowError("Unable to image memory: %s." % responses.status)
FlowError: Unable to image memory: message GrrStatus {
  child_session_id : SessionID: aff4:/C.064b13b39819533e/flows/F:E720B765/F:7CABCA88
  cpu_time_used : message CpuSeconds {
    system_cpu_time : 0.0599999986589
    user_cpu_time : 1.16999995708
  }
  error_message : u'Traceback (most recent call last):\n File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr/lib/flow_runner.py", line 613, in RunStateMethod\n responses=responses)\n File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr/lib/flow.py", line 353, in Decorated\n res = f(*args[:f.func_code.co_argcount])\n File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr/lib/flows/general/memory.py", line 270, in End\n raise flow.FlowError("Error running plugins: %s" % all_errors)\nFlowError: Error running plugins: Client killed during transaction\n'
  network_bytes_sent : 522
  status : GENERIC_ERROR
}.

In the client log I see the following:

INFO:2017-03-21 09:15:44,185 comms:1338] aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00596904754639 sec. Sleeping for 0.611804572508
INFO:2017-03-21 09:15:44,803 comms:1338] aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00410008430481 sec. Sleeping for 0.703575258384
DEBUG:2017-03-21 09:15:45,223 components:104] Will import grr_rekall
DEBUG:2017-03-21 09:15:45,223 components:104] Will import grr_rekall
INFO:2017-03-21 09:15:45,298 comms:792] Sending back client statistics to the server.
INFO:2017-03-21 09:15:45,559 init:81] RDFLib Version: 4.2.1
INFO:2017-03-21 09:15:45,909 init:42] Webconsole disabled: cannot import name webconsole_plugin
DEBUG:2017-03-21 09:15:46,460 components:104] Will import memory
DEBUG:2017-03-21 09:15:46,462 components:104] Will import rekall_types
DEBUG:2017-03-21 09:15:46,463 components:104] Will import rekall_pb2
INFO:2017-03-21 09:15:46,463 components:153] Component grr-rekall already present.
INFO:2017-03-21 09:15:46,472 comms:1338] aff4:/C.064b13b39819533e: Sending 1(1307), Received 0 messages in 0.0097451210022 sec. Sleeping for 0.809111547142
INFO:2017-03-21 09:15:47,289 comms:1338] aff4:/C.064b13b39819533e: Sending 2(923), Received 0 messages in 0.00530099868774 sec. Sleeping for 0.2
INFO:2017-03-21 09:15:47,498 comms:1338] aff4:/C.064b13b39819533e: Sending 0(634), Received 1 messages in 0.00722193717957 sec. Sleeping for 0.2
2017-03-21 09:15:47,569:DEBUG:rekall.1:Logging level set to 30
2017-03-21 09:15:47,729:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00910902023315 sec. Sleeping for 0.23
2017-03-21 09:15:47,968:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00577807426453 sec. Sleeping for 0.2645
2017-03-21 09:15:48,250:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.014014005661 sec. Sleeping for 0.304175
2017-03-21 09:15:48,563:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.0056939125061 sec. Sleeping for 0.34980125
2017-03-21 09:15:48,929:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.0126969814301 sec. Sleeping for 0.4022714375
2017-03-21 09:15:49,341:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00633001327515 sec. Sleeping for 0.462612153125
2017-03-21 09:15:49,813:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00566101074219 sec. Sleeping for 0.532003976094
2017-03-21 09:15:50,353:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00579500198364 sec. Sleeping for 0.611804572508
2017-03-21 09:15:50,975:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00584197044373 sec. Sleeping for 0.703575258384
2017-03-21 09:15:51,689:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00590705871582 sec. Sleeping for 0.809111547142
2017-03-21 09:15:52,515:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.0127019882202 sec. Sleeping for 0.930478279213
2017-03-21 09:15:53,455:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00575184822083 sec. Sleeping for 1.07005002109
2017-03-21 09:15:54,535:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00595211982727 sec. Sleeping for 1.23055752426
2017-03-21 09:15:55,778:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00799608230591 sec. Sleeping for 1.4151411529
2017-03-21 09:15:57,204:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00609111785889 sec. Sleeping for 1.62741232583
2017-03-21 09:15:58,841:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.0057110786438 sec. Sleeping for 1.87152417471
2017-03-21 09:16:00,724:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00586485862732 sec. Sleeping for 2.15225280091
2017-03-21 09:16:02,888:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00631785392761 sec. Sleeping for 2.47509072105
2017-03-21 09:16:05,375:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00571703910828 sec. Sleeping for 2.84635432921
2017-03-21 09:16:08,234:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00590705871582 sec. Sleeping for 3.27330747859
2017-03-21 09:16:11,521:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00627708435059 sec. Sleeping for 3.76430360038
2017-03-21 09:16:15,298:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00614500045776 sec. Sleeping for 4.32894914043
2017-03-21 09:16:19,643:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00789904594421 sec. Sleeping for 4.9782915115
2017-03-21 09:16:24,636:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.0063910484314 sec. Sleeping for 5.72503523822
2017-03-21 09:16:30,376:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00594997406006 sec. Sleeping for 6.58379052396
2017-03-21 09:16:36,973:INFO:root:aff4:/C.064b13b39819533e: Sending 0(634), Received 0 messages in 0.00326490402222 sec. Sleeping for 7.57135910255
2017-03-21 09:16:46,485:INFO:root:Sending back client statistics to the server.
2017-03-21 09:16:47,534:ERROR:root:Suicide by nanny thread.

When running grrd manually, I've sometimes seen actual memory capture progress logged as well, but the process is always killed by the nanny thread.

My understanding is that memory drivers are transferred to the client automatically; is that where I'm getting it wrong?

My thanks for any help!

vegardvaage commented 7 years ago

Interestingly, the AnalyzeClientMemory flow works when running, for instance, pslist.

grrrrrrrrr commented 7 years ago

Hm, so what's happening here is that GRR is "inactive" for too long and gets killed by the watchdog before it can finish the memory collection. Acquiring memory might take a long time (how much memory does that machine have?), but we should not count the time spent there as being unresponsive.

It could be that the Rekall plugin is not reporting progress properly, but it could also be an actual issue where Rekall gets stuck somewhere.
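For context, the heartbeat pattern in question looks roughly like the sketch below. This is a generic Python illustration, not GRR's actual client-action API; the Watchdog class and its method names are made up for the example.

import threading
import time


class Watchdog(object):
    """Toy stand-in for the nanny: the job counts as dead if no heartbeat arrives in time."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout
        self._last_beat = time.time()
        self._lock = threading.Lock()

    def heartbeat(self):
        # Called by the worker to signal "still alive, just busy".
        with self._lock:
            self._last_beat = time.time()

    def is_alive(self):
        with self._lock:
            return (time.time() - self._last_beat) < self.timeout


def acquire_memory(watchdog, total_chunks=50):
    """Simulates a long-running acquisition that reports progress as it goes."""
    for _ in range(total_chunks):
        time.sleep(0.1)        # pretend to read one chunk of memory
        watchdog.heartbeat()   # without this call the watchdog would eventually fire
    return total_chunks


if __name__ == "__main__":
    wd = Watchdog(timeout=2.0)
    acquire_memory(wd)
    print("Acquisition done, watchdog still satisfied: %s" % wd.is_alive())

If the inner acquisition loop never calls back like this, the watchdog sees long silent stretches, which matches the "Suicide by nanny thread" line in the client log above.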

Can you maybe run Rekall directly on that machine? All we do is run the aff4acquire plugin, so you could just test that manually.

vegardvaage commented 7 years ago

I installed a separate Rekall instance in a virtualenv and ran aff4acquire from there, and there is some weirdness:

[1] Live(/proc/kcore) 08:37:11>
[1] Live(/proc/kcore) 08:37:40> plugins.aff4acquire(destination="/tmp/test/foo.aff4")
Will use compression: https://github.com/google/snappy
Will load physical address space from live plugin.
Imaging Physical Memory:: Merging Address Ranges 0x1000 \
  Reading 8169MiB / 8319MiB  9.79719664611e-05 MiB/s

It slows down a lot at the end of the dump (I haven't seen it finish yet). This is on an Amazon AWS EC2 m4.large instance; could this be AWS-related?

grrrrrrrrr commented 7 years ago

Hm, this seems like a Rekall issue; it could totally be related to running on AWS. @scudette might know more about running Rekall in this environment.

scudette commented 7 years ago

There are known issues with memory acquisition on AWS. See this for a thorough analysis:

https://lists.sans.org/mailman/private/dfir/2016-August/037716.html


vegardvaage commented 7 years ago

@scudette, thanks. This may be a better fit for a Rekall issue, but I see that LiME has had similar problems and apparently found a way to fix them; see https://github.com/504ensicsLabs/LiME/issues/16. I managed to get a successful memory capture using the most recent version of the LiME kernel module. Maybe there's a way to adapt or learn from the LiME fix?
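For anyone who wants to reproduce that LiME-based capture outside of GRR, here is a rough Python wrapper around the usual insmod invocation. It is only a sketch: the module and output paths are assumptions, it must run as root, and it assumes a lime.ko already built for the running kernel (see the LiME repository for build steps).

import os
import platform
import subprocess

# Assumed locations; adjust for your LiME build and desired output path.
LIME_KO = "/opt/lime/lime-%s.ko" % platform.release()
DUMP_PATH = "/tmp/memory.lime"


def capture():
    if not os.path.exists(LIME_KO):
        raise RuntimeError("No LiME module built for kernel %s" % platform.release())
    # LiME writes the dump while the module loads, so insmod only returns
    # once the capture has finished.
    subprocess.check_call(
        ["insmod", LIME_KO, "path=%s" % DUMP_PATH, "format=lime"])
    try:
        print("Dump written to %s (%d bytes)" % (DUMP_PATH, os.path.getsize(DUMP_PATH)))
    finally:
        # Unload the module so another capture can be taken later.
        subprocess.check_call(["rmmod", "lime"])


if __name__ == "__main__":
    capture()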

grrrrrrrrr commented 7 years ago

This is also a pure Rekall issue, so I'll close it on the GRR side. @vegardvaage, if this is still not working for you, please file an issue with Rekall instead. Thanks!