google / grr

GRR Rapid Response: remote live forensics for incident response
https://grr-doc.readthedocs.io/
Apache License 2.0

Clients running 3102 hang post server upgrade to 3.2.0.1 #551

Closed: soundwave01 closed this issue 6 years ago

soundwave01 commented 6 years ago

Hi,

I'm currently testing an upgrade of our GRR setup (9.5k Windows endpoints) to 3.2.0.1, but I have hit a bit of a problem.

In my test environment, I upgraded the server and left the clients running 3.1.0.2. At some point (i.e. not directly after the upgrade, but normally within an hour) the clients hang.

  1. There are no errors on the server
  2. Running the debug client with --verbose and --debug gives no errors. The output just stops.
  3. I am able to recreate the issue (rebuilding my test environment), but it's not clear exactly what triggers it.
  4. While running the debug version, any input into the shell brings the client back to life, almost as if it's waiting on user input. Of course, when it's running as a service, I can't do that.

Any suggestions on how to work out what is causing the hang would be greatly appreciated! Sorry it's not a lot to go on.

While I plan to upgrade the clients to 3.2.0.1, it will take a while, so I really need the new server version to work with the old clients. Otherwise most of the setup would be down for several weeks while we roll out the new agent.
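One generic way to see where a Python process is stuck, offered here as a hypothetical sketch rather than anything GRR provides, is the standard-library `faulthandler` module: if a watchdog timer is not re-armed in time, it dumps the stack of every thread. `run_with_watchdog`, `example_poll` and the timeout values are illustrative assumptions; in a real client the watchdog would be re-armed once per pass of the poll loop. `faulthandler` is in the standard library from Python 3.3, so a Python 2 based client would need the separate `faulthandler` backport from PyPI.

```python
import faulthandler
import sys
import time


def example_poll():
    """Hypothetical stand-in for one pass of the client's poll loop."""
    time.sleep(0.01)


def run_with_watchdog(step, iterations, timeout_s, out=sys.stderr):
    """Call `step()` repeatedly, dumping every thread's stack to `out`
    if any single call runs longer than `timeout_s` seconds.

    `out` must be a real file with a file descriptor: faulthandler writes
    to the descriptor directly, bypassing Python-level buffering.
    """
    try:
        for _ in range(iterations):
            # (Re-)arm the watchdog; calling it again replaces the previous
            # timer and restarts the countdown.
            faulthandler.dump_traceback_later(timeout_s, repeat=False, file=out)
            step()
    finally:
        # Disarm so no dump fires once the loop has finished.
        faulthandler.cancel_dump_traceback_later()
```

The dump is produced by a separate watchdog thread that does not need the interpreter to make progress, so it still fires when the main thread is blocked, for example on a console read. That is what makes it useful for a hang that leaves no log output.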

grrrrrrrrr commented 6 years ago

Hm, interesting problem. Can you show the client output from the point where it hangs? Do the clients hang on all OSs, or is this Windows only?

soundwave01 commented 6 years ago

Output from the client:

INFO:2017-11-27 02:29:13,658 comms:1338] aff4:/C.c2f53d3d06d0045c: Sending 0(634), Received 0 messages in 0.0160000324249 sec. Sleeping for 5.0
INFO:2017-11-27 02:29:18,743 comms:1338] aff4:/C.c2f53d3d06d0045c: Sending 0(634), Received 0 messages in 0.0 sec. Sleeping for 5.0
INFO:2017-11-27 02:29:23,829 comms:1338] aff4:/C.c2f53d3d06d0045c: Sending 0(634), Received 0 messages in 0.0 sec. Sleeping for 5.0
INFO:2017-11-27 03:05:46,148 comms:792] Sending back client statistics to the server.
INFO:2017-11-27 03:18:32,187 comms:1338] aff4:/C.c2f53d3d06d0045c: Sending 2(3883), Received 3 messages in 0.0629999637604 sec. Sleeping for 0.2
INFO:2017-11-27 03:18:32,404 comms:1338] aff4:/C.c2f53d3d06d0045c: Sending 0(634), Received 0 messages in 0.0 sec. Sleeping for 0.23
INFO:2017-11-27 03:18:32,654 comms:1338] aff4:/C.c2f53d3d06d0045c: Sending 0(634), Received 0 messages in 0.0 sec. Sleep

Note the jump in timestamps from 02:29:23 to 03:05:46. 03:05:46 is when I hit Enter on the keyboard and the client came back to life.

I have only tested this on Windows, as we only have GRR deployed on our Windows estate. The agents were running 64-bit Windows 7.

soundwave01 commented 6 years ago

The hang repeated itself a couple of hours later. It looks like it's in the same place:

INFO:2017-11-27 05:13:17,128 comms:1338] aff4:/C.c2f53d3d06d0045c: Sending 0(634), Received 0 messages in 0.0310001373291 sec. Sleeping for 5.0
INFO:2017-11-27 05:13:22,229 comms:1338] aff4:/C.c2f53d3d06d0045c: Sending 0(634), Received 0 messages in 0.0 sec. Sleeping for 5.0
INFO:2017-11-27 05:41:15,515 comms:792] Sending back client statistics to the server.
INFO:2017-11-27 05:41:18,604 comms:1338] aff4:/C.c2f53d3d06d0045c: Sending 2(3931), Received 0 messages in 0.047000169754 sec. Sleeping for 5.0
INFO:2017-11-27 05:41:23,690 comms:1338] aff4:/C.c2f53d3d06d0045c: Sending 0(634), Received 0 messages in 0.0 sec. Sleeping for 5.0
INFO:2017-11-27 05:41:28,775 comms:1338] aff4:/C.c2f53d3d06d0045c: Sending 0(634), Received 0 messages in 0.0 sec. Sleeping for 5.0

Again, note the jump in the log timestamps from 05:13:22 to 05:41:15.
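Both logs show the hang the same way: an unusually long gap between consecutive entries. A small, hypothetical helper (not part of GRR) can find such gaps in a verbose client log automatically. It assumes each entry starts with `LEVEL:YYYY-MM-DD HH:MM:SS,mmm` as in the output above, works whether entries are on separate lines or run together, and uses an assumed 60-second threshold, well above the 5-second sleeps seen here.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp at the start of each client log entry, e.g.
# "INFO:2017-11-27 02:29:13,658 comms:1338] ...". Group 1 is the
# date and time, group 2 the milliseconds.
ENTRY_RE = re.compile(r"[A-Z]+:(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def find_gaps(log_text, max_gap=timedelta(seconds=60)):
    """Return (start, end) pairs of consecutive entries more than max_gap apart."""
    stamps = [
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") + timedelta(milliseconds=int(ms))
        for ts, ms in ENTRY_RE.findall(log_text)
    ]
    # Compare each entry with the one that follows it.
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap]
```

Run over the entries of the second log above, it reports a single gap, from 05:13:22.229 to 05:41:15.515.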

grrrrrrrrr commented 6 years ago

Hm, I have absolutely no idea what could be going on here. I also can't really reproduce this; the clients are just too old. The only thing I can think of would be a passphrase-encrypted key. There is some logic in the client to avoid asking for a passphrase, but maybe that goes wrong. Do you use any special key setup?
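The passphrase hypothesis would be consistent with point 4 of the original report: a process blocked reading standard input resumes as soon as a line is entered in the console. As a hedged illustration of the kind of guard described here, not GRR's actual code (`get_key_passphrase` is a hypothetical name), a passphrase callback can decline to prompt unless standard input is an interactive terminal:

```python
import getpass
import sys


def get_key_passphrase(stream=None):
    """Hypothetical passphrase callback for loading a private key.

    Returns None ("do not prompt") whenever `stream` is not an interactive
    terminal, so a client running as a service never blocks waiting for
    console input.
    """
    stream = sys.stdin if stream is None else stream
    # A process without a console (e.g. under pythonw) may have no stdin at all.
    if stream is None or not stream.isatty():
        return None
    return getpass.getpass("Private key passphrase: ")
```

The design choice is to fail loudly: if the key really is encrypted, the caller gets no passphrase and can log an error, rather than the service blocking on input that will never arrive.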

Regardless, the way forward for you would be to upgrade the clients and make this problem disappear.

soundwave01 commented 6 years ago

Thank you for looking. I'll push ahead with the upgrade and accept a small amount of disruption.