google / grr

GRR Rapid Response: remote live forensics for incident response
https://grr-doc.readthedocs.io/
Apache License 2.0

Interrogate Job Fails #701

Open 4ndygu opened 5 years ago

4ndygu commented 5 years ago

Hi there! Currently, the Interrogate job fails for one of our ubuntu bionic machines with the following bug:

raise rdfvalue.DecodeError("Too many bytes when decoding varint.") DecodeError: Too many bytes when decoding varint.

Has anyone seen this issue before?

4ndygu commented 5 years ago

More:

Traceback (most recent call last):
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr_response_server/flow_runner.py", line 568, in RunStateMethod
    method(responses)
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr_response_server/flows/general/discovery.py", line 486, in ClientConfiguration
    for k, v in iteritems(response):
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr_response_core/lib/rdfvalues/protodict.py", line 244, in Items
    yield x.k.GetValue(), x.v.GetValue()
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 2101, in <lambda>
    property(lambda self: self.Get(field_desc.name), lambda self, x: self.
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 1863, in Get
    wire_format, container=self)
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 889, in ConvertFromWireFormat
    ReadIntoObject(value[2], 0, result)
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 234, in ReadIntoObject
    buff, index=index, length=length):
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 185, in SplitBuffer
    _, new_index = VarintReader(buff, data_index)
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 143, in VarintReader
    raise rdfvalue.DecodeError("Too many bytes when decoding varint.")
DecodeError: Too many bytes when decoding varint.
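For context: the check that fires at the bottom of this trace is the standard protobuf varint limit. A minimal sketch of that rule, written from scratch for illustration (not copied from structs.py):

# Illustration only, not the GRR structs.py code. A varint carries 7 payload bits
# per byte, so an unsigned 64-bit value takes at most 10 bytes on the wire; a run
# that needs more than that is rejected as corrupt.

class DecodeError(Exception):
    pass

def read_varint(buff, pos=0):
    """Return (value, new_pos) for one varint starting at buff[pos]."""
    result = 0
    shift = 0
    while True:
        b = buff[pos]
        result |= (b & 0x7F) << shift
        pos += 1
        if not b & 0x80:   # continuation bit clear: last byte of this varint
            return result & (2**64 - 1), pos
        shift += 7
        if shift >= 64:    # would need an 11th byte: not a valid 64-bit varint
            raise DecodeError("Too many bytes when decoding varint.")

print(read_varint(bytes([0x96, 0x01])))  # (150, 2)
read_varint(bytes([0xFF] * 16))          # raises DecodeError, like the trace above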

mbushkov commented 5 years ago

Hi Andy! Thanks for reporting this. Can you please share the exact system version (ideally "uname -a" output), and also the GRR client version, the GRR server version, and how the GRR server was installed (from DEB, from PIP, etc.)? Thanks!

4ndygu commented 5 years ago

yep!

output: Linux hostname 4.15.0-1030-gcp #32-Ubuntu SMP Wed Apr 10 10:27:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

GRR client version: 3.2.4.7
GRR server version: 3.2.4.7
Installed from: the grrdocker/grr image

Side note: running /usr/sbin/grrd --version seems to break if you don't specify a config. It might be useful to just dump the version and quit before doing those checks.
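Something along these lines, say (purely illustrative, not the actual grrd entry point):

# Hypothetical sketch: handle --version before any config loading or validation,
# so the version can be printed even when no config file is supplied.
import argparse
import sys

CLIENT_VERSION = "3.2.4.7"  # placeholder value for the sketch

def main(argv=None):
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--version", action="store_true")
    args, _ = parser.parse_known_args(argv)
    if args.version:
        print(CLIENT_VERSION)
        sys.exit(0)
    # ... only now load and validate the client config ...

if __name__ == "__main__":
    main()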

mbushkov commented 5 years ago

You installed from "grrdocker/grr:latest", right?

Do you think you can add a logging statement to the server on this line (line 485, right before the loop)? https://github.com/google/grr/blob/78dce1d2738f516be520281184b100c7661a8643/grr/server/grr_response_server/flows/general/discovery.py#L485

Something like:

import logging
logging.info("[PROTO] " + repr(response.SerializeToString()))

And then see what pops up in the GRR worker logs when you run Interrogate on the failing client, and paste it here?
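Once that line shows up, a throwaway helper like this (mine, just for illustration, not part of GRR) can point at where an oversized varint sits in the dumped bytes:

# Scan a dumped proto for suspiciously long varints, i.e. runs of 10 or more
# consecutive bytes that all have the continuation bit (0x80) set.
def find_long_varints(raw, limit=10):
    hits = []
    run_start, run_len = None, 0
    for i, b in enumerate(raw):
        if b & 0x80:
            if run_len == 0:
                run_start = i
            run_len += 1
        else:
            if run_len >= limit:
                hits.append((run_start, run_len + 1))  # +1 for the terminating byte
            run_len = 0
    return hits

raw = b"\x0a\x10" + b"\xff" * 18 + b"\x01"  # placeholder bytes, just for the demo
print(find_long_varints(raw))               # [(2, 19)]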

clairmont32 commented 5 years ago

Received this for a non-Docker 3.3.0.0 release:

cat /usr/share/grr-server/lib/python2.7/site-packages/grr_response_core/var/log/grr-ui.log | grep Cron

ERROR:2019-05-29 13:57:50,701 7005 MainProcess 139654054278912 Thread-1826 http_api:573] Error while processing /api/cron-jobs/InterrogateClientsCronJob/actions/force-run (POST) with ApiForceRunCronJobHandler: CronJob with id InterrogateClientsCronJob not found.
    aff4_cronjobs.GetCronManager().RequestForcedRun(job_id)
    data_store.REL_DB.UpdateCronJob(job_id, forced_run_requested=True)
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr_response_server/databases/db.py", line 3341, in UpdateCronJob
  File "/usr/share/grr-server/local/lib/python2.7/site-packages/grr_response_server/databases/mysql_cronjobs.py", line 136, in UpdateCronJob
    raise db.UnknownCronJobError("CronJob with id %s not found." % cronjob_id)
UnknownCronJobError: CronJob with id InterrogateClientsCronJob not found.

mbushkov commented 5 years ago

@clairmont32 - this error is something that has to be triggered by a GRR UI user selecting "InterrogateClientsCronJob" and clicking the "Force run" button. I tried doing this on a fresh install and got no errors. Is this reproducible? Do you see the same error reported in the UI?

clairmont32 commented 5 years ago

@mbushkov we did a fresh install of it last Wednesday from the DEB package, and this error was indeed popping up when I force-ran it via the GUI.

ryushi32 commented 4 years ago

@4ndygu Did you ever figure out a solution to this problem?

I'm running into the same issue.

The error happens on the two clients I've tested, on CentOS 7 and macOS 10.15.5:

Traceback (most recent call last):
  File "/usr/share/grr-server/lib/python3.6/site-packages/grr_response_server/flow_base.py", line 672, in RunStateMethod
    self.Error(responses)
  File "/usr/share/grr-server/lib/python3.6/site-packages/grr_response_server/flows/general/discovery.py", line 299, in ClientConfiguration
    for k, v in response.items():
  File "/usr/share/grr-server/lib/python3.6/site-packages/grr_response_core/lib/rdfvalues/protodict.py", line 239, in Items
    yield x.k.GetValue(), x.v.GetValue()
  File "/usr/share/grr-server/lib/python3.6/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 2264, in <lambda>
    property(lambda self: self.Get(field_desc.name),
  File "/usr/share/grr-server/lib/python3.6/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 2015, in Get
    wire_format, container=self)
  File "/usr/share/grr-server/lib/python3.6/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 993, in ConvertFromWireFormat
    ReadIntoObject(value[2], 0, result)
  File "/usr/share/grr-server/lib/python3.6/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 271, in ReadIntoObject
    buff, index=index, length=length):
  File "/usr/share/grr-server/lib/python3.6/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 192, in SplitBuffer
    _, new_index = VarintReader(buff, data_index)
  File "/usr/share/grr-server/lib/python3.6/site-packages/grr_response_core/lib/rdfvalues/structs.py", line 150, in VarintReader
    raise rdfvalue.DecodeError("Too many bytes when decoding varint.")
grr_response_core.lib.rdfvalue.DecodeError: Too many bytes when decoding varint.

mbushkov commented 4 years ago

@ryushi32 - what server and client versions do you have? How was the server installed (DEB package, Docker, etc)?

ryushi32 commented 4 years ago

I'm running the latest docker container grrdocker/grr:v3.4.2.0 with the latest client.

I found the problem. I was using terraform to sign the frontend certificate, so the cert ended up with a serial of dc296b68aae5730c1ec72b47d3a8a9cf.

The client pulls this serial number into its config and sends it back to the server on Interrogate. When the server parses through the dictionary at

  File "/usr/share/grr-server/lib/python3.6/site-packages/grr_response_server/flows/general/discovery.py", line 299, in ClientConfiguration
    for k, v in response.items():

it throws the error when it hits the Client.server_serial_number: dc296b68aae5730c1ec72b47d3a8a9cf entry in the dictionary.

If I change the frontend certificate serial to 2 and reset the client config file, it works after that.

This kind of does seem like a bug, however. GRR should support larger serial numbers.
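To put a number on it (my own quick check, not from the GRR code): that serial is a 128-bit value, well past what a 64-bit protobuf varint can carry, which lines up with the decode error above.

# Quick check, illustration only.
serial = int("dc296b68aae5730c1ec72b47d3a8a9cf", 16)
print(serial.bit_length())   # 128
print(serial > 2**64 - 1)    # True: needs more than 10 varint bytes to encode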

mbushkov commented 4 years ago

@ryushi32 - thanks for the detailed analysis!

The situation with serial numbers is interesting. As a matter of fact, the current GRR communication protocol implementation expects cert serial numbers to be monotonically increasing: https://github.com/google/grr/blob/e83001cfe56896854074b419427d3de9a78feec9/grr/client/grr_response_client/comms.py#L1377

Every client effectively checks, when talking to the server, that the server cert's serial number is always greater than the last remembered one. This allows rolling out new certificates and revoking old ones easily.

This means that even if we fix the issue and support larger certificate serial numbers, generating certs with terraform will likely still be an issue, since the serial numbers will be random instead of monotonically increasing.
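Schematically, the client-side rule is something like this (names and structure are illustrative, not the actual comms.py code):

# Illustrative sketch of the monotonic-serial expectation described above,
# not the actual comms.py implementation.
class ServerCertTracker:
    def __init__(self, last_seen_serial=0):
        self.last_seen_serial = last_seen_serial

    def check(self, server_serial_number):
        if server_serial_number < self.last_seen_serial:
            # A serial lower than one already seen is treated as an old/revoked cert.
            raise ValueError("Server certificate serial went backwards; rejecting.")
        # Equal or higher serials are accepted and remembered.
        self.last_seen_serial = server_serial_number

With random terraform-issued serials, a newly rolled-out cert can easily land below the serial a client has already stored, and that client would then reject a perfectly valid server.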

Right now we're working on integrating Fleetspeak into GRR - experimental Fleetspeak support was already part of the latest release. Fleetspeak is a next-gen TLS and gRPC-based communication platform for GRR. It won't have the expectation that serial numbers are monotonically increasing and will support arbitrary cert serial numbers.