kolide / launcher

Osquery launcher, autoupdater, and packager
https://kolide.com/launcher
Other
505 stars 100 forks source link

error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8 #445

Closed cyberonstian closed 1 year ago

cyberonstian commented 5 years ago

We are in the process of deploying Kolide to our customers and we get this error on multiple of the windows clients. When the error is logged launcher stops sending data to the server.

We have tried updating osqueryd (3.3.2) and launcher (9.0.2) to latest versions but the error still persists.

If we delete the osquery.db files and restart launcher it works for a while and then the error reappears and the log traffic stops.

We have done some investigation and nailed it down to this query: SELECT * FROM programs

This works perfectly in Kolide fleet dashboard, but when running it scheduled launcher fails to send the logs due to the error message. We have seen the error message on multiple computers but are still investigation what queries triggers those endpoints.

To nail it down a bit further we found the software in the list that had the characters that triggered the error: SELECT * FROM programs WHERE name = "HxD Hex Editor version 1.7.7.0"

What version of fleet are you using (fleet version --full)?

fleet - version 2.0.2 branch: master revision: 8ca0358bf28173685815b79d8683a4239d629a14 build date: 2019-01-18T00:39:40Z build user: zwass go version: go1.11.3¨

What operating system are you using?

Windows 10 Pro 1893

What did you do?

  1. Install Kolide Fleet agent on the client
  2. Start in debug mode
  3. Wait a couple of minutes, we see that the server is recieving data from the client
  4. The error appears in the logs and no data is recieved on the server.

    What did you expect to see?

    Data sent from launcher.exe to kolide fleet

    What did you see instead?

The two first log entries are the error message, the last one is the software that we think is causing them.

{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","errcode":"","logType":"string","log_count":3,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"0s","ts":"2019-03-12T12:34:15.8004054Z","uuid":"bde64b33-697d-4782-b8a2-866e0a44e71a"}

{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","severity":"info","ts":"2019-03-12T12:34:15.8014026Z"}

{"caller":"publish_results.go:168","err":null,"errcode":"","message":"","method":"PublishResults","reauth":false,"results":"[{\"query_name\":\"kolide_distributed_query_352\",\"status\":0,\"rows\":[{\"identifying_number\":\"\",\"install_date\":\"20180625\",\"install_location\":\"C:\\Program Files (x86)\\HxD\\\",\"install_source\":\"\",\"language\":\"\",\"name\":\"HxD Hex Editor version 1.7.7.0\",\"publisher\":\"Ma�l H�rz\",\"uninstall_string\":\"\\"C:\\Program Files (x86)\\HxD\\unins000.exe\\"\",\"version\":\"1.7.7.0\"}]}]","severity":"debug","took":"16.9527ms","ts":"2019-03-12T22:34:46.991793Z","uuid":"7729e05c-71a1-4f7b-bd37-8fb37e76e6db"}

groob commented 5 years ago

https://github.com/facebook/osquery/issues/5288

polak785 commented 4 years ago

Hello,

This bug still exists and block a normal usage. I'm on Kolide Fleet 2.5.0 on server side and Windows 10 1709 with Kolide Launcher 0.11.9 and osquery 4.2.0 on client side. This a test virtual machine, therefore there are few things installed on it.

I confirm that the bug happen when a query is done on table programs. Then, none query return result, Kolide Launcher can be restarted but is still on a bad state and the quick fix is to delete the local database of Kolide Launcher. But this will block again when the scheduled query is started.

I think the bug is on Kolide Launcher side because when I run osquery with exact same parameters as Kolide Launcher starts it, it works like a charm, osqueryd return results and keep continue to work.

Here is the command I run for osqueryd:

"C:\Program Files\Kolide\Launcher-launcher\bin\osqueryd.exe-updates\1583771796\osqueryd" --tls_hostname=xxx.yyy.zzz:8080 --enroll_secret_path="C:\Program Files\Kolide\Launcher-launcher\conf\secret" --enroll_tls_endpoint=/api/v1/osquery/enroll --config_plugin=tls --config_tls_endpoint=/api/v1/osquery/config --config_refresh=10 --host_identifier=uuid --pidfile="C:\Program Files\Kolide\Launcher-launcher\data\osquery.db\osquery.pid" --database_path="C:\Program Files\Kolide\Launcher-launcher\data\osquery.db\osquery.db" --extensions_socket="C:\Program Files\Kolide\Launcher-launcher\data\osquery.sock" --extensions_timeout=10 -logger_plugin=tls --distributed_plugin=tls --disable_distributed=false --distributed_interval=5 --pack_delimiter=: --host_identifier=uuid --force=true --disable_watchdog --utc --verbose --config_refresh=300 --config_accelerated_refresh=30 --extensions_autoload="C:\Program Files\Kolide\Launcher-launcher\data\osquery.autoload"

Can you fix this bug please? I can do more tests if required.

directionless commented 4 years ago

I think the encoding bug is in osquery, as discussed in https://github.com/osquery/osquery/issues/5288 but I agree that launcher shouldn't crash.

How are you issuing the that query? Is it a scheduled query, or being sent as a distributed query?

polak785 commented 4 years ago

This is a scheduled query with logging as snapshot.

Yes this is very blocking that Launcher crash because the only fix is to delete local database.

directionless commented 4 years ago

I have not yet spun up a test environment for this, but I don't think we see this on the SaaS side of things. I'm wondering if launcher is still up, but it's stuck in a loop trying to send the same log to the server. Do the launcher logs indicate?

polak785 commented 4 years ago

Yes the same log occur every minute:

{"caller":"publish_logs.go:157","err":null,"errcode":"","logType":"status","log_count":9,"message":"","method":"PublishLogs","reauth":false,"severity":"debug","took":"34.045ms","ts":"2020-03-19T09:45:29.3740801Z","uuid":"bf28a7df-0748-4076-8c59-09d300b3b8e0"}
{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","errcode":"","logType":"string","log_count":4530,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"16.3766ms","ts":"2020-03-19T09:45:29.4973711Z","uuid":"7196d32e-f1a0-4d82-8b6f-2be5306376bb"}
{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","severity":"info","ts":"2020-03-19T09:45:29.5006111Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.0947ms","ts":"2020-03-19T09:45:37.0671645Z","uuid":"9c8f5303-2baa-4271-81d9-25321910f906"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"31.2682ms","ts":"2020-03-19T09:45:47.4014405Z","uuid":"d8ffa71b-6486-41a8-8f2c-d87f4013533a"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.0556ms","ts":"2020-03-19T09:45:56.6949539Z","uuid":"299d8b07-3360-47f1-bdd4-b938f0a006d6"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.7393ms","ts":"2020-03-19T09:46:05.0590382Z","uuid":"4ebf9656-9878-4d1f-96e1-285985d5bfba"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"34.1126ms","ts":"2020-03-19T09:46:12.8515392Z","uuid":"77db8c33-5e0c-4890-8d9f-d41895f6e5a1"}
{"caller":"query_target.go:71","msg":"server does not implement GetTargets","severity":"debug","ts":"2020-03-19T09:46:21.3208714Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"31.79ms","ts":"2020-03-19T09:46:21.4265874Z","uuid":"2fdb6afb-eb1b-499a-b586-f1e938e43d28"}
{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","errcode":"","logType":"string","log_count":4530,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"10.1363ms","ts":"2020-03-19T09:46:24.2113109Z","uuid":"40f6d97a-1b81-4074-8343-71a5623b46be"}
{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","severity":"info","ts":"2020-03-19T09:46:24.2203495Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.8889ms","ts":"2020-03-19T09:46:30.8535053Z","uuid":"74066323-9566-4e30-9d56-be3c7d9fc525"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"32.7023ms","ts":"2020-03-19T09:46:41.2258338Z","uuid":"c90a6fbf-ea71-4993-aa9f-6a5f6b006803"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.8766ms","ts":"2020-03-19T09:46:52.6475113Z","uuid":"402dd069-77f5-4034-9a04-98aec1f56c45"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.3889ms","ts":"2020-03-19T09:47:05.2297429Z","uuid":"75efbae4-4699-483e-93a0-f335fee6cb41"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.0314ms","ts":"2020-03-19T09:47:18.5264584Z","uuid":"7e8c8d50-3f12-4023-985c-c830aa435a1b"}
{"caller":"query_target.go:71","msg":"server does not implement GetTargets","severity":"debug","ts":"2020-03-19T09:47:30.0738021Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"46.5929ms","ts":"2020-03-19T09:47:30.5033555Z","uuid":"9633aa80-b003-4651-9478-d5d6014ada96"}
{"caller":"publish_logs.go:157","err":"rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","errcode":"","logType":"string","log_count":4530,"message":"","method":"PublishLogs","reauth":false,"severity":"info","took":"8.4364ms","ts":"2020-03-19T09:47:33.4980049Z","uuid":"d505fbb3-7326-4cd5-8c3b-0e8105407cf5"}
{"caller":"extension.go:494","err":"sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8","severity":"info","ts":"2020-03-19T09:47:33.5020486Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"29.5181ms","ts":"2020-03-19T09:47:41.2621737Z","uuid":"febf6a67-e727-48bd-b5de-6abafe0b03e0"}
{"caller":"scheduler.cpp:96","component":"osquery","level":"stderr","msg":"I0319 10:47:49.286285  7356 scheduler.cpp:96] Executing scheduled query pack/Software/Installed software list - Windows: select name as software_name, version as software_version from programs;","severity":"debug","ts":"2020-03-19T09:47:49.2939863Z"}
{"caller":"process_ops.cpp:99","component":"osquery","level":"stderr","msg":"I0319 10:47:49.298260  7356 process_ops.cpp:99] No account name provided","severity":"info","ts":"2020-03-19T09:47:49.3016493Z"}
{"caller":"logged_in_users.cpp:129","component":"osquery","level":"stderr","msg":"I0319 10:47:49.298260  7356 logged_in_users.cpp:129] Error converting username to SID","severity":"info","ts":"2020-03-19T09:47:49.3025674Z"}
{"caller":"request_queries.go:133","err":null,"method":"RequestQueries","reauth":false,"res":"{\"queries\":{}}","severity":"debug","took":"30.894ms","ts":"2020-03-19T09:47:50.9465303Z","uuid":"aaf33ff4-6b2d-42fc-bda8-449650f93766"}
directionless commented 4 years ago

I would say that indicates it's not a launcher bug. Launcher is sending a log, and fleet is rejecting it. Launcher is then resending it. The reason you need to drop the database, is because launcher will always try to resend stored logs.

I am not immediately sure whether I think launcher should drop the log when the server rejects it. It's the sort of thing that would cause logs to be dropped, and potentially break any kind of monitoring.

My gut sense is that fleet should accept these and either fix them, and record the data, or discard them. @zwass what do you think here?

polak785 commented 4 years ago

As reference, the original issue has been created on fleet : https://github.com/kolide/fleet/issues/2014

But as with osquery this works fine, I think a fix should be done on Kolide side to accept the log and not discard it.

zwass commented 4 years ago

This does not look like an error on the Fleet side. Notice the error message:

rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8

There wouldn't be any marshalling on the Fleet side, only unmarshalling.

directionless commented 4 years ago

rpc error: code = Internal desc = grpc: error while marshaling: proto: field "kolide.agent.LogCollection.Log.Data" contains invalid UTF-8

Hrm. I'll take a closer look. @zwass What do you think launcher should do? We talked about it briefly in https://github.com/kolide/launcher/pull/481

zwass commented 4 years ago

Either escape it in some way, or drop the invalid characters I suppose. Probably best to do this only after receiving the error so we don't add an extra encoding/scanning step for every single log.

wkleinhenz commented 4 years ago

I seem to be experience this error based on logs im seeing from the service on windows host, has a workaround or solution been determined yet, im running the most recent version of launcher (0.11.9) and fleet version 2.6.0

polak785 commented 4 years ago

For my concern, the bug has been fixed with latest osquery release (4.3.0).

The related PR is https://github.com/osquery/osquery/pull/6190

All encoding issues should be fixed when the PR https://github.com/osquery/osquery/pull/6338 will be released

B3DTech commented 3 years ago

Is there any progress on this? 4.5.1 and all my event logs across all my Windows servers are seeing thousands of:

caller=log.go:124 ts=2021-01-19T16:50:26.127444Z caller=level.go:63 level=info caller=extension.go:494 err="sending string logs: writing logs: transport error sending logs: rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8"

caller=log.go:124 ts=2021-01-19T16:50:26.1264442Z caller=level.go:63 level=info caller=publish_logs.go:179 method=PublishLogs uuid=6e2ccedd-1f13-4ea9-9bcc-37d9611e3eff logType=string log_count=1103 message= errcode= reauth=false err="rpc error: code = Internal desc = grpc: error while marshaling: proto: field \"kolide.agent.LogCollection.Log.Data\" contains invalid UTF-8" took=14.0018ms

zwass commented 3 years ago

@B3DTech FWIW we (github.com/fleetdm) are working on an osquery autoupdater that takes a different approach by letting osquery do the heavy lifting. This (I believe) will help resolve issues like this one with the grpc transport in Launcher.

directionless commented 3 years ago

@B3DTech The underlying issue here, is that osquery sometimes produces data that is not utf-8 compatible. This is generally believed to be a bug. What do you think launcher should do when that happens?

B3DTech commented 3 years ago

My problem is that it isn't "sometimes". It's my event logs on every machine are getting flooded with 2-4 of these message every minute. I'm not sure what should happen because I don't know why it's happening or what it's choking on. Osquery shouldn't be failing to produce UTF-8 messages.

And additionally, my 4.5.1 clients don't seem to be sending event logs, but my 4.2 clients are.

directionless commented 3 years ago

Regardless of whether osquery "should" produce non-utf8, it does. So what do you think launcher should do? Attempt to repair the data? Drop it on the floor? Encode it?

directionless commented 3 years ago

I've updated #481. It should now attempt to repair the data, then redact the non-utf8 characters. I can't easily test it though

B3DTech commented 3 years ago

It was determined that there is mis-formatted osquery results in the osquery store from the previous version of osquery, and Launcher is still trying to send that. Removing the C:\Program Files\Kolide\Launcher-so-launcher\data\ directory and restarting the service fixed the issue - no more UTF8 events.

directionless commented 1 year ago

I think this is as resolved as it's getting