Open alexandreLamarre opened 1 year ago
Hit something similar again. Gateway logs:
2023-05-18T17:06:16Z INFO util/config.go:31 using config file {"path": "/etc/opni/config.yaml"}
2023-05-18T17:06:16.610Z INFO feature-flags featureflags@v0.0.0-20220803034705-b6242a8d72b2/featurelist.go:140 starting configmap watch collector
I0518 17:06:16.610185 1 shared_informer.go:273] Waiting for caches to sync for feature-flags
I0518 17:06:16.710617 1 shared_informer.go:280] Caches are synced for feature-flags
2023-05-18T17:06:16Z INFO commands/gateway.go:98 loading plugins {"dir": "/var/lib/opni/plugins"}
2023-05-18T17:06:16Z DEBUG keyring machinery/keys.go:53 loaded ephemeral key {"path": "/run/opni/keyring/session-attribute.json", "usage": "auth", "labels": {"opni.io/session-attribute":"local"}}
2023-05-18T17:06:16Z DEBUG gateway plugins/discovery.go:88 plugin ignored due to filter {"plugin": "/var/lib/opni/plugins/plugin_aiops", "filter": 0}
2023-05-18T17:06:16Z DEBUG gateway plugins/discovery.go:88 plugin ignored due to filter {"plugin": "/var/lib/opni/plugins/plugin_alerting", "filter": 0}
2023-05-18T17:06:16Z INFO openid openid/openid.go:174 successfully fetched openid configuration {"issuer": "https://dev-i6kk5uiq3ldbjgtf.us.auth0.com/"}
2023-05-18T17:06:17Z ERROR gateway plugins/discovery.go:72 failed to query plugin modes {"error": "invalid character 'c' looking for beginning of value", "plugin": "/var/lib/opni/plugins/plugin_metrics"}
2023-05-18T17:06:17Z DEBUG gateway plugins/discovery.go:88 plugin ignored due to filter {"plugin": "/var/lib/opni/plugins/plugin_slo", "filter": 0}
2023-05-18T17:06:17Z DEBUG gateway patch/manifest.go:97 found 2 plugins
2023-05-18T17:06:17Z DEBUG gateway patch/manifest.go:153 loaded plugin manifest {"plugins": 2}
2023-05-18T17:06:17Z INFO gateway.cache patch/filesystem.go:57 compressing and archiving plugins...
2023-05-18T17:06:19Z DEBUG gateway.cache patch/filesystem.go:112 added 2 new plugins to cache
2023-05-18T17:06:19Z INFO gateway patch/server.go:125 running plugin cache gc
2023-05-18T17:06:19Z INFO gateway.cache patch/filesystem.go:244 cleaned 4 unreachable objects
2023-05-18T17:06:19Z DEBUG gateway.grpc gateway/stream.go:190 registering service {"service": "control.HealthListener"}
2023-05-18T17:06:19Z DEBUG gateway.grpc gateway/stream.go:202 registering internal service {"service": "stream.Delegate"}
2023-05-18T17:06:19Z INFO gateway.pluginloader plugins/loader.go:150 loading plugin {"plugin": "github.com/rancher/opni/plugins/aiops"}
2023-05-18T17:06:19Z INFO gateway.pluginloader plugins/loader.go:150 loading plugin {"plugin": "github.com/rancher/opni/plugins/alerting"}
2023-05-18T17:06:19Z INFO gateway.pluginloader plugins/loader.go:150 loading plugin {"plugin": "github.com/rancher/opni/plugins/logging"}
2023-05-18T17:06:19Z INFO gateway.pluginloader plugins/loader.go:150 loading plugin {"plugin": "github.com/rancher/opni/plugins/metrics"}
2023-05-18T17:06:19Z INFO gateway.pluginloader plugins/loader.go:150 loading plugin {"plugin": "github.com/rancher/opni/plugins/topology"}
2023-05-18T17:06:19Z INFO gateway.pluginloader plugins/loader.go:150 loading plugin {"plugin": "github.com/rancher/opni/plugins/slo"}
2023-05-18T17:06:20Z ERROR gateway.pluginloader plugins/loader.go:157 failed to load plugin {"plugin": "github.com/rancher/opni/plugins/metrics", "error": "Unrecognized remote plugin message: code change\nThis usually means\n the plugin was not compiled for this architecture,\n the plugin is missing dynamic-link libraries necessary to run,\n the plugin is not executable by this process due to file permissions, or\n the plugin failed to negotiate the initial go-plugin protocol handshake\n\nAdditional notes about plugin:\n Path: /var/lib/opni/plugins/plugin_metrics\n Mode: -rwxr-xr-x\n Owner: 0 [root] (current: 0 [root])\n Group: 0 [root] (current: 0 [root])\n ELF architecture: EM_X86_64 (current architecture: amd64)\n"}
2023-05-18T17:06:20.328Z [ERROR] plugin process exited: path=/var/lib/opni/plugins/plugin_metrics pid=48 error="signal: killed"
Agent logs:
2023-05-18T17:06:23Z INFO commands/agent_v2.go:65 using config file {"path": "/etc/opni/config.yaml"}
2023-05-18T17:06:23Z DEBUG agent v2/agent.go:127 using log level: debug
2023-05-18T17:06:23Z INFO agent v2/agent.go:213 loaded existing keyring
2023-05-18T17:06:23Z DEBUG keyring machinery/keys.go:53 loaded ephemeral key {"path": "/run/opni-agent/keyring/session-attribute.json", "usage": "auth", "labels": {"opni.io/session-attribute":"local"}}
2023-05-18T17:06:23Z INFO agent v2/agent.go:500 attempting to sync plugins with gateway
2023-05-18T17:06:23Z DEBUG agent patch/manifest.go:97 found 2 plugins
2023-05-18T17:06:23Z DEBUG agent patch/manifest.go:153 loaded plugin manifest {"plugins": 2}
2023-05-18T17:07:53Z INFO agent v2/agent.go:516 received patch manifest from gateway
2023-05-18T17:07:53Z INFO agent patch/client.go:289 updating plugin {"filename": "plugin_topology", "size": 238, "from": "9b323300a7317d0620634dfb0cbad259d1ec85025b7859d66e7bc91c40f42737", "to": "baa3156e51b209782ccde14f7358a5c29d7615e5d549ed5a7e86269c119bb78d"}
2023-05-18T17:07:53Z INFO agent patch/client.go:289 updating plugin {"filename": "plugin_logging", "size": 251, "from": "f4c743c23880f99797c4ef7b1fa13235a3508cd0b6c9194f69db8449c0aca00b", "to": "f2cb75326b150ef9ca3da66b181ed184b27b625ffe13bca4aba6850c1b043d87"}
2023-05-18T17:07:55Z INFO agent.pluginloader plugins/loader.go:150 loading plugin {"plugin": "github.com/rancher/opni/plugins/logging"}
2023-05-18T17:07:55Z INFO agent.pluginloader plugins/loader.go:150 loading plugin {"plugin": "github.com/rancher/opni/plugins/topology"}
@kralicky @dbason
Hitting this again, with the new upgrader.
The only thing that resolved the issue was setting the manager image to a new image.
2023-07-05T21:08:45Z INFO agent v2/agent.go:528 attempting to sync plugins with gateway
2023-07-05T21:08:45Z DEBUG agent.plugin-upgrader patch/manifest.go:99 found 3 plugins
2023-07-05T21:08:46Z DEBUG agent.plugin-upgrader patch/manifest.go:156 loaded plugin manifest {"plugins": 3}
2023-07-05T21:08:56Z INFO agent v2/agent.go:581 received patch manifest from gateway
2023-07-05T21:08:56Z INFO agent.plugin-upgrader client/client.go:307 writing new plugin {"path": "plugin_logging", "size": 61005824}
2023-07-05T21:08:56Z INFO agent.plugin-upgrader client/client.go:307 writing new plugin {"path": "plugin_example", "size": 23875584}
2023-07-05T21:08:56Z INFO agent.plugin-upgrader client/client.go:307 writing new plugin {"path": "plugin_metrics", "size": 68370432}
2023-07-05T21:08:56Z INFO agent.plugin-upgrader client/client.go:307 writing new plugin {"path": "plugin_topology", "size": 63721472}
2023-07-05T21:08:56Z WARN agent.pluginloader plugins/loader.go:287 plugin is not present in manifest, skipping {"module": "github.com/rancher/opni/plugins/example", "path": "/var/lib/opni-agent/plugins/plugin_example"}
2023-07-05T21:08:56Z WARN agent.pluginloader plugins/loader.go:287 plugin is not present in manifest, skipping {"module": "github.com/rancher/opni/plugins/logging", "path": "/var/lib/opni-agent/plugins/plugin_logging"}
2023-07-05T21:08:56Z WARN agent.pluginloader plugins/loader.go:287 plugin is not present in manifest, skipping {"module": "github.com/rancher/opni/plugins/metrics", "path": "/var/lib/opni-agent/plugins/plugin_metrics"}
2023-07-05T21:08:56Z WARN agent.pluginloader plugins/loader.go:287 plugin is not present in manifest, skipping {"module": "github.com/rancher/opni/plugins/topology", "path": "/var/lib/opni-agent/plugins/plugin_topology"}
2023-07-05T21:08:56Z INFO agent v2/agent.go:391 loaded 0 plugins
2023-07-05T21:08:56Z INFO agent v2/agent.go:414 agent http server starting {"address": "0.0.0.0:8080"}
2023-07-05T21:08:56Z INFO agent v2/agent.go:481 connecting to gateway...
Gateway:
2023-07-05T23:32:35Z INFO gateway.cache patch/filesystem.go:153 generating patch {"from": "8ae1e63d10a084cbea1c90ba8a547aed0fb69bb29982adb337432b0b6f9b3ae7", "to": "26867ba2154bb003ac551f0ec8d2a9f18a7a544a9833fa3713af35af3f8d6d87"}
2023-07-05T23:32:35Z ERROR gateway.cache patch/filesystem.go:159 failed to generate patch {"from": "8ae1e63d10a084cbea1c90ba8a547aed0fb69bb29982adb337432b0b6f9b3ae7", "to": "26867ba2154bb003ac551f0ec8d2a9f18a7a544a9833fa3713af35af3f8d6d87", "error": "open /var/lib/opni/plugin-cache/plugins/8ae1e63d10a084cbea1c90ba8a547aed0fb69bb29982adb337432b0b6f9b3ae7: no such file or directory"}
2023-07-05T23:32:35Z INFO gateway.cache patch/filesystem.go:153 generating patch {"from": "8d31c814d5e22555aaf0bca82bc430676e4bd7777d6ac8f8c789a86a2b217221", "to": "aaeba35612c309ea544a17b882d23c91091aeff040339d37946b07b5606d21ed"}
2023-07-05T23:32:35Z INFO gateway.cache patch/filesystem.go:153 generating patch {"from": "49b7917bce84c1e69ce06516029556d0cb7a2adea71dcea3cc4b4bfee0bc7256", "to": "f43ee90910df9a9af63cec96ed0c2fc17683e5bd677ccb317116b5e5bc54cd38"}
2023-07-05T23:32:35Z ERROR gateway.cache patch/filesystem.go:159 failed to generate patch {"from": "49b7917bce84c1e69ce06516029556d0cb7a2adea71dcea3cc4b4bfee0bc7256", "to": "f43ee90910df9a9af63cec96ed0c2fc17683e5bd677ccb317116b5e5bc54cd38", "error": "open /var/lib/opni/plugin-cache/plugins/49b7917bce84c1e69ce06516029556d0cb7a2adea71dcea3cc4b4bfee0bc7256: no such file or directory"}
2023-07-05T23:32:35Z ERROR gateway.cache patch/filesystem.go:159 failed to generate patch {"from": "8d31c814d5e22555aaf0bca82bc430676e4bd7777d6ac8f8c789a86a2b217221", "to": "aaeba35612c309ea544a17b882d23c91091aeff040339d37946b07b5606d21ed", "error": "open /var/lib/opni/plugin-cache/plugins/8d31c814d5e22555aaf0bca82bc430676e4bd7777d6ac8f8c789a86a2b217221: no such file or directory"}
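The `failed to generate patch` errors above all report that the `from` revision is no longer in the plugin cache. As a minimal sketch of one way such a case could be handled (not the actual gateway code), the planner below falls back to sending the full plugin when the old revision is missing. `Op`, `Cache`, `PlanUpdate`, and `memCache` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// Op is the kind of update sent to an agent (hypothetical type).
type Op string

const (
	OpPatch  Op = "patch"  // send a binary diff from the old revision to the new one
	OpCreate Op = "create" // send the full new plugin
)

// Cache is a hypothetical lookup of cached plugin revisions by digest.
type Cache interface {
	Open(digest string) ([]byte, error)
}

// PlanUpdate picks a patch when both revisions are cached, and falls back
// to a full create when only the old revision is missing from the cache.
func PlanUpdate(c Cache, from, to string) (Op, error) {
	if _, err := c.Open(to); err != nil {
		return "", fmt.Errorf("target revision %s unavailable: %w", to, err)
	}
	if _, err := c.Open(from); err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			return OpCreate, nil
		}
		return "", fmt.Errorf("reading source revision %s: %w", from, err)
	}
	return OpPatch, nil
}

// memCache is an in-memory Cache used only to exercise PlanUpdate.
type memCache map[string][]byte

func (m memCache) Open(digest string) ([]byte, error) {
	b, ok := m[digest]
	if !ok {
		return nil, fmt.Errorf("open %s: %w", digest, fs.ErrNotExist)
	}
	return b, nil
}
```

With this shape, a missing `from` digest degrades to a larger transfer instead of a failed update, while a missing `to` digest is still reported as an error.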
Restarting the gateway with a new image resolved the gateway issues, but the agent still complains about plugins missing from the manifest:
2023-07-05T23:56:02Z WARN agent.pluginloader plugins/loader.go:287 plugin is not present in manifest, skipping {"module": "github.com/rancher/opni/plugins/example", "path": "/var/lib/opni-agent/plugins/plugin_example"}
2023-07-05T23:56:02Z WARN agent.pluginloader plugins/loader.go:287 plugin is not present in manifest, skipping {"module": "github.com/rancher/opni/plugins/logging", "path": "/var/lib/opni-agent/plugins/plugin_logging"}
2023-07-05T23:56:02Z WARN agent.pluginloader plugins/loader.go:287 plugin is not present in manifest, skipping {"module": "github.com/rancher/opni/plugins/metrics", "path": "/var/lib/opni-agent/plugins/plugin_metrics"}
2023-07-05T23:56:02Z WARN agent.pluginloader plugins/loader.go:287 plugin is not present in manifest, skipping {"module": "github.com/rancher/opni/plugins/topology", "path": "/var/lib/opni-agent/plugins/plugin_topology"}
values.yaml:
gateway:
  hostname: opni.test.gateway.alexdev.app
  storageType: jetstream
  auth:
    provider: openid
    openid:
      discovery:
        # secret
kube-prometheus-stack:
  enabled: true
@alexandreLamarre please take a look at the test coverage in pkg/update and check for any areas where we are not handling error scenarios.
Something caused a gateway patch sent to an agent to delete the metrics and logging plugins on the agent, which (obviously) results in no logs or metrics being sent.
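To make that failure mode testable, here is a minimal sketch of a guard that compares the plugin set before and after a patch and refuses to proceed if a required plugin would disappear. It is an assumption about how such a check could look, not the existing code in pkg/update; `Manifest`, `RemovedPlugins`, and `CheckPatchResult` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Manifest maps a plugin filename to its content digest.
// Hypothetical type for illustration, not the real manifest type.
type Manifest map[string]string

// RemovedPlugins returns the sorted filenames that are present in before
// but absent from after.
func RemovedPlugins(before, after Manifest) []string {
	var removed []string
	for name := range before {
		if _, ok := after[name]; !ok {
			removed = append(removed, name)
		}
	}
	sort.Strings(removed)
	return removed
}

// CheckPatchResult returns an error if applying a patch would drop any
// plugin listed in required.
func CheckPatchResult(before, after Manifest, required []string) error {
	need := make(map[string]bool, len(required))
	for _, r := range required {
		need[r] = true
	}
	var lost []string
	for _, name := range RemovedPlugins(before, after) {
		if need[name] {
			lost = append(lost, name)
		}
	}
	if len(lost) > 0 {
		return fmt.Errorf("patch would remove required plugins: %v", lost)
	}
	return nil
}
```

A test could build a `before` manifest containing `plugin_metrics` and `plugin_logging`, an `after` manifest without them, and assert that `CheckPatchResult` returns an error rather than letting the agent end up with zero loaded plugins.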