mattermost / mattermost-helm

Mattermost Helm charts for Kubernetes
Apache License 2.0

Mattermost unstable for the past 2 weeks #401

Closed bernardgut closed 4 months ago

bernardgut commented 1 year ago

Hello

We run mattermost-team for a small team of 3 people on our local Kubernetes cluster and are very happy with it. However, over the past 2 weeks the instance has become very unstable, with many restarts and some downtime that sometimes lasts for hours, at around the same time of day. After investigating, we found the following in the logs:

{"timestamp":"2023-05-02 08:33:35.174 Z","level":"error","msg":"plugin process exited","caller":"plugin/hclog_adapter.go:79","plugin_id":"com.mattermost.apps","wrapped_extras":"pathplugins/com.mattermost.apps/server/dist/plugin-linux-amd64pid87errorsignal: killed"}
{"timestamp":"2023-05-02 08:33:36.979 Z","level":"error","msg":"Failed to install prepackaged plugin","caller":"app/plugin.go:967","path":"/mattermost/prepackaged_plugins/mattermost-plugin-apps-v1.2.0-linux-amd64.tar.gz","error":"Failed to install extracted prepackaged plugin /mattermost/prepackaged_plugins/mattermost-plugin-apps-v1.2.0-linux-amd64.tar.gz: installExtractedPlugin: Unable to restart plugin on upgrade., unable to start plugin: com.mattermost.apps: timeout while waiting for plugin to start"}

Then the pod restarts. Sometimes it comes back up; sometimes it fails on restart for a few more cycles. Each restart takes up to 5 minutes and either fails or works (until it randomly fails again).
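To see how often the plugin process is being killed, and at what times, the JSON log lines can be scanned for this pattern. The following is a minimal Python sketch, assuming the logs have been saved one JSON object per line in the format shown above; the function name `find_killed_plugins` is hypothetical and not part of Mattermost.

```python
import json


def find_killed_plugins(log_lines):
    """Return (timestamp, plugin_id) pairs for 'plugin process exited'
    entries whose wrapped_extras field mentions 'signal: killed'.

    Assumes one JSON object per line, as in the excerpt above.
    """
    hits = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not JSON (e.g. startup banners).
            continue
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") != "plugin process exited":
            continue
        if "signal: killed" in str(entry.get("wrapped_extras", "")):
            hits.append((entry.get("timestamp"), entry.get("plugin_id")))
    return hits
```

Running this over a few days of logs would show whether the kills cluster around the same time of day, which can help correlate them with other activity on the node.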

Thank you.

clouedoc commented 12 months ago

It looks like something is getting killed; why?

If it's because of an installation timeout, it might be caused by permissions issues. Maybe taking a look at #410 would be worthwhile? Otherwise, if you can find out which signal killed the process, that might be useful to you; is it the OS killing the pod? Maybe you are running into an OOM error.
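One way to check the OOM theory is to look at how each container in the pod last terminated. Here is a minimal Python sketch, assuming the pod description has been exported with `kubectl get pod <pod-name> -o json`; the function name `last_terminations` and the script name in the usage comment are hypothetical. It reads the `status.containerStatuses[].lastState.terminated` fields from the Kubernetes Pod object.

```python
import json


def last_terminations(pod):
    """Summarize the last termination of each container in a pod.

    `pod` is the dict parsed from `kubectl get pod <pod-name> -o json`.
    Containers that have never terminated are skipped.
    """
    results = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for status in statuses:
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            results.append({
                "container": status.get("name"),
                "reason": terminated.get("reason"),
                "exit_code": terminated.get("exitCode"),
            })
    return results


# Example usage (hypothetical file name pod.json):
#   kubectl get pod <pod-name> -o json > pod.json
#   print(last_terminations(json.load(open("pod.json"))))
```

A reason of `OOMKilled` points to the container exceeding its memory limit, and an exit code of 137 (128 + 9) means the process received SIGKILL. Note that this only covers the container itself; a plugin subprocess killed inside a still-running container, as in the log above, would not show up here.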

bernardgut commented 4 months ago

It was due to the hardware the pod was running on: the storage had issues. Thanks. You can close this.

clouedoc commented 4 months ago

Close it yourself, I don't have access, but you do since you created the issue ;)