zalf-rdm / geonode-k8s

A Kubernetes helm chart for the geospatial web application GeoNode
https://geonode-k8s.readthedocs.io/en/latest/
GNU General Public License v2.0

Bug: geonode container not responding #96

Open mwallschlaeger opened 1 year ago

mwallschlaeger commented 1 year ago

Bug Description

After running geonode-k8s on a cluster for some days, I noticed that the container no longer responds to web traffic. The logs just show workers being reinitialized over and over:

worker 10 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 10 (new pid: 1324)
Respawned uWSGI worker 9 (new pid: 1325)
worker 10 lifetime reached, it was running for 3601 second(s)
worker 9 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 10 (new pid: 1331)
Respawned uWSGI worker 9 (new pid: 1332)
worker 9 lifetime reached, it was running for 3601 second(s)
worker 10 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 10 (new pid: 1347)
Respawned uWSGI worker 9 (new pid: 1348)
worker 9 lifetime reached, it was running for 3601 second(s)
worker 10 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 10 (new pid: 1354)
Respawned uWSGI worker 9 (new pid: 1355)
worker 9 lifetime reached, it was running for 3601 second(s)
worker 10 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 10 (new pid: 1361)
Respawned uWSGI worker 9 (new pid: 1362)
worker 9 lifetime reached, it was running for 3601 second(s)
worker 10 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 10 (new pid: 1400)
Respawned uWSGI worker 9 (new pid: 1401)
worker 10 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 10 (new pid: 1413)
worker 9 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 9 (new pid: 1415)
worker 10 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 10 (new pid: 1420)
worker 9 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 9 (new pid: 1422)
worker 10 lifetime reached, it was running for 3601 second(s)
worker 9 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 10 (new pid: 1428)
Respawned uWSGI worker 9 (new pid: 1429)
worker 10 lifetime reached, it was running for 3601 second(s)
worker 9 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 10 (new pid: 1435)
Respawned uWSGI worker 9 (new pid: 1436)
worker 9 lifetime reached, it was running for 3601 second(s)
worker 10 lifetime reached, it was running for 3601 second(s)
Respawned uWSGI worker 10 (new pid: 1442)
Respawned uWSGI worker 9 (new pid: 1443)
worker 9 lifetime reached, it was running for 3601 second(s)
worker 10 lifetime reached, it was running for 3601 second(s)

Maybe it would be useful here to increase the .Values.geonode.uwsgi.max_worker_lifetime default to something higher than 3600 seconds. It would also be good to have liveness probes for all the geonode services.
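As a rough sketch, the two suggestions could look like this in a values.yaml override. Assumptions: the `geonode.uwsgi.max_worker_lifetime` key follows the path mentioned above; the liveness probe uses standard Kubernetes `httpGet` probe syntax, and the endpoint path, port, and timings shown are placeholders that would need to match how the chart actually exposes the geonode container:

```yaml
geonode:
  uwsgi:
    # Assumption: raise the worker lifetime from the 3600 s default;
    # tune to your workload.
    max_worker_lifetime: 86400

# Hypothetical liveness probe for the geonode container
# (endpoint path and port are assumptions, not confirmed chart values):
livenessProbe:
  httpGet:
    path: /
    port: 8000
  initialDelaySeconds: 120
  periodSeconds: 30
  failureThreshold: 5
```

A probe like this would let Kubernetes restart the container when it stops answering HTTP, instead of it sitting unresponsive while uWSGI keeps respawning workers.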

Reproduction Steps

Best to provide your values.yaml, a brief description of your cluster, and the version of geonode-k8s here.

Behavior

A description of what you expected to happen and what actually happened.

Additional Information

Any additional information or context that may be helpful in resolving the bug.

mwallschlaeger commented 7 months ago

I cannot reproduce this issue atm, so I will remove it from release 1.1.0.