excessive logging breaks testbed after ~2-3 days uptime

frosty-geek commented 2 years ago

I noticed that my "long term" testbeds (make ENVIRONMENT=pluscloudopen deploy-full) always break because / runs out of space, without any user interaction.

Fresh testbed deployment, ~12h uptime:

dragon@testbed-manager:~$ for h in testbed-node-{0..2}.testbed.osism.xyz; do ssh $h "hostname; df -h /; du -shx /var/log/kolla/"; done
testbed-node-0
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G   31G   18G  64% /
6.3G    /var/log/kolla/
testbed-node-1
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G   27G   22G  56% /
4.7G    /var/log/kolla/
testbed-node-2
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        49G   34G   15G  70% /
7.2G    /var/log/kolla/

dragon@testbed-node-0:~$ du -smx /var/log/kolla/* | sort -n | tail -3
19  /var/log/kolla/keystone
41  /var/log/kolla/nova
6286    /var/log/kolla/libvirt
dragon@testbed-node-1:~$ du -smx /var/log/kolla/* | sort -n | tail -3
17  /var/log/kolla/keystone
32  /var/log/kolla/nova
4702    /var/log/kolla/libvirt
dragon@testbed-node-2:~$ du -smx /var/log/kolla/* | sort -n | tail -3
43  /var/log/kolla/nova
93  /var/log/kolla/haproxy
7086    /var/log/kolla/libvirt

The available disk space on the nodes is "asymmetric" because 2 Octavia amphorae are running on node-2 and 1 test VM is running on node-0.
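To put the numbers above in perspective, here is a rough sketch of how long each node has left before / fills up. It assumes that /var/log/kolla was empty at deployment time and grows linearly, which is a simplification rather than a measurement. The helper name hours_until_full is made up for this example.

```shell
#!/bin/sh
# Estimate the hours left until / is full.
# Assumptions (not measured): the logs started at zero and grow linearly.
# Arguments: <log size in G> <uptime in hours> <available space in G>
hours_until_full() {
    awk -v logs="$1" -v uptime="$2" -v avail="$3" \
        'BEGIN { printf "%.0f\n", avail / (logs / uptime) }'
}

# Values taken from the df/du output above (~12h uptime).
echo "testbed-node-0: $(hours_until_full 6.3 12 18)h"
echo "testbed-node-1: $(hours_until_full 4.7 12 22)h"
echo "testbed-node-2: $(hours_until_full 7.2 12 15)h"
```

Under these assumptions the nodes have roughly 25 to 56 hours left on top of the 12 hours already elapsed, which is in the same range as the ~2-3 days of uptime mentioned in the title.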

dragon@testbed-manager:/opt/configuration$ egrep -Ersi  openstack_logging_debug .
./environments/kolla/configuration.yml:openstack_logging_debug: "True"

What would be the best way to keep this from happening, besides the obvious option of globally disabling OpenStack debug logging? At the moment I tend towards changing this default for the testbed, as it reproducibly breaks the testbed within a couple of hours/days, depending on the VMs you spawn.
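For reference, the "obvious" option could look roughly like the sketch below. The helper name disable_debug_logging is hypothetical. The file path and variable name come from the grep output above. The affected services would still need to be reconfigured afterwards for the change to take effect, and that step is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: set openstack_logging_debug to "False" in the
# given configuration file. A backup copy is kept as <file>.bak.
disable_debug_logging() {
    config_file="$1"
    # Rewrite only the openstack_logging_debug line; leave the rest untouched.
    sed -i.bak 's/^openstack_logging_debug:.*/openstack_logging_debug: "False"/' "$config_file"
}

# Example, run from /opt/configuration on the manager:
# disable_debug_logging environments/kolla/configuration.yml
```

This only reduces log volume at the source. It does not clean up logs that have already been written.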

Ralf

berendt commented 2 years ago

I think we should enable the ES curator in the testbed with a pretty short retention time. Same for Prometheus.

berendt commented 2 years ago

I hope that a retention time of 1 day will work. If not, please re-open this issue.

berendt commented 2 years ago

Run osism apply elasticsearch to deploy the elasticsearch-curator service.

osism / testbed

excessive logging breaks testbed after ~2-3 days uptime #1260