Closed jaycedowell closed 2 months ago
@ctaylor-physics you should probably be aware that this can happen.
This seems to have happened again.
I tried this:
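    import etcd3

    c = etcd3.Etcd3Client('127.0.0.1')
    # Find the oldest mod_revision among the current keys
    r = 1e12
    for k in c.get_all():
        if k[1].mod_revision < r:
            r = k[1].mod_revision
    # Round down to a multiple of 100 and compact away history before that revision
    r = (r//100)*100
    c.compact(r, physical=True)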
The database size, i.e., c.status().db_size, didn't show a huge change. Adding in a c.defragment() after the compaction led to a much smaller database size but INI still doesn't work.
Update: SHT then INI also doesn't work.
Update: Neither does restarting the ASP MCS service.
Update: Neither does restarting the etcd service.
Update: Neither does restarting the machine.
Final Update:
The secret seems to be that after freeing up space you need to clear the "NOSPACE" alarm with ETCDCTL_API=3 etcdctl alarm disarm. This could probably have been done as part of that Python sequence by throwing in a c.disarm_alarm() after the defragment call.
In any case I think the path forward is to add some kind of daily/weekly maintenance into asp_cmd.py. Something that compacts all but the N (maybe N=1000? 10000?) most recent revisions, defragments, and then does an alarm clear for good measure.
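A rough sketch of what that maintenance step might look like. This is not the actual compactEtcd.py or anything in asp_cmd.py; the names compaction_target and run_maintenance and the keep=1000 default are made up for illustration. It assumes a python-etcd3 style client (get_all, compact, defragment, disarm_alarm) and uses the newest mod_revision among the current keys as a stand-in for the latest revision. A real script would probably also want to handle etcd rejecting a compaction at a revision that has already been compacted.

```python
def compaction_target(mod_revisions, keep=1000):
    """Return the revision to compact to, or None if there is nothing to compact.

    mod_revisions: iterable of mod_revision values for the current keys.
    keep: hypothetical number of most recent revisions to retain.
    """
    newest = max(mod_revisions, default=0)
    target = newest - keep
    return target if target > 0 else None


def run_maintenance(client, keep=1000):
    """Compact, defragment, and clear alarms on a python-etcd3 style client."""
    # get_all() yields (value, metadata) pairs; metadata carries mod_revision
    revisions = [meta.mod_revision for _value, meta in client.get_all()]
    target = compaction_target(revisions, keep)
    if target is not None:
        # Drop history older than the target revision
        client.compact(target, physical=True)
    # Compaction alone does not shrink the file on disk; defragment does
    client.defragment()
    # Clear the NOSPACE alarm so writes are accepted again
    client.disarm_alarm()
    return target

# Example call (assumes python-etcd3 is installed and etcd is running locally):
#   import etcd3
#   run_maintenance(etcd3.Etcd3Client('127.0.0.1'), keep=1000)
```

The alarm clear is done unconditionally at the end since, per the above, freeing space by itself does not clear the alarm.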
Added compactEtcd.py to root's crontab to run every Sunday.
This is still a problem.
Fixed with #13.
From the logs:
Blowing away the current etcd database fixes this but it will happen again.