k3s-io / kine

Run Kubernetes on MySQL, Postgres, sqlite, dqlite, not etcd.
Apache License 2.0

How to cleanup a Kine sqlite DB ? #213

Closed rhuss closed 1 year ago

rhuss commented 1 year ago

I have been successfully running K3s with a sqlite backend on a Raspberry Pi 4 cluster for about 3.5 years now.

Just recently I started seeing severe performance issues with "Slow SQL" queries. Looking into the database I see:

SELECT name, COUNT(*) FROM kine GROUP BY name HAVING COUNT(*) > 5000;
/registry/daemonsets/metallb-system/speaker|9757
/registry/daemonsets/prometheus/prometheus-node-exporter|9961
/registry/endpointslices/prometheus/prometheus-node-exporter-nhg47|7515
/registry/leases/kube-node-lease/atlantis|32920
/registry/leases/kube-node-lease/ningaloo|33508
/registry/leases/kube-node-lease/wakatobi|33689
/registry/leases/kube-node-lease/yonaguni|33694
/registry/leases/kube-system/apiserver-u5uxy3pu56cewrfobbktu77aje|34072
/registry/masterleases/10.9.11.7|34594
/registry/minions/yonaguni|5082
/registry/services/endpoints/prometheus/prometheus-node-exporter|7458

I don't think that many revisions of one object are expected to be here (the leases all have the same 'create_revision' value, but different 'prev_revision' values).

My question: can I remove all but the latest revision, at least for the leases? And why are so many revisions kept, e.g. for the DaemonSets?

The overall sqlite DB is now 4.9 GB in size, with 640k rows in the 'kine' table.

brandond commented 1 year ago

You didn't fill out the issue template, so I have to ask - what version of k3s are you on?

rhuss commented 1 year ago

sorry (but there was no issue template to select when creating a "New issue")

I'm running the latest stable version of k3s:

k3s --version
k3s version v1.27.5+k3s1 (8d074ecb)
go version go1.20.7

Running on RaspberryPi OS, 64 Bit.

rhuss commented 1 year ago

Naively deleting leases with the same revision number, except for the latest, didn't help; I now get tons of other errors (restoring from my backup now).

brandond commented 1 year ago

sorry (but there was no issue template to select when creating a "New issue")

Ah, that's because this issue would probably have been best asked in the k3s repo, since you're using kine embedded in k3s. We have several issue templates there that ask for the relevant information.

Check your k3s journald logs for messages that look like these, which should appear once every 5 minutes:

INFO[51600] COMPACT compactRev=15073 targetCompactRev=15162 currentRev=16162
INFO[51600] COMPACT deleted 89 rows from 89 revisions in 1.954868ms - compacted to 15162/16162

If you don't see those, and see an error instead, then for some reason your compaction is failing.
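
For example, assuming k3s runs as the usual systemd unit named k3s (a sketch, not taken from the thread), something like this should surface them:

journalctl -u k3s | grep COMPACT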

Naively deleting leases with the same revision number except for the latest didn't help

Yeah, don't do that. You'd want to shut down k3s and use the sqlite standalone binary to run a manual compact query:

delete from kine where id in (select id from (select id, name from kine where id not in (select max(id) as id from kine group by name)));
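
Spelled out as an offline procedure, that could look roughly like this (a sketch, not from the thread; it assumes k3s runs as a systemd service and that the embedded database sits at the default /var/lib/rancher/k3s/server/db/state.db path - adjust for your install):

sudo systemctl stop k3s
sudo sqlite3 /var/lib/rancher/k3s/server/db/state.db "delete from kine where id in (select id from (select id, name from kine where id not in (select max(id) as id from kine group by name)));"
sudo systemctl start k3s
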
rhuss commented 1 year ago

Thanks for the quick feedback!

Indeed, I see something like:

Sep 08 01:37:57 ningaloo k3s[10991]: time="2023-09-08T01:37:57+02:00" level=info msg="Slow SQL (started: 2023-09-08 01:36:56.916335528 +0200 CEST m=+176109.661150280) (total time: 1m0.870460472s):  DELETE FROM kine AS kv WHERE kv.id IN ( SELECT kp.prev_revision AS id FROM kine AS kp WHERE kp.name != 'compact_rev_key' AND kp.prev_revision != 0 AND kp.id <= ? UNION SELECT kd.id AS id FROM kine AS kd WHERE kd.deleted != 0 AND kd.id <= ? ) : [[96481855 96481855]]"
Sep 08 01:37:57 ningaloo k3s[10991]: time="2023-09-08T01:37:57+02:00" level=error msg="Compact failed: failed to record compact revision: sql: transaction has already been committed or rolled back"
rhuss commented 1 year ago

Running your proposed cleanup statement now, then a vacuum, and then I will try again. I'll report back, but it might take a bit :-)
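
The vacuum step mentioned here would be roughly the following (a sketch, assuming the same default state.db path as above and that k3s is stopped while it runs):

sudo sqlite3 /var/lib/rancher/k3s/server/db/state.db "VACUUM;"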

rhuss commented 1 year ago

@brandond, that helped! I'm now back from 4.9 GB to 62 MB, and down to only 20k entries.

My takeaways for this are:

rhuss commented 1 year ago

Unfortunately I'm not completely done, because my original issue was that an invalid CR broke the K8s garbage collection loop, so that tons of completed Job and Pod objects piled up (from a 5-minute CronJob) and never got garbage collected. I only noticed it when there were over 2k of those jobs.

I've deleted them, but they are still contained in the sqlite DB (3.5k rows for Jobs & Pods), and now they re-appear as zombies in the journal log:

9104:Sep  6 00:40:00 ningaloo k3s[4553]: I0906 00:40:00.343824    4553 job_controller.go:523] enqueueing job mqtt/collect-home-metrics-28232560
189105:Sep  6 00:40:00 ningaloo k3s[4553]: I0906 00:40:00.371122    4553 job_controller.go:523] enqueueing job mqtt/collect-home-metrics-28232560
189106:Sep  6 00:40:00 ningaloo k3s[4553]: I0906 00:40:00.388504    4553 job_controller.go:523] enqueueing job mqtt/collect-home-metrics-28232560
189107:Sep  6 00:40:00 ningaloo k3s[4553]: I0906 00:40:00.471934    4553 job_controller.go:523] enqueueing job mqtt/collect-home-metrics-28232560
189138:Sep  6 00:40:01 ningaloo k3s[4553]: I0906 00:40:01.824805    4553 job_controller.go:523] enqueueing job mqtt/collect-home-metrics-28232560
189147:Sep  6 00:40:02 ningaloo k3s[4553]: I0906 00:40:02.860192    4553 job_controller.go:523] enqueueing job mqtt/collect-home-metrics-28232560
189174:Sep  6 00:40:45 ningaloo k3s[4553]: I0906 00:40:45.174885    4553 job_controller.go:523] enqueueing job mqtt/collect-home-metrics-28232560
189175:Sep  6 00:40:46 ningaloo k3s[4553]: I0906 00:40:46.219768    4553 job_controller.go:523] enqueueing job mqtt/collect-home-metrics-28232560
189176:Sep  6 00:40:47 ningaloo k3s[4553]: I0906 00:40:47.081555    4553 job_controller.go:523] enqueueing job mqtt/collect-home-metrics-28232560
189177:Sep  6 00:40:48 ningaloo k3s[4553]: I0906 00:40:48.110016    4553 job_controller.go:523] enqueueing job mqtt/collect-home-metrics-28232560
189178:Sep  6 00:40:48 ningaloo k3s[4553]: I0906 00:40:48.139088    4553 job_controller.go:523] enqueueing job mqtt/collect-home-metrics-28232560
....

There are no corresponding Job resources in kubectl get jobs -A. Is it safe to delete rows like these in kine?

SELECT name, COUNT(*) FROM kine GROUP BY name
....
/registry/jobs/mqtt/collect-home-metrics-28233580|1
/registry/jobs/mqtt/collect-home-metrics-28233585|1
/registry/jobs/mqtt/collect-home-metrics-28233590|1
/registry/jobs/mqtt/collect-home-metrics-28233595|1
/registry/jobs/mqtt/collect-home-metrics-28233600|1
/registry/jobs/mqtt/collect-home-metrics-28233605|1
/registry/jobs/mqtt/collect-home-metrics-28233610|1
/registry/jobs/mqtt/collect-home-metrics-28233615|1
/registry/jobs/mqtt/collect-home-metrics-28233620|1
/registry/jobs/mqtt/collect-home-metrics-28233625|1
/registry/jobs/mqtt/collect-home-metrics-28233630|1
/registry/jobs/mqtt/collect-home-metrics-28233635|1
/registry/jobs/mqtt/collect-home-metrics-28233640|1
/registry/jobs/mqtt/collect-home-metrics-28233645|1
/registry/jobs/mqtt/collect-home-metrics-28233650|1
/registry/jobs/mqtt/collect-home-metrics-28233655|1
/registry/jobs/mqtt/collect-home-metrics-28233660|1
....
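
One way to check whether leftover rows like these are just deletion tombstones waiting for compaction could be a query along these lines (a sketch; the deleted column shows up in the compaction statement quoted in the log above):

SELECT id, name, deleted FROM kine WHERE name LIKE '/registry/jobs/mqtt/collect-home-metrics-%' ORDER BY id DESC LIMIT 20;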

Sorry to bother you with this, I know it's not originally a kine issue. Feel free to close the issue if this is completely off-topic.

rhuss commented 1 year ago

But maybe you can help me with the question of what the value "k8s" means? All those zombie rows have this as their value ...

brandond commented 1 year ago

Deleted resources will get cleaned up by compaction eventually. The actual compact query run internally is slightly more complex and includes deleted objects.
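
For reference, the internal compaction statement that showed up as "Slow SQL" in the log above looks roughly like this when reformatted for readability (the ? placeholders are bound to the compact revision):

DELETE FROM kine AS kv
WHERE kv.id IN (
    SELECT kp.prev_revision AS id FROM kine AS kp
    WHERE kp.name != 'compact_rev_key'
      AND kp.prev_revision != 0
      AND kp.id <= ?
    UNION
    SELECT kd.id AS id FROM kine AS kd
    WHERE kd.deleted != 0
      AND kd.id <= ?
);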

Were you able to find the log messages regarding compaction? How long is it currently taking?

rhuss commented 1 year ago

@brandond that is all I've found wrt compaction: https://github.com/k3s-io/kine/issues/213#issuecomment-1717161761 and, as you can see, it ran into a timeout.

(That is what I meant: the compaction itself ran into a transaction timeout during the delete, which means that from that point on no compaction works anymore, because only compaction reduces the data size, which in turn would lead to shorter query times that fit into the allowed transaction time again.)

I think I'm finally good after a lot of trial and error (also getting rid of some container zombies as described in https://github.com/k3s-io/k3s/issues/6185#issuecomment-1399502450, and of the cronjob and its enqueued pods).

Thanks again for your support; the manual compaction statement from above, which got me from 600k down to 20k entries, was absolutely key to getting back into a usable state.

brandond commented 1 year ago

Right, I was hoping to hear whether or not you're getting the same errors after running manual compaction, or what the new elapsed time is if compaction is succeeding. If you are still seeing the automatic compact fail, you'll just run into problems again further down the road - if compaction fails regularly, the database will continue to grow until it is multiple GB again.

rhuss commented 1 year ago

Ah, ok. No, after the manual compaction, the regular compaction works as expected again:

Sep 13 19:47:17 ningaloo k3s[9992]: time="2023-09-13T19:47:17+02:00" level=info msg="COMPACT compactRev=97137012 targetCompactRev=97137238 currentRev=97138238"
Sep 13 19:47:17 ningaloo k3s[9992]: time="2023-09-13T19:47:17+02:00" level=info msg="COMPACT deleted 226 rows from 226 revisions in 75.583591ms - compacted to 97137238/97138238"

It was only the transaction timeout (because of a too-large DB; the manual compaction also took around 15 minutes) which broke the compaction.

brandond commented 1 year ago

OK, great. That's what I was hoping to see.

FWIW, kine will also run a vacuum at startup, so as long as compaction is working and you restart k3s periodically, the database should stay pretty small.
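
A periodic restart plus a quick size check can be as simple as the following (a sketch, assuming a systemd-managed k3s server and the default state.db path):

sudo systemctl restart k3s
ls -lh /var/lib/rancher/k3s/server/db/state.db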

1k-off commented 2 weeks ago

Hi!

Current version:
k3s version v1.30.3+k3s1 (f6466040)
go version go1.22.5

Started probably from 1.29.8, then upgraded.

OS: ubuntu 22.04

Cluster:
CTL: 2 vCPU, 2 GB RAM
Worker 1: 8 vCPU, 16 GB RAM
Worker 2: 8 vCPU, 16 GB RAM
Datastore: 4 vCPU, 8 GB RAM
Datastore type: MySQL 8

After ~1 month of usage the datastore size is 1.8 GB. Performance is as bad as it can be, and then some.

There are endless client disconnects during kubectl runs, and deploying anything into the cluster via CI/CD is impossible.

Actions done:

SELECT * FROM ( SELECT count(1) num, NAME FROM kine GROUP BY `name` ) ab ORDER BY num DESC;

Screenshot 2024-09-16 151538

SELECT name, COUNT(*) FROM kine GROUP BY name HAVING COUNT(*) > 5000;

Screenshot 2024-09-16 151736

SELECT COUNT(*) AS total_rows
FROM kine 
WHERE id IN (
    SELECT id FROM (
        SELECT id, name 
        FROM kine 
        WHERE id NOT IN (
            SELECT MAX(id) 
            FROM kine 
            GROUP BY name
        )
    ) AS temp
);

gives 747579

After running

DELETE FROM kine 
WHERE id IN (
    SELECT id FROM (
        SELECT id, name 
        FROM kine 
        WHERE id NOT IN (
            SELECT MAX(id) 
            FROM kine 
            GROUP BY name
        )
    ) AS temp
);

It feels much better now, but I don't know how long it will last.
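
One hedged follow-up, not from the thread: after deleting roughly 750k rows, InnoDB will not shrink its data files on its own, so rebuilding the table may be needed to actually reclaim the disk space (it needs enough free space for the rebuild):

OPTIMIZE TABLE kine;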

@brandond FYI

P.S. I'm not sure, but I didn't notice a vacuum running during startup :)

brandond commented 2 weeks ago

@1k-off Datastore type: MySQL 8

You're commenting on an issue about sqlite, but you're not running sqlite. If you're using an external DB, check the K3s log messages about compaction and make sure that your database is able to keep up with the compact requests; otherwise it'll fill up, as you are seeing. Please take any further conversation about this to a new issue.