I got around to testing this against other engines:
[sqlite] time="2021-03-30T07:23:41Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
[sqlite] time="2021-03-30T07:23:41Z" level=info msg="Database tables and indexes are up to date"
[sqlite] time="2021-03-30T07:23:41Z" level=error msg="Failed to create health check key: no such column: t.id"
[sqlite] time="2021-03-30T07:23:41Z" level=info msg="Kine listening on tcp://0.0.0.0:2379"
[sqlite] time="2021-03-30T07:23:42Z" level=error msg="error while range on /bootstrap/e3b0c44298fc : no such column: t.id"
[mysql-8.0] time="2021-03-30T07:26:00Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
[mysql-8.0] time="2021-03-30T07:26:00Z" level=info msg="Database tables and indexes are up to date"
[mysql-8.0] time="2021-03-30T07:26:00Z" level=error msg="Failed to create health check key: Error 1054: Unknown column 't.id' in 'order clause'"
[mysql-8.0] time="2021-03-30T07:26:00Z" level=info msg="Kine listening on tcp://0.0.0.0:2379"
[mysql-8.0] time="2021-03-30T07:26:00Z" level=error msg="error while range on /bootstrap/e3b0c44298fc : Error 1054: Unknown column 't.id' in 'order clause'"
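(Aside, for context: this looks like the class of failure where the outer ORDER BY names a column that the derived table does not expose. A minimal, hypothetical illustration — not kine's actual generated SQL:)

-- fails: the derived table t exposes "theid", not "id"
SELECT * FROM (SELECT 1 AS theid) t ORDER BY t.id;
-- works: order by a column the derived table actually exposes
SELECT * FROM (SELECT 1 AS theid) t ORDER BY t.theid;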
@brandond Hi, how can I test this? CI passed, so I didn't expect SQLite to fail.
Our current CI tests are pretty miserable; they don't actually test anything other than basic syntax and compilation. PR #81 adds actual perf tests.
This change also seems to benefit newer versions of postgres more than old ones. Current code:
[postgres-13.2] [PERF] 0.009 average etcd request duration (seconds): create *core.ConfigMap
[postgres-13.2] [PERF] 0.014 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-13.2] [PERF] 0.006 average etcd request duration (seconds): get *core.ConfigMap
[postgres-13.2] [PERF] 0.037 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-13.2] [PERF] 0.017 average etcd request duration (seconds): update *core.ConfigMap
[postgres-12.6] [PERF] 0.007 average etcd request duration (seconds): create *core.ConfigMap
[postgres-12.6] [PERF] 0.011 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-12.6] [PERF] 0.004 average etcd request duration (seconds): get *core.ConfigMap
[postgres-12.6] [PERF] 0.030 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-12.6] [PERF] 0.013 average etcd request duration (seconds): update *core.ConfigMap
[postgres-11.11] [PERF] 0.007 average etcd request duration (seconds): create *core.ConfigMap
[postgres-11.11] [PERF] 0.012 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-11.11] [PERF] 0.004 average etcd request duration (seconds): get *core.ConfigMap
[postgres-11.11] [PERF] 0.032 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-11.11] [PERF] 0.013 average etcd request duration (seconds): update *core.ConfigMap
[postgres-10.16] [PERF] 0.008 average etcd request duration (seconds): create *core.ConfigMap
[postgres-10.16] [PERF] 0.013 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-10.16] [PERF] 0.005 average etcd request duration (seconds): get *core.ConfigMap
[postgres-10.16] [PERF] 0.033 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-10.16] [PERF] 0.016 average etcd request duration (seconds): update *core.ConfigMap
[postgres-9.6] [PERF] 0.006 average etcd request duration (seconds): create *core.ConfigMap
[postgres-9.6] [PERF] 0.010 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-9.6] [PERF] 0.004 average etcd request duration (seconds): get *core.ConfigMap
[postgres-9.6] [PERF] 0.031 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-9.6] [PERF] 0.012 average etcd request duration (seconds): update *core.ConfigMap
[postgres-9.5] [PERF] 0.004 average etcd request duration (seconds): create *core.ConfigMap
[postgres-9.5] [PERF] 0.009 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-9.5] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-9.5] [PERF] 0.027 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-9.5] [PERF] 0.010 average etcd request duration (seconds): update *core.ConfigMap
With this PR:
[postgres-13.2] [PERF] 0.005 average etcd request duration (seconds): create *core.ConfigMap
[postgres-13.2] [PERF] 0.010 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-13.2] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-13.2] [PERF] 0.030 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-13.2] [PERF] 0.011 average etcd request duration (seconds): update *core.ConfigMap
[postgres-12.6] [PERF] 0.005 average etcd request duration (seconds): create *core.ConfigMap
[postgres-12.6] [PERF] 0.010 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-12.6] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-12.6] [PERF] 0.030 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-12.6] [PERF] 0.011 average etcd request duration (seconds): update *core.ConfigMap
[postgres-11.11] [PERF] 0.005 average etcd request duration (seconds): create *core.ConfigMap
[postgres-11.11] [PERF] 0.009 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-11.11] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-11.11] [PERF] 0.029 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-11.11] [PERF] 0.011 average etcd request duration (seconds): update *core.ConfigMap
[postgres-10.16] [PERF] 0.005 average etcd request duration (seconds): create *core.ConfigMap
[postgres-10.16] [PERF] 0.008 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-10.16] [PERF] 0.003 average etcd request duration (seconds): get *core.ConfigMap
[postgres-10.16] [PERF] 0.031 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-10.16] [PERF] 0.011 average etcd request duration (seconds): update *core.ConfigMap
[postgres-9.6] [PERF] 0.004 average etcd request duration (seconds): create *core.ConfigMap
[postgres-9.6] [PERF] 0.008 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-9.6] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-9.6] [PERF] 0.030 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-9.6] [PERF] 0.009 average etcd request duration (seconds): update *core.ConfigMap
[postgres-9.5] [PERF] 0.004 average etcd request duration (seconds): create *core.ConfigMap
[postgres-9.5] [PERF] 0.008 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-9.5] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-9.5] [PERF] 0.028 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-9.5] [PERF] 0.009 average etcd request duration (seconds): update *core.ConfigMap
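Reading the numbers above: postgres-13.2 create drops from 0.009 to 0.005 and update from 0.017 to 0.011, while postgres-9.5 create stays at 0.004 and update only moves from 0.010 to 0.009 — consistent with newer Postgres versions gaining the most from this change.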
@brandond Should be fixed now; I tested SQLite locally. Should I also add this test?
// Package name and import paths below are assumptions added to make the
// snippet self-contained; they were not part of the original comment.
package endpoint_test

import (
	"context"
	"os"
	"testing"

	"github.com/k3s-io/kine/pkg/client"
	"github.com/k3s-io/kine/pkg/endpoint"
	"github.com/sirupsen/logrus"
)

func TestSqliteDriver(t *testing.T) {
	// Clear any leftover sqlite database from a previous run; an empty
	// endpoint.Config uses kine's default (sqlite) backend.
	_ = os.RemoveAll("db")
	config := endpoint.Config{}
	logrus.SetLevel(logrus.TraceLevel)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Start the embedded kine listener and build an etcd client against it.
	etcdConfig, err := endpoint.Listen(ctx, config)
	if err != nil {
		t.Fatal(err)
	}
	cli, err := client.New(etcdConfig)
	if err != nil {
		t.Fatal(err)
	}

	// Create one key, then list the prefix and expect exactly one entry.
	err = cli.Create(ctx, "/test/key", []byte("value"))
	if err != nil {
		t.Fatal(err)
	}
	val, err := cli.List(ctx, "/test", 0)
	if err != nil {
		t.Fatal(err)
	}
	if len(val) != 1 {
		t.Fatal("expected len 1 got:", len(val))
	}
}
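For reference, a test like this could be run from the repo root with something like go test -run TestSqliteDriver ./... (exact path assumed); it exercises the default sqlite backend end-to-end.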
You can leave out the sqlite-specific test; my PR will handle testing the drivers.
@wenerme thanks for working on this! I addressed a minor nit in the syntax and rebased it onto current master to get CI working again. It appears that your commit isn't signed off using an address that matches your registered GitHub email, and this is causing the DCO check to fail. Are you able to edit your commit to use the correct address, or update your address in your GH profile?
OK, I will update the PR later today.
Can you share the scenario in which you saw this offer the 1000x speedup described in #77? I see that in the example SQL query you're listing events - how many events does the cluster in question have? I'm not seeing the same thing in the tests, but they use configmaps, not events.
postgres-13.2[m] [PERF] 0.023 average etcd request duration (seconds): list *[]core.ConfigMap
postgres-13.2[m] [PERF] 0.005 [ 74] ███▋
postgres-13.2[m] [PERF] 0.025 [1004] ██████████████████████████████████████████████████
postgres-13.2[m] [PERF] 0.1 [ 924] ██████████████████████████████████████████████
postgres-13.2[m] [PERF] 0.25 [ 0]
postgres-13.2[m] [PERF] 0.5 [ 0]
postgres-13.2[m] [PERF] 1 [ 0]
postgres-13.2[m] [PERF] 2 [ 0]
postgres-13.2[m] [PERF] 4 [ 0]
postgres-13.2[m] [PERF] 15 [ 0]
postgres-13.2[m] [PERF] 30 [ 0]
postgres-13.2[m] [PERF] 60 [ 0]
postgres-13.2[m] [PERF] +Inf [ 0]
postgres-13.2[m] [PERF] 0.024 average etcd request duration (seconds): list *[]core.ConfigMap
postgres-13.2[m] [PERF] 0.005 [ 55] ██▉
postgres-13.2[m] [PERF] 0.025 [977] ██████████████████████████████████████████████████
postgres-13.2[m] [PERF] 0.1 [921] ███████████████████████████████████████████████▏
postgres-13.2[m] [PERF] 0.25 [ 0]
postgres-13.2[m] [PERF] 0.5 [ 0]
postgres-13.2[m] [PERF] 1 [ 0]
postgres-13.2[m] [PERF] 2 [ 0]
postgres-13.2[m] [PERF] 4 [ 0]
postgres-13.2[m] [PERF] 15 [ 0]
postgres-13.2[m] [PERF] 30 [ 0]
postgres-13.2[m] [PERF] 60 [ 0]
postgres-13.2[m] [PERF] +Inf [ 0]
@brandond The problem and the query were found in the statement log, and verified by manually executing the SQL.
Original
explain analyse
SELECT (SELECT MAX(rkv.id) AS id
        FROM kine AS rkv),
       (SELECT MAX(crkv.prev_revision) AS prev_revision
        FROM kine AS crkv
        WHERE crkv.name = 'compact_rev_key'),
       kv.id AS theid,
       kv.name,
       kv.created,
       kv.deleted,
       kv.create_revision,
       kv.prev_revision,
       kv.lease,
       kv.value,
       kv.old_value
FROM kine AS kv
JOIN (SELECT MAX(mkv.id) AS id
      FROM kine AS mkv
      WHERE mkv.name LIKE '/registry/events/%'
      GROUP BY mkv.name) maxkv
  ON maxkv.id = kv.id
WHERE (kv.deleted = 0 OR 'f')
ORDER BY kv.id ASC
LIMIT 2;
On a low-end Postgres server:
Limit (cost=3.19..26410.22 rows=2 width=1055) (actual time=81243.795..81305.631 rows=2 loops=1)
InitPlan 2 (returns $1)
-> Result (cost=0.58..0.59 rows=1 width=4) (actual time=0.058..0.060 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.42..0.58 rows=1 width=4) (actual time=0.055..0.056 rows=1 loops=1)
-> Index Only Scan Backward using kine_pkey on kine rkv (cost=0.42..140552.64 rows=879771 width=4) (actual time=0.054..0.054 rows=1 loops=1)
Index Cond: (id IS NOT NULL)
Heap Fetches: 1
InitPlan 4 (returns $3)
-> Result (cost=1.61..1.62 rows=1 width=4) (actual time=0.040..0.041 rows=1 loops=1)
InitPlan 3 (returns $2)
-> Limit (cost=0.55..1.61 rows=1 width=4) (actual time=0.038..0.039 rows=1 loops=1)
-> Index Only Scan Backward using kine_name_prev_revision_uindex on kine crkv (cost=0.55..12.26 rows=11 width=4) (actual time=0.038..0.038 rows=1 loops=1)
Index Cond: ((name = 'compact_rev_key'::text) AND (prev_revision IS NOT NULL))
Heap Fetches: 1
-> Nested Loop (cost=0.98..541159143.40 rows=40986 width=1055) (actual time=81243.795..81305.625 rows=2 loops=1)
Join Filter: (kv.id = maxkv.id)
Rows Removed by Join Filter: 355599741
-> Index Scan using kine_pkey on kine kv (cost=0.42..146625.80 rows=680092 width=1047) (actual time=0.031..80.495 rows=1280 loops=1)
Filter: (deleted = 0)
Rows Removed by Filter: 139429
-> Materialize (cost=0.55..135482.55 rows=53020 width=4) (actual time=0.002..34.213 rows=277812 loops=1280)
-> Subquery Scan on maxkv (cost=0.55..135217.45 rows=53020 width=4) (actual time=0.026..2190.921 rows=277860 loops=1)
-> GroupAggregate (cost=0.55..134687.25 rows=53020 width=62) (actual time=0.026..2143.559 rows=277860 loops=1)
Group Key: mkv.name
-> Index Scan using kine_name_index on kine mkv (cost=0.55..132046.90 rows=422029 width=62) (actual time=0.017..1976.507 rows=433226 loops=1)
Index Cond: (((name)::text >= '/registry/events/'::text) AND ((name)::text < '/registry/events0'::text))
Filter: ((name)::text ~~ '/registry/events/%'::text)
Planning Time: 0.466 ms
Execution Time: 81307.604 ms
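Reading this plan: the planner materializes the maxkv aggregate (~278k rows) and rescans it once per candidate row (1,280 loops), removing roughly 355 million rows via the join filter — that is where the ~81 second execution time goes.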
Modified
Same Postgres server:
explain analyse
select *
from (SELECT (SELECT MAX(rkv.id) AS id
              FROM kine AS rkv),
             (SELECT MAX(crkv.prev_revision) AS prev_revision
              FROM kine AS crkv
              WHERE crkv.name = 'compact_rev_key'),
             kv.id AS theid,
             kv.name,
             kv.created,
             kv.deleted,
             kv.create_revision,
             kv.prev_revision,
             kv.lease,
             kv.value,
             kv.old_value
      FROM kine AS kv
      JOIN (SELECT MAX(mkv.id) AS id
            FROM kine AS mkv
            WHERE mkv.name LIKE '/registry/events/%'
            GROUP BY mkv.name) maxkv
        ON maxkv.id = kv.id
      WHERE (kv.deleted = 0 OR 'f')) t
ORDER BY id ASC
LIMIT 2;
Limit (cost=3.19..13.49 rows=2 width=1055) (actual time=0.135..0.149 rows=2 loops=1)
InitPlan 2 (returns $1)
-> Result (cost=0.58..0.59 rows=1 width=4) (actual time=0.022..0.023 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.42..0.58 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=1)
-> Index Only Scan Backward using kine_pkey on kine rkv (cost=0.42..140552.64 rows=879771 width=4) (actual time=0.020..0.021 rows=1 loops=1)
Index Cond: (id IS NOT NULL)
Heap Fetches: 1
InitPlan 4 (returns $3)
-> Result (cost=1.61..1.62 rows=1 width=4) (actual time=0.024..0.025 rows=1 loops=1)
InitPlan 3 (returns $2)
-> Limit (cost=0.55..1.61 rows=1 width=4) (actual time=0.024..0.024 rows=1 loops=1)
-> Index Only Scan Backward using kine_name_prev_revision_uindex on kine crkv (cost=0.55..12.26 rows=11 width=4) (actual time=0.023..0.023 rows=1 loops=1)
Index Cond: ((name = 'compact_rev_key'::text) AND (prev_revision IS NOT NULL))
Heap Fetches: 1
-> Nested Loop (cost=0.98..211065.75 rows=40986 width=1055) (actual time=0.135..0.146 rows=2 loops=1)
-> GroupAggregate (cost=0.55..134687.25 rows=53020 width=62) (actual time=0.066..0.069 rows=2 loops=1)
Group Key: mkv.name
-> Index Scan using kine_name_index on kine mkv (cost=0.55..132046.90 rows=422029 width=62) (actual time=0.053..0.056 rows=3 loops=1)
Index Cond: (((name)::text >= '/registry/events/'::text) AND ((name)::text < '/registry/events0'::text))
Filter: ((name)::text ~~ '/registry/events/%'::text)
-> Index Scan using kine_pkey on kine kv (cost=0.42..1.43 rows=1 width=1047) (actual time=0.013..0.013 rows=1 loops=2)
Index Cond: (id = (max(mkv.id)))
Filter: (deleted = 0)
Planning Time: 0.611 ms
Execution Time: 0.198 ms
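With the ORDER BY and LIMIT moved outside the derived table, the planner instead drives the nested loop from the GroupAggregate over kine_name_index, stops after two groups, and performs just two index lookups on kine_pkey, so execution time falls from ~81 s to ~0.2 ms.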
Due to this problem, on newer K3s clusters I use the embedded etcd.
Hmm, that's interesting. I can't reproduce that locally, but this at least doesn't seem to make things worse on larger servers, so let's go for it.
Is there any plan to include kine 0.9.0 in the latest version of k3s? Thank you.
The 1.22+ branches of K3s have had kine 0.9.1 for about two months. Is there something in particular you're looking for?
Oh, I thought they were still on kine v0.8.1, as written in this table: https://github.com/k3s-io/k3s/releases/tag/v1.23.6%2Bk3s1
It's been on the branches for that long. The May Kubernetes patch releases came out very late in the month due to Kubecon EU, and the corresponding K3s releases should be out quite soon.
Related to #77