I got around to testing this against other engines:
[sqlite] time="2021-03-30T07:23:41Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
[sqlite] time="2021-03-30T07:23:41Z" level=info msg="Database tables and indexes are up to date"
[sqlite] time="2021-03-30T07:23:41Z" level=error msg="Failed to create health check key: no such column: t.id"
[sqlite] time="2021-03-30T07:23:41Z" level=info msg="Kine listening on tcp://0.0.0.0:2379"
[sqlite] time="2021-03-30T07:23:42Z" level=error msg="error while range on /bootstrap/e3b0c44298fc : no such column: t.id"
[mysql-8.0] time="2021-03-30T07:26:00Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
[mysql-8.0] time="2021-03-30T07:26:00Z" level=info msg="Database tables and indexes are up to date"
[mysql-8.0] time="2021-03-30T07:26:00Z" level=error msg="Failed to create health check key: Error 1054: Unknown column 't.id' in 'order clause'"
[mysql-8.0] time="2021-03-30T07:26:00Z" level=info msg="Kine listening on tcp://0.0.0.0:2379"
[mysql-8.0] time="2021-03-30T07:26:00Z" level=error msg="error while range on /bootstrap/e3b0c44298fc : Error 1054: Unknown column 't.id' in 'order clause'"
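(Aside, for context: this looks like the class of failure where the outer ORDER BY names a column that the derived table does not expose. A minimal, hypothetical illustration — not kine's actual generated SQL:)

-- fails: the derived table t exposes "theid", not "id"
SELECT * FROM (SELECT 1 AS theid) t ORDER BY t.id;
-- works: order by a column the derived table actually exposes
SELECT * FROM (SELECT 1 AS theid) t ORDER BY t.theid;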
@brandond Hi, how can I test this? CI passed, so I didn't expect SQLite to fail.
Our current CI tests are pretty miserable; they don't actually test anything other than basic syntax and compilation. PR #81 adds actual perf tests.
This change also seems to benefit newer versions of postgres more than old ones. Current code:
[postgres-13.2] [PERF] 0.009 average etcd request duration (seconds): create *core.ConfigMap
[postgres-13.2] [PERF] 0.014 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-13.2] [PERF] 0.006 average etcd request duration (seconds): get *core.ConfigMap
[postgres-13.2] [PERF] 0.037 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-13.2] [PERF] 0.017 average etcd request duration (seconds): update *core.ConfigMap
[postgres-12.6] [PERF] 0.007 average etcd request duration (seconds): create *core.ConfigMap
[postgres-12.6] [PERF] 0.011 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-12.6] [PERF] 0.004 average etcd request duration (seconds): get *core.ConfigMap
[postgres-12.6] [PERF] 0.030 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-12.6] [PERF] 0.013 average etcd request duration (seconds): update *core.ConfigMap
[postgres-11.11] [PERF] 0.007 average etcd request duration (seconds): create *core.ConfigMap
[postgres-11.11] [PERF] 0.012 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-11.11] [PERF] 0.004 average etcd request duration (seconds): get *core.ConfigMap
[postgres-11.11] [PERF] 0.032 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-11.11] [PERF] 0.013 average etcd request duration (seconds): update *core.ConfigMap
[postgres-10.16] [PERF] 0.008 average etcd request duration (seconds): create *core.ConfigMap
[postgres-10.16] [PERF] 0.013 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-10.16] [PERF] 0.005 average etcd request duration (seconds): get *core.ConfigMap
[postgres-10.16] [PERF] 0.033 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-10.16] [PERF] 0.016 average etcd request duration (seconds): update *core.ConfigMap
[postgres-9.6] [PERF] 0.006 average etcd request duration (seconds): create *core.ConfigMap
[postgres-9.6] [PERF] 0.010 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-9.6] [PERF] 0.004 average etcd request duration (seconds): get *core.ConfigMap
[postgres-9.6] [PERF] 0.031 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-9.6] [PERF] 0.012 average etcd request duration (seconds): update *core.ConfigMap
[postgres-9.5] [PERF] 0.004 average etcd request duration (seconds): create *core.ConfigMap
[postgres-9.5] [PERF] 0.009 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-9.5] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-9.5] [PERF] 0.027 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-9.5] [PERF] 0.010 average etcd request duration (seconds): update *core.ConfigMap
With this PR:
[postgres-13.2] [PERF] 0.005 average etcd request duration (seconds): create *core.ConfigMap
[postgres-13.2] [PERF] 0.010 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-13.2] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-13.2] [PERF] 0.030 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-13.2] [PERF] 0.011 average etcd request duration (seconds): update *core.ConfigMap
[postgres-12.6] [PERF] 0.005 average etcd request duration (seconds): create *core.ConfigMap
[postgres-12.6] [PERF] 0.010 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-12.6] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-12.6] [PERF] 0.030 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-12.6] [PERF] 0.011 average etcd request duration (seconds): update *core.ConfigMap
[postgres-11.11] [PERF] 0.005 average etcd request duration (seconds): create *core.ConfigMap
[postgres-11.11] [PERF] 0.009 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-11.11] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-11.11] [PERF] 0.029 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-11.11] [PERF] 0.011 average etcd request duration (seconds): update *core.ConfigMap
[postgres-10.16] [PERF] 0.005 average etcd request duration (seconds): create *core.ConfigMap
[postgres-10.16] [PERF] 0.008 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-10.16] [PERF] 0.003 average etcd request duration (seconds): get *core.ConfigMap
[postgres-10.16] [PERF] 0.031 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-10.16] [PERF] 0.011 average etcd request duration (seconds): update *core.ConfigMap
[postgres-9.6] [PERF] 0.004 average etcd request duration (seconds): create *core.ConfigMap
[postgres-9.6] [PERF] 0.008 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-9.6] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-9.6] [PERF] 0.030 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-9.6] [PERF] 0.009 average etcd request duration (seconds): update *core.ConfigMap
[postgres-9.5] [PERF] 0.004 average etcd request duration (seconds): create *core.ConfigMap
[postgres-9.5] [PERF] 0.008 average etcd request duration (seconds): delete *core.ConfigMap
[postgres-9.5] [PERF] 0.002 average etcd request duration (seconds): get *core.ConfigMap
[postgres-9.5] [PERF] 0.028 average etcd request duration (seconds): list *[]core.ConfigMap
[postgres-9.5] [PERF] 0.009 average etcd request duration (seconds): update *core.ConfigMap
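Reading the numbers above: postgres-13.2 create drops from 0.009 to 0.005 and update from 0.017 to 0.011, while postgres-9.5 create stays at 0.004 and update only moves from 0.010 to 0.009 — consistent with newer Postgres versions gaining the most from this change.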
@brandond Should be fixed now; I tested SQLite locally. Should I also add this test?
// Package name and import paths below are assumptions added to make the
// snippet self-contained; they were not part of the original comment.
package endpoint_test

import (
	"context"
	"os"
	"testing"

	"github.com/k3s-io/kine/pkg/client"
	"github.com/k3s-io/kine/pkg/endpoint"
	"github.com/sirupsen/logrus"
)

func TestSqliteDriver(t *testing.T) {
	// Clear any leftover sqlite database from a previous run; an empty
	// endpoint.Config uses kine's default (sqlite) backend.
	_ = os.RemoveAll("db")
	config := endpoint.Config{}
	logrus.SetLevel(logrus.TraceLevel)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Start the embedded kine listener and build an etcd client against it.
	etcdConfig, err := endpoint.Listen(ctx, config)
	if err != nil {
		t.Fatal(err)
	}
	cli, err := client.New(etcdConfig)
	if err != nil {
		t.Fatal(err)
	}

	// Create one key, then list the prefix and expect exactly one entry.
	err = cli.Create(ctx, "/test/key", []byte("value"))
	if err != nil {
		t.Fatal(err)
	}
	val, err := cli.List(ctx, "/test", 0)
	if err != nil {
		t.Fatal(err)
	}
	if len(val) != 1 {
		t.Fatal("expected len 1 got:", len(val))
	}
}
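For reference, a test like this could be run from the repo root with something like go test -run TestSqliteDriver ./... (exact path assumed); it exercises the default sqlite backend end-to-end.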
You can leave out the sqlite-specific test; my PR will handle testing the drivers.
@wenerme thanks for working on this! I addressed a minor nit in the syntax and rebased it onto current master to get CI working again. It appears that your commit isn't signed off using an address that matches your registered GitHub email, and this is causing the DCO check to fail. Are you able to edit your commit to use the correct address, or update your address in your GH profile?
OK, I will update the PR later today.
Can you share the scenario in which you saw this offer the 1000x speedup described in #77? I see that in the example SQL query you're listing events - how many events does the cluster in question have? I'm not seeing the same thing in the tests, but they use configmaps, not events.
postgres-13.2[m] [PERF] 0.023 average etcd request duration (seconds): list *[]core.ConfigMap
postgres-13.2[m] [PERF] 0.005 [ 74] ███▋
postgres-13.2[m] [PERF] 0.025 [1004] ██████████████████████████████████████████████████
postgres-13.2[m] [PERF] 0.1 [ 924] ██████████████████████████████████████████████
postgres-13.2[m] [PERF] 0.25 [ 0]
postgres-13.2[m] [PERF] 0.5 [ 0]
postgres-13.2[m] [PERF] 1 [ 0]
postgres-13.2[m] [PERF] 2 [ 0]
postgres-13.2[m] [PERF] 4 [ 0]
postgres-13.2[m] [PERF] 15 [ 0]
postgres-13.2[m] [PERF] 30 [ 0]
postgres-13.2[m] [PERF] 60 [ 0]
postgres-13.2[m] [PERF] +Inf [ 0]
postgres-13.2[m] [PERF] 0.024 average etcd request duration (seconds): list *[]core.ConfigMap
postgres-13.2[m] [PERF] 0.005 [ 55] ██▉
postgres-13.2[m] [PERF] 0.025 [977] ██████████████████████████████████████████████████
postgres-13.2[m] [PERF] 0.1 [921] ███████████████████████████████████████████████▏
postgres-13.2[m] [PERF] 0.25 [ 0]
postgres-13.2[m] [PERF] 0.5 [ 0]
postgres-13.2[m] [PERF] 1 [ 0]
postgres-13.2[m] [PERF] 2 [ 0]
postgres-13.2[m] [PERF] 4 [ 0]
postgres-13.2[m] [PERF] 15 [ 0]
postgres-13.2[m] [PERF] 30 [ 0]
postgres-13.2[m] [PERF] 60 [ 0]
postgres-13.2[m] [PERF] +Inf [ 0]
@brandond The problem and the query were found in the statement log, and verified by manually executing the SQL.
Original
explain analyse
SELECT (SELECT MAX(rkv.id) AS id
        FROM kine AS rkv),
       (SELECT MAX(crkv.prev_revision) AS prev_revision
        FROM kine AS crkv
        WHERE crkv.name = 'compact_rev_key'),
       kv.id AS theid,
       kv.name,
       kv.created,
       kv.deleted,
       kv.create_revision,
       kv.prev_revision,
       kv.lease,
       kv.value,
       kv.old_value
FROM kine AS kv
JOIN (SELECT MAX(mkv.id) AS id
      FROM kine AS mkv
      WHERE mkv.name LIKE '/registry/events/%'
      GROUP BY mkv.name) maxkv
  ON maxkv.id = kv.id
WHERE (kv.deleted = 0 OR 'f')
ORDER BY kv.id ASC
LIMIT 2;
On a low-end Postgres server:
Limit (cost=3.19..26410.22 rows=2 width=1055) (actual time=81243.795..81305.631 rows=2 loops=1)
InitPlan 2 (returns $1)
-> Result (cost=0.58..0.59 rows=1 width=4) (actual time=0.058..0.060 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.42..0.58 rows=1 width=4) (actual time=0.055..0.056 rows=1 loops=1)
-> Index Only Scan Backward using kine_pkey on kine rkv (cost=0.42..140552.64 rows=879771 width=4) (actual time=0.054..0.054 rows=1 loops=1)
Index Cond: (id IS NOT NULL)
Heap Fetches: 1
InitPlan 4 (returns $3)
-> Result (cost=1.61..1.62 rows=1 width=4) (actual time=0.040..0.041 rows=1 loops=1)
InitPlan 3 (returns $2)
-> Limit (cost=0.55..1.61 rows=1 width=4) (actual time=0.038..0.039 rows=1 loops=1)
-> Index Only Scan Backward using kine_name_prev_revision_uindex on kine crkv (cost=0.55..12.26 rows=11 width=4) (actual time=0.038..0.038 rows=1 loops=1)
Index Cond: ((name = 'compact_rev_key'::text) AND (prev_revision IS NOT NULL))
Heap Fetches: 1
-> Nested Loop (cost=0.98..541159143.40 rows=40986 width=1055) (actual time=81243.795..81305.625 rows=2 loops=1)
Join Filter: (kv.id = maxkv.id)
Rows Removed by Join Filter: 355599741
-> Index Scan using kine_pkey on kine kv (cost=0.42..146625.80 rows=680092 width=1047) (actual time=0.031..80.495 rows=1280 loops=1)
Filter: (deleted = 0)
Rows Removed by Filter: 139429
-> Materialize (cost=0.55..135482.55 rows=53020 width=4) (actual time=0.002..34.213 rows=277812 loops=1280)
-> Subquery Scan on maxkv (cost=0.55..135217.45 rows=53020 width=4) (actual time=0.026..2190.921 rows=277860 loops=1)
-> GroupAggregate (cost=0.55..134687.25 rows=53020 width=62) (actual time=0.026..2143.559 rows=277860 loops=1)
Group Key: mkv.name
-> Index Scan using kine_name_index on kine mkv (cost=0.55..132046.90 rows=422029 width=62) (actual time=0.017..1976.507 rows=433226 loops=1)
Index Cond: (((name)::text >= '/registry/events/'::text) AND ((name)::text < '/registry/events0'::text))
Filter: ((name)::text ~~ '/registry/events/%'::text)
Planning Time: 0.466 ms
Execution Time: 81307.604 ms
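Reading this plan: the planner materializes the maxkv aggregate (~278k rows) and rescans it once per candidate row (1,280 loops), removing roughly 355 million rows via the join filter — that is where the ~81 second execution time goes.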
Modified
Same Postgres server:
explain analyse
select *
from (SELECT (SELECT MAX(rkv.id) AS id
              FROM kine AS rkv),
             (SELECT MAX(crkv.prev_revision) AS prev_revision
              FROM kine AS crkv
              WHERE crkv.name = 'compact_rev_key'),
             kv.id AS theid,
             kv.name,
             kv.created,
             kv.deleted,
             kv.create_revision,
             kv.prev_revision,
             kv.lease,
             kv.value,
             kv.old_value
      FROM kine AS kv
      JOIN (SELECT MAX(mkv.id) AS id
            FROM kine AS mkv
            WHERE mkv.name LIKE '/registry/events/%'
            GROUP BY mkv.name) maxkv
        ON maxkv.id = kv.id
      WHERE (kv.deleted = 0 OR 'f')) t
ORDER BY id ASC
LIMIT 2;
Limit (cost=3.19..13.49 rows=2 width=1055) (actual time=0.135..0.149 rows=2 loops=1)
InitPlan 2 (returns $1)
-> Result (cost=0.58..0.59 rows=1 width=4) (actual time=0.022..0.023 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.42..0.58 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=1)
-> Index Only Scan Backward using kine_pkey on kine rkv (cost=0.42..140552.64 rows=879771 width=4) (actual time=0.020..0.021 rows=1 loops=1)
Index Cond: (id IS NOT NULL)
Heap Fetches: 1
InitPlan 4 (returns $3)
-> Result (cost=1.61..1.62 rows=1 width=4) (actual time=0.024..0.025 rows=1 loops=1)
InitPlan 3 (returns $2)
-> Limit (cost=0.55..1.61 rows=1 width=4) (actual time=0.024..0.024 rows=1 loops=1)
-> Index Only Scan Backward using kine_name_prev_revision_uindex on kine crkv (cost=0.55..12.26 rows=11 width=4) (actual time=0.023..0.023 rows=1 loops=1)
Index Cond: ((name = 'compact_rev_key'::text) AND (prev_revision IS NOT NULL))
Heap Fetches: 1
-> Nested Loop (cost=0.98..211065.75 rows=40986 width=1055) (actual time=0.135..0.146 rows=2 loops=1)
-> GroupAggregate (cost=0.55..134687.25 rows=53020 width=62) (actual time=0.066..0.069 rows=2 loops=1)
Group Key: mkv.name
-> Index Scan using kine_name_index on kine mkv (cost=0.55..132046.90 rows=422029 width=62) (actual time=0.053..0.056 rows=3 loops=1)
Index Cond: (((name)::text >= '/registry/events/'::text) AND ((name)::text < '/registry/events0'::text))
Filter: ((name)::text ~~ '/registry/events/%'::text)
-> Index Scan using kine_pkey on kine kv (cost=0.42..1.43 rows=1 width=1047) (actual time=0.013..0.013 rows=1 loops=2)
Index Cond: (id = (max(mkv.id)))
Filter: (deleted = 0)
Planning Time: 0.611 ms
Execution Time: 0.198 ms
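With the ORDER BY and LIMIT moved outside the derived table, the planner instead drives the nested loop from the GroupAggregate over kine_name_index, stops after two groups, and performs just two index lookups on kine_pkey, so execution time falls from ~81 s to ~0.2 ms.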
Due to this problem, on newer K3s clusters I use the embedded etcd.
Hmm, that's interesting. I can't reproduce that locally, but this at least doesn't seem to make things worse on larger servers, so let's go for it.
Is there any plan to include kine 0.9.0 in the latest version of k3s? Thank you.
The 1.22+ branches of K3s have had kine 0.9.1 for about two months. Is there something in particular you're looking for?
Oh, I thought they were still on kine v0.8.1, as written in this table: https://github.com/k3s-io/k3s/releases/tag/v1.23.6%2Bk3s1
It's been on the branches for that long. The May Kubernetes patch releases came out very late in the month due to Kubecon EU, and the corresponding K3s releases should be out quite soon.
Related to #77