kubernetes-sigs / kube-scheduler-wasm-extension

All the things to make the scheduler extendable with wasm.
Apache License 2.0

rebuild wasm files to fix issue with scheduler_perf test #122

Closed dejanzele closed 2 weeks ago

dejanzele commented 3 weeks ago

What type of PR is this?

/kind bug

What this PR does / why we need it:

Rebuilds the stale wasm files to fix the failing `BenchmarkPerfScheduling` benchmark in `scheduler_perf_test.go`.

How to replicate? Run `go test -run=^$ -benchtime=1ns -bench=BenchmarkPerfScheduling` in the `internal/e2e/scheduler_perf` folder.

I have replicated this issue on darwin and on linux (in a devcontainer).

What was changed? Deleted all `*.wasm` files using `find . -type f -name "*.wasm" -exec rm -f {} +`, rebuilt them, and ran the following command at both revisions:

Revision main@a5b575b914054606ba29ea4c856923cdc06d8761

$ go test -run=^$ -benchtime=1ns -bench=BenchmarkPerfScheduling

I1105 21:18:13.508766   75248 plugins.go:59] Registered cloud provider "azure"
I1105 21:18:13.510123   75248 plugins.go:59] Registered cloud provider "gce"
I1105 21:18:13.510238   75248 gce_loadbalancer_metrics.go:50] Registering Service Controller loadbalancer usage metrics &{0x109075960 0x14000a2b8c0 {number_of_l4_ilbs false false false {{0 0} 0 0 {{} 0} {{} 0}} {0 {0 0}} {0 {0 0}} 0x14000a2b950 ALPHA} [feature]}
I1105 21:18:13.510418   75248 plugins.go:59] Registered cloud provider "vsphere"

   ____    __
  / __/___/ /  ___
 / _// __/ _ \/ _ \
/___/\__/_//_/\___/ v3.3.10-dev
High performance, minimalist Go web framework
https://echo.labstack.com
____________________________________O/_______
                                    O\
⇨ http server started on [::]:8080
Shutting down the server...
http: Server closed
Shutted down the server...
goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/internal/e2e/scheduler_perf
BenchmarkPerfScheduling/Default/500Nodes-12                    1               200.0 SchedulingThroughput/Average              209.0 SchedulingThroughput/Perc50        246.0 SchedulingThroughput/Perc90               246.0 SchedulingThroughput/Perc95               246.0 SchedulingThroughput/Perc99          0.0008680 scheduler_framework_extension_point_duration_ms/Average               0.05000 scheduler_framework_extension_point_duration_ms/Perc50           0.09000 scheduler_framework_extension_point_duration_ms/Perc90          0.09500 scheduler_framework_extension_point_duration_ms/Perc95           0.09900 scheduler_framework_extension_point_duration_ms/Perc99       2161 scheduler_pod_scheduling_duration_ms/Average       2224 scheduler_pod_scheduling_duration_ms/Perc50         4496 scheduler_pod_scheduling_duration_ms/Perc90        4808 scheduler_pod_scheduling_duration_ms/Perc95         5058 scheduler_pod_scheduling_duration_ms/Perc99          18.16 scheduler_scheduling_attempt_duration_ms/Average          15.89 scheduler_scheduling_attempt_duration_ms/Perc50            30.59 scheduler_scheduling_attempt_duration_ms/Perc90           38.19 scheduler_scheduling_attempt_duration_ms/Perc95            58.84 scheduler_scheduling_attempt_duration_ms/Perc99
--- BENCH: BenchmarkPerfScheduling/Default/500Nodes-12
    scheduler_perf_test.go:1073: creating 500 pods in namespace "namespace-1"
    scheduler_perf_test.go:1098: namespace: namespace-1, pods: want 500, got 24
    scheduler_perf_test.go:1098: namespace: namespace-1, pods: want 500, got 184
    scheduler_perf_test.go:1098: namespace: namespace-1, pods: want 500, got 363
    scheduler_perf_test.go:1095: scheduling succeed
    scheduler_perf_test.go:1073: creating 1000 pods in namespace "namespace-2"
    scheduler_perf_test.go:1098: namespace: namespace-2, pods: want 1000, got 52
    scheduler_perf_test.go:1098: namespace: namespace-2, pods: want 1000, got 224
    scheduler_perf_test.go:1098: namespace: namespace-2, pods: want 1000, got 448
    scheduler_perf_test.go:1098: namespace: namespace-2, pods: want 1000, got 676
        ... [output truncated]
goroutine 134779 [running]:
k8s.io/klog/v2/internal/dbg.Stacks(0x0)
        /Users/zele/.gvm/pkgsets/go1.21.13/global/pkg/mod/k8s.io/klog/v2@v2.90.1/internal/dbg/dbg.go:35 +0x8c
k8s.io/klog/v2.(*loggingT).output(0x10908a500, 0x3, 0x0, 0x1400e197500, 0x1, {0x107a0134c?, 0x1?}, 0x14002ad2900?, 0x0)
        /Users/zele/.gvm/pkgsets/go1.21.13/global/pkg/mod/k8s.io/klog/v2@v2.90.1/klog.go:941 +0x5e0
k8s.io/klog/v2.(*loggingT).printfDepth(0x106c7ede8?, 0xbdfe4e0?, 0x0, {0x0, 0x0}, 0x0?, {0x1055139e3, 0x1c}, {0x140043b46f0, 0x1, ...})
        /Users/zele/.gvm/pkgsets/go1.21.13/global/pkg/mod/k8s.io/klog/v2@v2.90.1/klog.go:737 +0x1ac
k8s.io/klog/v2.(*loggingT).printf(...)
        /Users/zele/.gvm/pkgsets/go1.21.13/global/pkg/mod/k8s.io/klog/v2@v2.90.1/klog.go:718
k8s.io/klog/v2.Fatalf(...)
        /Users/zele/.gvm/pkgsets/go1.21.13/global/pkg/mod/k8s.io/klog/v2@v2.90.1/klog.go:1634
sigs.k8s.io/kube-scheduler-wasm-extension/internal/e2e/scheduler_perf.startCustomScheduler({0x106c47020?, 0x14000663d10}, {0x106c7ede8, 0x1400bdfe4e0}, 0x14000ef2b40, 0x14005b08ea0, 0x14000a530e0)
        /Users/zele/Projects/go-k8s-wasm/dejanzele/kube-scheduler-wasm-extension/internal/e2e/scheduler_perf/util.go:154 +0x428
sigs.k8s.io/kube-scheduler-wasm-extension/internal/e2e/scheduler_perf.mustSetupScheduler({0x106c47090?, 0x14010c02070?}, 0x1400ba83180, 0x14005b08ea0, 0x106c22778?)
        /Users/zele/Projects/go-k8s-wasm/dejanzele/kube-scheduler-wasm-extension/internal/e2e/scheduler_perf/util.go:115 +0x134
sigs.k8s.io/kube-scheduler-wasm-extension/internal/e2e/scheduler_perf.runWorkload({0x106c47090, 0x14010c02070}, 0x1400ba83180, 0x14000a225a0, 0x14000612720)
        /Users/zele/Projects/go-k8s-wasm/dejanzele/kube-scheduler-wasm-extension/internal/e2e/scheduler_perf/scheduler_perf_test.go:762 +0x124
sigs.k8s.io/kube-scheduler-wasm-extension/internal/e2e/scheduler_perf.BenchmarkPerfScheduling.func1.1(0x1400ba83180)
        /Users/zele/Projects/go-k8s-wasm/dejanzele/kube-scheduler-wasm-extension/internal/e2e/scheduler_perf/scheduler_perf_test.go:673 +0x1a8
testing.(*B).runN(0x1400ba83180, 0x1)
        /Users/zele/.gvm/gos/go1.21.13/src/testing/benchmark.go:193 +0x128
testing.(*B).run1.func1()
        /Users/zele/.gvm/gos/go1.21.13/src/testing/benchmark.go:233 +0x50
created by testing.(*B).run1 in goroutine 130639
        /Users/zele/.gvm/gos/go1.21.13/src/testing/benchmark.go:226 +0x90
exit status 255
FAIL    sigs.k8s.io/kube-scheduler-wasm-extension/internal/e2e/scheduler_perf   22.961s

This PR:

$ go test -run=^$ -benchtime=1ns -bench=BenchmarkPerfScheduling

I1105 21:19:44.586114   75339 plugins.go:59] Registered cloud provider "azure"
I1105 21:19:44.587482   75339 plugins.go:59] Registered cloud provider "gce"
I1105 21:19:44.587595   75339 gce_loadbalancer_metrics.go:50] Registering Service Controller loadbalancer usage metrics &{0x107181960 0x140007c3950 {number_of_l4_ilbs false false false {{0 0} 0 0 {{} 0} {{} 0}} {0 {0 0}} {0 {0 0}} 0x140007c39e0 ALPHA} [feature]}
I1105 21:19:44.587777   75339 plugins.go:59] Registered cloud provider "vsphere"

...
TRUNCATED
...

____________________________________O/_______
                                    O\
⇨ http server started on [::]:8080
Shutting down the server...
http: Server closed
Shutted down the server...
BenchmarkPerfScheduling/AllEnabled/500Nodes-12                 1                66.67 SchedulingThroughput/Average              70.00 SchedulingThroughput/Perc50                74.00 SchedulingThroughput/Perc90               74.00 SchedulingThroughput/Perc95               74.00 SchedulingThroughput/Perc99                 6.114 scheduler_framework_extension_point_duration_ms/Average           7.492 scheduler_framework_extension_point_duration_ms/Perc50            11.91 scheduler_framework_extension_point_duration_ms/Perc90            12.46 scheduler_framework_extension_point_duration_ms/Perc95             19.58 scheduler_framework_extension_point_duration_ms/Perc99          7303 scheduler_pod_scheduling_duration_ms/Average        7280 scheduler_pod_scheduling_duration_ms/Perc50       16985 scheduler_pod_scheduling_duration_ms/Perc90       18733 scheduler_pod_scheduling_duration_ms/Perc95        20131 scheduler_pod_scheduling_duration_ms/Perc99          15.98 scheduler_scheduling_attempt_duration_ms/Average           13.32 scheduler_scheduling_attempt_duration_ms/Perc50           25.83 scheduler_scheduling_attempt_duration_ms/Perc90           29.15 scheduler_scheduling_attempt_duration_ms/Perc95            31.80 scheduler_scheduling_attempt_duration_ms/Perc99
--- BENCH: BenchmarkPerfScheduling/AllEnabled/500Nodes-12
    scheduler_perf_test.go:1073: creating 500 pods in namespace "namespace-1"
    scheduler_perf_test.go:1098: namespace: namespace-1, pods: want 500, got 0
    scheduler_perf_test.go:1098: namespace: namespace-1, pods: want 500, got 69
    scheduler_perf_test.go:1098: namespace: namespace-1, pods: want 500, got 141
    scheduler_perf_test.go:1098: namespace: namespace-1, pods: want 500, got 211
    scheduler_perf_test.go:1098: namespace: namespace-1, pods: want 500, got 280
    scheduler_perf_test.go:1098: namespace: namespace-1, pods: want 500, got 354
    scheduler_perf_test.go:1098: namespace: namespace-1, pods: want 500, got 427
    scheduler_perf_test.go:1098: namespace: namespace-1, pods: want 500, got 495
    scheduler_perf_test.go:1095: scheduling succeed
        ... [output truncated]
PASS
ok      sigs.k8s.io/kube-scheduler-wasm-extension/internal/e2e/scheduler_perf   109.723s

TinyGo version:

$ tinygo version
tinygo version 0.33.0 darwin/arm64 (using go version go1.23.2 and LLVM version 18.1.2)

Go version:

$ go version                                                   
go version go1.21.13 darwin/arm64

System info:

Software:

    System Software Overview:

      System Version: macOS 14.6.1 (23G93)
      Kernel Version: Darwin 23.6.0

Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Chip: Apple M3 Pro
      Total Number of Cores: 12 (6 performance and 6 efficiency)
      Memory: 36 GB

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

What are the benchmark results of this change?

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/internal/e2e/scheduler
                                       │  before.txt  │             after.txt              │
                                       │    sec/op    │    sec/op     vs base              │
Example_NodeNumber/Simple/New-12         258.1m ±  3%   272.1m ±  7%  +5.39% (p=0.002 n=6)
Example_NodeNumber/Simple/Run-12         85.03µ ± 11%   90.73µ ± 16%       ~ (p=0.240 n=6)
Example_NodeNumber/Simple_Log/New-12     260.3m ±  2%   283.7m ±  8%  +9.02% (p=0.002 n=6)
Example_NodeNumber/Simple_Log/Run-12     93.10µ ± 14%   90.66µ ± 11%       ~ (p=0.589 n=6)
Example_NodeNumber/Advanced/New-12       557.2m ±  3%   580.5m ±  6%  +4.19% (p=0.002 n=6)
Example_NodeNumber/Advanced/Run-12       32.86µ ±  4%   34.04µ ±  2%  +3.58% (p=0.015 n=6)
Example_NodeNumber/Advanced_Log/New-12   556.7m ±  6%   573.7m ±  0%       ~ (p=0.394 n=6)
Example_NodeNumber/Advanced_Log/Run-12   38.04µ ±  1%   38.72µ ±  2%  +1.78% (p=0.009 n=6)
geomean                                  4.616m         4.793m        +3.83%

k8s-ci-robot commented 3 weeks ago

Welcome @dejanzele!

It looks like this is your first PR to kubernetes-sigs/kube-scheduler-wasm-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kube-scheduler-wasm-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot commented 3 weeks ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dejanzele

Once this PR has been reviewed and has the lgtm label, please assign sanposhiho for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
dejanzele commented 3 weeks ago

/cc @sanposhiho @Gekko0114 @utam0k

sanposhiho commented 2 weeks ago

It's very weird that the testdata CI is successful both in this PR and on the master branch. Can you check that?

k8s-ci-robot commented 2 weeks ago

PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
dejanzele commented 2 weeks ago

This PR is not needed anymore after https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/pull/127, and performance also seems a lot better in the benchmark.