kubernetes-sigs / kube-scheduler-wasm-extension

All the things to make the scheduler extendable with wasm.
Apache License 2.0
110 stars 22 forks source link

Replace the default wasm garbage collector with nottinygc #25

Closed codefromthecrypt closed 1 year ago

codefromthecrypt commented 1 year ago

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

After profiling in #24, I noticed over half the time in wasm was spent in garbage collection. This could have been guessed, but it was nice to verify it with wzprof.

This uses an optimized garbage collector, nottinygc, dramatically improving performance.

Which issue(s) this PR fixes:

NONE

Special notes for your reviewer:

https://github.com/wasilibs/nottinygc was invented by the excellent @anuraaga

Does this PR introduce a user-facing change?

NONE

What are the benchmark results of this change?

same code takes half the time!

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/internal/e2e
                                            │  before.txt   │             after.txt              │
                                            │    sec/op     │   sec/op     vs base               │
PluginFilter/noop/params:_small-12             148.6n ±  2%   143.8n ± 1%   -3.20% (p=0.002 n=6)
PluginFilter/noop/params:_real-12              146.5n ±  0%   143.2n ± 0%   -2.25% (p=0.002 n=6)
PluginFilter/filter-simple/params:_small-12   14.436µ ±  5%   7.234µ ± 2%  -49.89% (p=0.002 n=6)
PluginFilter/filter-simple/params:_real-12     232.5µ ± 21%   121.1µ ± 1%  -47.91% (p=0.002 n=6)
geomean                                        2.923µ         2.061µ       -29.50%
codefromthecrypt commented 1 year ago

I should confess one downside, which is that the wasm is significantly larger (+110KB) However, the performance was quite bad and I think this is an easy tradeoff.

codefromthecrypt commented 1 year ago

@anuraaga had a good idea about a start panic instead. maybe hold off and we can use a version that has an instructive error message even if still SDK users should copy/pasta the args to tinygo (or use a supplied script to invoke it)

anuraaga commented 1 year ago

Updated nottinygc to print a grokkable message when the TinyGo flags haven't been set

https://github.com/wasilibs/nottinygc/releases/tag/v0.3.0

codefromthecrypt commented 1 year ago

Thanks @anuraaga I updated like so

$ for I in `find . -name go.mod -exec egrep -q nottinygc {} \; -print`; do (cd `dirname $I`; go get  github.com/wasilibs/nottinygc@v0.3.0; go mod tidy); done
codefromthecrypt commented 1 year ago

so here's the current error if the plugin wasn't built with the flags:

    plugin_perf_test.go:38: failed to create plugin: wasm: error instantiating guest: module[filter-simple-1] function[_start] failed: wasm error: unreachable
        wasm stack trace:
            .runtime._panic(i32,i32)
            ._start.command_export()
anuraaga commented 1 year ago

@codefromthecrypt Oh, I wonder if printing the error message of a panic would require stderr/stdout to be available. That might make the approach not very applicable...

codefromthecrypt commented 1 year ago

would require stderr/stdout to be available

this might be ok. lemme wire up and see

codefromthecrypt commented 1 year ago

@anuraaga I changed to prioritize stdout/stderr over a generic stack trace. If looks good I'll try to add a test of some kind:

failed to create plugin: wasm: error instantiating guest: panic: nottinygc requires passing -gc=custom and -tags=custommalloc to TinyGo when compiling.
sanposhiho commented 1 year ago

@codefromthecrypt

I should confess one downside, which is that the wasm is significantly larger (+110KB) However, the performance was quite bad and I think this is an easy tradeoff. https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/pull/25#issuecomment-1569889449

In the server-side wasm, what gonna be the problem/disadvantage of a bigger wasm size? The performance to load the wasm in the host side?

codefromthecrypt commented 1 year ago

In the server-side wasm, what gonna be the problem/disadvantage of a bigger wasm size? The performance to load the wasm in the host side?

One reason is I think some people move wasm around with config maps? cc @salaboy about any size constraints there, in case folks here aren't aware of the nuance. Basically, we may be 110kb closer to that limit. Once past that limit, we need to use http refs like most proxy-wasm are moved around by istio (OCI is another way, such as what trivy does).

salaboy commented 1 year ago

Yeah configMaps have size restrictions, but also you are storing those files in etcd, which was not designed for storing files (in the context of kubernetes) , saving wasm files in volumes is the way to go

On Sat, 3 Jun 2023 at 06:59, Crypt Keeper @.***> wrote:

In the server-side wasm, what gonna be the problem/disadvantage of a bigger wasm size? The performance to load the wasm in the host side?

One reason is I think some people move wasm around with config maps? cc @salaboy https://github.com/salaboy about any size constraints there, in case folks here aren't aware of the nuance. Basically, we may be 110kb closer to that limit. Once past that limit, we need to use http refs like most proxy-wasm are moved around by istio (OCI is another way, such as what trivy does).

— Reply to this email directly, view it on GitHub https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/pull/25#issuecomment-1574658951, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACCMXXVQIG4DWIDDSNM4DLXJLAERANCNFSM6AAAAAAYVGRKSE . You are receiving this because you were mentioned.Message ID: @.*** com>

--

codefromthecrypt commented 1 year ago

I'll go ahead and add the unit test here and also add an issue for docs on how to handle wasm distribution

codefromthecrypt commented 1 year ago

ok I backfilled tests for panic messages PTAL!

codefromthecrypt commented 1 year ago

rebased

codefromthecrypt commented 1 year ago

added a make testdata target which makes all the test wasm

kerthcet commented 1 year ago

/lgtm /approve /hold

Free free to unhold this.

codefromthecrypt commented 1 year ago

I will add a RATIONALE section then ping @sanposhiho for final review

codefromthecrypt commented 1 year ago

PTAL I added the RATIONALE.md cc also @anuraaga to keep me honest if I mentioned pros/cons correctly!

k8s-ci-robot commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anuraaga, codefromthecrypt, kerthcet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/blob/main/OWNERS)~~ [kerthcet] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
codefromthecrypt commented 1 year ago

squashed

kerthcet commented 1 year ago

/hold cancel The RATIONALE.md is excellent.

kerthcet commented 1 year ago

/lgtm