kubernetes-sigs / kube-scheduler-wasm-extension

All the things to make the scheduler extendable with wasm.
Apache License 2.0
86 stars 17 forks source link

Replace the default wasm garbage collector with nottinygc #25

Closed codefromthecrypt closed 11 months ago

codefromthecrypt commented 11 months ago

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

After profiling in #24, I noticed over half the time in wasm was spent in garbage collection. This could have been guessed, but it was nice to verify it with wzprof.

This uses an optimized garbage collector, nottinygc, dramatically improving performance.

Which issue(s) this PR fixes:

NONE

Special notes for your reviewer:

https://github.com/wasilibs/nottinygc was invented by the excellent @anuraaga

Does this PR introduce a user-facing change?

NONE

What are the benchmark results of this change?

same code takes half the time!

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/internal/e2e
                                            │  before.txt   │             after.txt              │
                                            │    sec/op     │   sec/op     vs base               │
PluginFilter/noop/params:_small-12             148.6n ±  2%   143.8n ± 1%   -3.20% (p=0.002 n=6)
PluginFilter/noop/params:_real-12              146.5n ±  0%   143.2n ± 0%   -2.25% (p=0.002 n=6)
PluginFilter/filter-simple/params:_small-12   14.436µ ±  5%   7.234µ ± 2%  -49.89% (p=0.002 n=6)
PluginFilter/filter-simple/params:_real-12     232.5µ ± 21%   121.1µ ± 1%  -47.91% (p=0.002 n=6)
geomean                                        2.923µ         2.061µ       -29.50%
codefromthecrypt commented 11 months ago

I should confess one downside, which is that the wasm is significantly larger (+110KB) However, the performance was quite bad and I think this is an easy tradeoff.

codefromthecrypt commented 11 months ago

@anuraaga had a good idea about a start panic instead. maybe hold off and we can use a version that has an instructive error message even if still SDK users should copy/pasta the args to tinygo (or use a supplied script to invoke it)

anuraaga commented 11 months ago

Updated nottinygc to print a grokkable message when the TinyGo flags haven't been set

https://github.com/wasilibs/nottinygc/releases/tag/v0.3.0

codefromthecrypt commented 11 months ago

Thanks @anuraaga I updated like so

$ for I in `find . -name go.mod -exec egrep -q nottinygc {} \; -print`; do (cd `dirname $I`; go get  github.com/wasilibs/nottinygc@v0.3.0; go mod tidy); done
codefromthecrypt commented 11 months ago

so here's the current error if the plugin wasn't built with the flags:

    plugin_perf_test.go:38: failed to create plugin: wasm: error instantiating guest: module[filter-simple-1] function[_start] failed: wasm error: unreachable
        wasm stack trace:
            .runtime._panic(i32,i32)
            ._start.command_export()
anuraaga commented 11 months ago

@codefromthecrypt Oh, I wonder if printing the error message of a panic would require stderr/stdout to be available. That might make the approach not very applicable...

codefromthecrypt commented 11 months ago

would require stderr/stdout to be available

this might be ok. lemme wire up and see

codefromthecrypt commented 11 months ago

@anuraaga I changed to prioritize stdout/stderr over a generic stack trace. If looks good I'll try to add a test of some kind:

failed to create plugin: wasm: error instantiating guest: panic: nottinygc requires passing -gc=custom and -tags=custommalloc to TinyGo when compiling.
sanposhiho commented 11 months ago

@codefromthecrypt

I should confess one downside, which is that the wasm is significantly larger (+110KB) However, the performance was quite bad and I think this is an easy tradeoff. https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/pull/25#issuecomment-1569889449

In the server-side wasm, what gonna be the problem/disadvantage of a bigger wasm size? The performance to load the wasm in the host side?

codefromthecrypt commented 11 months ago

In the server-side wasm, what gonna be the problem/disadvantage of a bigger wasm size? The performance to load the wasm in the host side?

One reason is I think some people move wasm around with config maps? cc @salaboy about any size constraints there, in case folks here aren't aware of the nuance. Basically, we may be 110kb closer to that limit. Once past that limit, we need to use http refs like most proxy-wasm are moved around by istio (OCI is another way, such as what trivy does).

salaboy commented 11 months ago

Yeah configMaps have size restrictions, but also you are storing those files in etcd, which was not designed for storing files (in the context of kubernetes) , saving wasm files in volumes is the way to go

On Sat, 3 Jun 2023 at 06:59, Crypt Keeper @.***> wrote:

In the server-side wasm, what gonna be the problem/disadvantage of a bigger wasm size? The performance to load the wasm in the host side?

One reason is I think some people move wasm around with config maps? cc @salaboy https://github.com/salaboy about any size constraints there, in case folks here aren't aware of the nuance. Basically, we may be 110kb closer to that limit. Once past that limit, we need to use http refs like most proxy-wasm are moved around by istio (OCI is another way, such as what trivy does).

— Reply to this email directly, view it on GitHub https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/pull/25#issuecomment-1574658951, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACCMXXVQIG4DWIDDSNM4DLXJLAERANCNFSM6AAAAAAYVGRKSE . You are receiving this because you were mentioned.Message ID: @.*** com>

--

codefromthecrypt commented 11 months ago

I'll go ahead and add the unit test here and also add an issue for docs on how to handle wasm distribution

codefromthecrypt commented 11 months ago

ok I backfilled tests for panic messages PTAL!

codefromthecrypt commented 11 months ago

rebased

codefromthecrypt commented 11 months ago

added a make testdata target which makes all the test wasm

kerthcet commented 11 months ago

/lgtm /approve /hold

Free free to unhold this.

codefromthecrypt commented 11 months ago

I will add a RATIONALE section then ping @sanposhiho for final review

codefromthecrypt commented 11 months ago

PTAL I added the RATIONALE.md cc also @anuraaga to keep me honest if I mentioned pros/cons correctly!

k8s-ci-robot commented 11 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anuraaga, codefromthecrypt, kerthcet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/blob/main/OWNERS)~~ [kerthcet] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
codefromthecrypt commented 11 months ago

squashed

kerthcet commented 11 months ago

/hold cancel The RATIONALE.md is excellent.

kerthcet commented 11 months ago

/lgtm