knative / community

Knative governance and community material.
https://knative.dev/community

New Repo: knative-sandbox/kperf #211

Closed zhanggbj closed 3 years ago

zhanggbj commented 4 years ago

Use this issue type to request a new repo in knative-sandbox (or knative, which may require additional discussion).

Repo information

Org: knative-sandbox

Repo: kperf

Purpose (Description): As discussed at the Serving WG meeting on 2020-08-05, I would like to request a repo for kperf to continue the work on the benchmarking tool for Knative.

Briefly, kperf is designed to help Knative developers and operators better understand a Knative platform's scalability and performance. The tool generates specific Knative load on demand, for example creating Knative Services with different workload patterns such as creation intervals and concurrency. It measures all the Services and produces both human-readable output of fine-grained ready durations for Knative resources and CSV files containing all the raw data. It also provides a dashboard to help analyze the measurements, drill down, and locate issues. In this way, kperf helps Knative adopters run targeted load tests against the Knative platform, dig into the underlying Knative resources, find the platform's limits, and locate scalability and performance issues.

Sponsoring WG: Serving WG

Actions to fulfill

This area is used by the TOC to track the repo creation process.

Once the TOC has approved the above, it will merge and Peribolos will create an empty repository.

Once Prow has been verified.

zhanggbj commented 4 years ago

Hi @mattmoor @vagababov @grantr @maximilien @julz , thanks for the feedback and support, here is the repo request as we discussed. Thanks!

maximilien commented 4 years ago

Thank you Grace @zhanggbj.

Can’t wait to have this in the hands of everyone, see where it goes, and, most importantly, allow other members of the community to run performance tests on Knative as you and the IBM Beijing team have been doing.

maximilien commented 4 years ago

Also, adding @evankanderson and @markusthoemmes and thanking them in advance for feedback and help 🙏🏽

evankanderson commented 4 years ago

/assign @mattmoor

mattmoor commented 4 years ago

My general sentiment: Given that mako.dev is not a community accessible option, I would love to see us move towards something community accessible, which folks can run locally, and operate in their own downstreams. However, I think we need a clear objective to rationalize aspects of this with certain areas where it overlaps with things we have upstream.


On WG ownership: I am not sure that "Serving API WG" is the best home, since that is really just one of the stakeholders. Autoscaling (@vagababov) is probably the largest stakeholder today, but I haven't tracked Eventing's (@grantr) investment. In general, I would say that the horizontal nature of this is probably best matched by Productivity, which curates much of our infrastructure, including the performance automation we have today, but the infrastructure they operate today isn't accessible to folks outside of Google 😞 . So if @chaodaiG and @chizhg are unwilling to sponsor this, I would be happy to sponsor including this in sandbox.


From the proposal, it looks like there are several key elements, and I wanted to highlight where I think there are overlaps for us to rationalize:

Load Generation

For dataplane benchmarks today, we use vegeta as a library, which can run different shaped loads. Generally this has been able to more reliably generate a certain "requests-per-second" load than alternatives I've used like wrk, wrk2, and hey.

For controlplane benchmarks, what we have today is actually really bad because none of the Serving API leads can actually run the benchmark against Mako or access the perf clusters to debug where it is blowing up.

Signals

Depending on the benchmark you look at, we've done a fair amount of instrumenting to extract pertinent metrics to overlay on the graph, and these would probably be relatively easy to incorporate into kperf as additional datapoints.

In the load-test benchmark we overlay:

In the deployment-probe (which is the "bad" controlplane benchmark mentioned above) we actually have some really good signals, if we can get the thing to run!

Dashboarding

The overlap here is Mako, which is a closed system. We should identify the features we want to keep and figure out how to get them into the kperf dashboard over time.


If this sounds like a reasonable path forward to folks, then my inclination would be to start setting things up while we sort out some of the final ownership questions above. @maximilien Sound good?

chizhg commented 4 years ago

Thank you for bringing this up @zhanggbj and @mattmoor !

To provide a bit more context here: over the past few months, we had been considering different ways to improve the performance testing framework for Knative, but haven't started any work yet because other priorities came up. I think now is the time to reconsider those priorities.

My two cents here: based on my experience, although the biggest (or maybe the only) problem with Mako is that it's not accessible to developers outside of Google, it provides a lot of nice features like charting, automatic regression detection, data sampling, etc. We have used Mako to show some performance improvement charts and to catch a few performance regressions, so IMO Mako still has lots of value here. While the long-term goal for Mako is to make it accessible to non-Google developers (@timford might be able to provide more details), I think a better direction is for us to collaborate: we can improve the current performance testing framework in CI while still using Mako (e.g. supporting presubmit performance tests), and also make kperf part of the testing framework to fill the gap for running locally.

cc @mrfaizal and @albertomilan for awareness.

mattmoor commented 4 years ago

@chizhg unless Mako becomes accessible to non-Googlers for benchmark development, it really doesn't belong as an upstream dependency. This isn't a criticism of its feature-set, but we simply cannot operate an open community where sections (especially ones as important as this) are walled off to non-Googlers.

Since the decision to stop pursuing an "open Mako" it really hasn't been a question of "if", but:

My main goal is to unblock non-Googler benchmark development, and having kperf accomplishes that. It's entirely plausible that test-infra can ingest that data coming out of kperf and still pipe it to Mako (just like testgrid, frankly), but regression detection is only as useful as the community's ability to reproduce it. 🤷

chizhg commented 4 years ago

@mattmoor we had some discussions regarding this, and we all agree that the way forward is to have kperf generate the data and use Mako for charting and regression detection in the CI environment. Since the data produced by kperf is in CSV format and should be easy to parse, we can collaborate on defining the interface between kperf and Mako.

mattmoor commented 4 years ago

Ok, that sounds awesome. So can we think of Productivity as the sponsoring WG here?

If we're good with that, then I'd appreciate if someone would run thru the new self-serve steps for repo creation!

chizhg commented 4 years ago

Sure, Productivity WG could be the sponsor here.

mattmoor commented 4 years ago

@zhanggbj There is a new self-service process for this. I'll inline the new instructions, since the template has changed since this issue was opened, but please work with @chizhg to complete the steps.


UPDATE: I inlined the checklist in the top comment to avoid confusion

zhanggbj commented 4 years ago

@mattmoor @chizhg Sure, thanks for the support and help! I'll take a look at the process and post updates here as I make progress.

mattmoor commented 4 years ago

/unassign
/assign @zhanggbj @chizhg

zhanggbj commented 4 years ago

@chizhg @maximilien FYI, I just raised a PR for review: "Add kperf to the peribolos sandbox config" #276. Thanks!

zhanggbj commented 4 years ago

The repo has been created ❤️ and I will continue with the other TODOs.

https://github.com/knative-sandbox/kperf

zhanggbj commented 4 years ago

Hi @mattmoor @chizhg,

My PR (for the import path alias) is merged. Would you please help take a look at the next step? Thanks a lot!

appropriate "template" repository (basic, sample-controller, sample-source) to the new repository as a git remote.

chizhg commented 4 years ago

Hi @zhanggbj, it looks like this step does not apply to this repo. I think you can just skip it or start a PR that includes some basic code you already have.

zhanggbj commented 4 years ago

@chizhg Looks like I do not have the permission to fork the repo or start a PR, do you happen to know where I can request the permission? Thanks!

mattmoor commented 4 years ago

The repo needs to be seeded with some content as a starting point; the WG leads have write access, which is enough to bootstrap this.

zhanggbj commented 4 years ago

@mattmoor @chizhg Thanks! If that's the case, would you please help bootstrap it? Then I'll raise a PR with the initial kperf code.

Also CC @maximilien

chizhg commented 4 years ago

Ah, I didn't know an empty repo could not be forked. I have pushed a simple README.md to the repo, so you should now be able to continue with the following steps.

zhanggbj commented 4 years ago

@chizhg no worries, now it works well, thank you! I will continue to raise the initial PR and tag you all for a review.

evankanderson commented 4 years ago

I think this will need test-infra to be set up as well so that Tide will merge things in the presence of an OWNERS file.

maximilien commented 3 years ago

@zhanggbj also look at https://github.com/knative-sandbox/hack for seed files

zhanggbj commented 3 years ago

test-infra has been set up, and we're still configuring codecov. A sample PR is https://github.com/knative-sandbox/kperf/pull/21, and all Prow checks pass now.

The next step could be

/CC @chizhg @evankanderson

mattmoor commented 3 years ago

I believe this is now done automatically daily, we should update the instructions!

zhanggbj commented 3 years ago

@mattmoor That's nice, so we are done with this issue now?

zhanggbj commented 3 years ago

Tide is already enabled in the Prow tests for PRs, so I'll mark this item done. We can reopen this if anything else is needed. Thanks all for helping with the kperf repo! Hooray!!!

cc @mattmoor @chizhg @maximilien ^^^^