zhanggbj closed this issue 3 years ago.
Hi @mattmoor @vagababov @grantr @maximilien @julz , thanks for the feedback and support, here is the repo request as we discussed. Thanks!
Thank you Grace @zhanggbj.
Can’t wait to have this in the hands of everyone and see where it goes and most importantly allow other members of the community to run performance tests on Knative as you and the IBM Beijing team have been doing.
Also, adding @evankanderson and @markusthoemmes and thanking them in advance for feedback and help 🙏🏽
/assign @mattmoor
My general sentiment: Given that mako.dev is not a community-accessible option, I would love to see us move towards something community accessible, which folks can run locally and operate in their own downstreams. However, I think we need a clear objective to rationalize aspects of this with certain areas where it overlaps with things we have upstream.
On WG ownership: I am not sure that "Serving API WG" is the best home, since that is really just one of the stakeholders. Autoscaling (@vagababov) is probably the largest stakeholder today, but I haven't tracked Eventing's (@grantr) investment. In general, I would say that the horizontal nature of this is probably best matched by Productivity, which curates much of our infrastructure, including the performance automation we have today, but the infrastructure they operate today isn't accessible to folks outside of Google 😞 . So if @chaodaiG and @chizhg are unwilling to sponsor this, I would be happy to sponsor including this in sandbox.
Generally from the proposal it looks like there are several key elements of this proposal, and I wanted to highlight here where I think there are overlaps for us to rationalize:
Load Generation
For dataplane benchmarks today, we use vegeta as a library, which can run different shaped loads. Generally it has been able to generate a given "requests-per-second" load more reliably than alternatives I've used like wrk, wrk2, and hey.
For controlplane benchmarks, what we have today is actually really bad because none of the Serving API leads can actually run the benchmark against Mako or access the perf clusters to debug where it is blowing up.
Signals
Depending on the benchmark you look at, we've done a fair amount of instrumenting to extract pertinent metrics to overlay on the graph, and these would probably be relatively easy to incorporate into kperf as additional datapoints.
In the load-test benchmark we overlay:
In the deployment-probe (which is the "bad" controlplane benchmark mentioned above) we actually have some really good signals, if we can get the thing to run!
- ksvc reports Ready: true (computed as the Ready condition's lastTransitionTime - creationTimestamp)
- Ready: true (same computation)

Dashboarding
The overlap here is Mako, which is a closed system. We should identify the features we want to keep, and figure out how to get those into the kperf dashboard over time.
If this sounds like a reasonable path forward to folks, then my inclination would be to start setting things up while we sort out some of the final ownership questions above. @maximilien Sound good?
Thank you for bringing this up @zhanggbj and @mattmoor !
To provide a bit more context: in the past few months we had been considering different ways to improve the performance testing framework for Knative, but hadn't started any work because of other priorities. I think now is the time to reconsider those priorities.
My two cents here:
Based on my experience, though the biggest (or maybe the only) problem with Mako is that it's not accessible to developers outside of Google, it provides a lot of nice features like charting, automatic regression detection, and data sampling. We have used Mako to show some performance improvement charts and to catch a few performance regressions, so IMO Mako still has a lot of value here.
While the long-term goal for Mako is to make it accessible to non-Google developers (@timford might be able to provide more details), I think a better direction is for us to collaborate - we can improve the current performance testing framework in CI while still using Mako (e.g. supporting presubmit performance tests), and also make kperf part of the testing framework to fill the gap for running locally.
cc @mrfaizal and @albertomilan for awareness.
@chizhg unless Mako becomes accessible to non-Googlers for benchmark development, it really doesn't belong as an upstream dependency. This isn't a criticism of its feature-set, but we simply cannot operate an open community where sections (especially ones as important as this) are walled off to non-Googlers.
Since the decision to stop pursuing an "open Mako", it really hasn't been a question of "if", but:
My main goal is to unblock non-Googler benchmark development, and having kperf accomplishes that. It's entirely plausible that test-infra can ingest that data coming out of kperf and still pipe it to Mako (just like testgrid, frankly), but regression detection is only as useful as the community's ability to reproduce it. 🤷
@mattmoor we had some discussions regarding this, and we all agree the way to move forward is to have kperf generate the data and use Mako for charting and regression detection in the CI environment. Since the data produced by kperf is in CSV format and should be easy to parse, we can collaborate on defining the interface between kperf and Mako.
Ok, that sounds awesome. So can we think of Productivity as the sponsoring WG here?
If we're good with that, then I'd appreciate it if someone would run through the new self-serve steps for repo creation!
Sure, Productivity WG could be the sponsor here.
@zhanggbj There is a new self-service process for this. I'll inline the new instructions since the template changed after this was opened, but please work with @chizhg to complete the steps.
UPDATE: I inlined the checklist in the top comment to avoid confusion
@mattmoor @chizhg Sure, thanks for the support and help! I'll take a look at the process and update here as I make progress.
/unassign /assign @zhanggbj @chizhg
@chizhg @maximilien just raised a PR for review, FYI, thanks! Add kperf to the peribolos sandbox config #276
Repo is created ❤️ and will continue with other TODOs.
Hi @mattmoor @chizhg,
My PR (for the import path aliases) is merged. For the next step, would you please take a look? Thanks a lot!
> …appropriate "template" repository (basic, sample-controller, sample-source) to the new repository as a git remote.
Hi @zhanggbj, it looks like this step does not apply to this repo. I think you can just skip it, or start a PR that includes some basic code you already have.
@chizhg Looks like I do not have the permission to fork the repo or start a PR, do you happen to know where I can request the permission? Thanks!
The repo needs to be seeded with some content as a starting point; the WG leads have write access, which is enough to bootstrap this.
@mattmoor @chizhg Thanks! If this is the case, would you please help to bootstrap it, and then I'll raise a PR with the initial kperf code.
Also CC @maximilien
Ah, I didn't know an empty repo could not be forked. I have pushed a simple README.md to the repo; now you should be able to continue with the following steps.
@chizhg no worries, now it works well, thank you! I will continue to raise the initial PR and tag you all for a review.
I think this will need test-infra to be set up as well so that Tide will merge things in the presence of an OWNERS file.
@zhanggbj also look at https://github.com/knative-sandbox/hack for seed files
test-infra has been set up; we're still configuring codecov.
The sample PR is https://github.com/knative-sandbox/kperf/pull/21; all Prow checks pass now.
The next step could be
/CC @chizhg @evankanderson
I believe this is now done automatically daily, we should update the instructions!
@mattmoor That's nice, so we are done with this issue now?
tide is already enabled in Prow tests for PRs, so I'll mark this item done. We can reopen it if anything is needed. Thanks all for helping with the kperf repo! Hooray!!!
cc @mattmoor @chizhg @maximilien ^^^^
Use this issue type to request a new repo in knative-sandbox (or knative, which may require additional discussion).

Repo information
Org: knative-sandbox
Repo: kperf
Purpose (Description): As discussed in the Serving WG meeting on 2020-08-05, I would like to request a repo for kperf to continue the work on the benchmarking tool for Knative.

Briefly, kperf is designed to help Knative developers and operators better understand a Knative platform's scalability and performance. The tool generates specific Knative load on demand, e.g. Knative Services created at different intervals and concurrency levels. It measures all the Services and produces both human-readable output of fine-grained ready durations for Knative resources and CSV files with all the raw data. It also provides a dashboard to help analyze the measurements, drill down, and locate issues. In this way, kperf helps Knative adopters run specific load tests against a Knative platform, dig into the underlying Knative resources, learn its limits, and locate scalability and performance issues.
Sponsoring WG: Serving WG
Actions to fulfill
This area is used for the TOC to track the repo creation process
[x] Add this issue to the TOC project board for review.
[x] Send a PR adding entries for this repo in /peribolos/knative-sandbox.yaml: Knative Admin gets the admin privilege; the write privilege is granted as appropriate. PR merged.

Once the TOC has approved the above, it will merge and Peribolos will create an empty repository.
[x] (golang) Send a PR to add aliases for knative.dev/$REPONAME import paths (sample). PR merged.
[x] Have a lead from the sponsoring WG bootstrap the Git repository by pushing an appropriate "template" repository (basic, sample-controller, sample-source) to the new repository as a git remote. For example:
[x] Set up test-infra following the docs linked at the beginning.
[x] Create a sample PR to verify Prow (e.g. edit the boilerplate README)
Once Prow has been verified, tide is a required presubmit check.