knative / community

Knative governance and community material.
https://knative.dev/community

Proposal to adopt Skenario into Knative #18

Closed jchesterpivotal closed 4 years ago

jchesterpivotal commented 5 years ago

Objective

Pivotal proposes to donate Skenario to the Knative project.

Our motivations are:

Background

Skenario is a simulator harness developed to provide feedback on the Knative Pod Autoscaler (KPA). It was inspired by experiences of developing a simulator for the original Project riff autoscaler (before Knative Serving was adopted as a foundation layer). Skenario provides a general purpose simulator engine for executing scenarios and collecting results, plus a model built on that engine which drives the KPA.

It provides a CLI trace interface for debugging and a web GUI which shows graphical displays of autoscaler behaviour. The web GUI enables rapid exploratory examination of hypotheses about autoscaler behaviour.

Further history and motivation can be found in the original issue in the Knative Serving repo. Also available is a detailed discussion of the simulation concepts underlying Skenario.

Currently

Skenario is able to drive the 0.5 release of the KPA. Driving 0.6 and up is blocked pending further investigations of changes needed in the KPA to ease integration.

Skenario is worked on by a single Pivotal engineer (@jchesterpivotal) on a best-effort basis, ramped down from a full-time effort.

There are seven open issues.

Future directions

Use in "headless" use cases

While currently oriented towards interactive, exploratory use by developers, Skenario could be adapted for automated use as well. It could be used in test suites to predict performance regressions before testing. It could also be used as the evaluation function for optimisation tools searching an increasingly large parameter space.

Application to other problems

As noted above, Skenario's design separates the simulator engine from the simulation model. This design means it could be applied to other simulation problems of interest. We currently foresee two additional applications for Skenario.

Development of other autoscalers

We believe that a model can be developed to drive the Kubernetes Horizontal Pod Autoscaler. It's likely that parts of this model could be shared with the KPA model, especially as simulation elements for nodes and image placement are added.

Development of systems using Knative Eventing

Skenario's central concepts of Stocks and Movements appear to be suited to simulating the behaviour of Eventing systems under various conditions. This application was held in mind when designing Skenario. It would require additional engineering so that constructing models requires little, if any, custom code.

The goal of such a use would be to enable application developers to understand the dynamic behaviour of Eventing systems in general; and to identify potential bad behaviour or optimisation opportunities in particular.
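To make the engine/model split and the Stocks and Movements idea a little more concrete, here is a minimal sketch in Python. It is not Skenario's code or API: the names `Stock`, `Movement`, `run` and `build_model` are hypothetical, and the assumption that a Movement carries exactly one entity between two Stocks at a scheduled time is a simplification made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    """A named holding place for entities (hypothetical type)."""
    name: str
    count: int = 0


@dataclass
class Movement:
    """Moves one entity from one stock to another at a simulated time."""
    at: int
    source: Stock
    target: Stock


def run(movements):
    """Generic engine: apply movements in time order and return a trace.

    The engine knows nothing about what the stocks represent.
    """
    trace = []
    for m in sorted(movements, key=lambda mv: mv.at):
        if m.source.count == 0:
            trace.append(f"t={m.at} skipped: {m.source.name} is empty")
            continue
        m.source.count -= 1
        m.target.count += 1
        trace.append(f"t={m.at} {m.source.name} -> {m.target.name}")
    return trace


def build_model():
    """Toy model: events flow from a source, through a queue, to a sink."""
    source = Stock("source", count=2)
    queue = Stock("queue")
    sink = Stock("sink")
    movements = [
        Movement(5, queue, sink),
        Movement(1, source, queue),
        Movement(3, source, queue),
        Movement(7, source, queue),  # source is empty by now
    ]
    return (source, queue, sink), movements


if __name__ == "__main__":
    stocks, movements = build_model()
    for line in run(movements):
        print(line)
    print({s.name: s.count for s in stocks})
```

Only `build_model` knows about sources, queues and sinks. Swapping in a different model (for example, one describing pods and requests) would leave `run` untouched, which is the separation the proposal describes.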

Process of donation and adoption

Skenario would fall under the scope of the Scaling WG. An initial OWNERS file would list Pivotal, but I expect other names will be added (e.g. Google) as contributions are accepted.

Checklist

  1. [x] Approval of donation
    1. [x] Relevant stakeholders at Pivotal have agreed to donation.
  2. [x] Notification of proposal
    1. [x] Submit proposal to knative/docs repo
    2. [x] Email notification of proposal sent to knative-users and knative-dev
    3. [x] Discussion of proposal in Autoscaling WG
  3. [ ] Approval of adoption
    1. [ ] Technical Oversight Committee approval to adopt Skenario as a sub-project (as per its charter).
  4. [ ] Repository relocation
    1. [x] Apache License 2.0 applied to code
    2. [ ] Creative Commons License 4.0 applied to documentation
    3. [ ] Remove Pivotal-standard CONTRIBUTING.md file
    4. [ ] Relocate the repository from pivotal/skenario to knative/skenario
    5. [ ] Create OWNERS file
    6. [ ] Create AUTHORS file
    7. [ ] Update copyright header from Copyright (C) 2019-Present Pivotal Software, Inc. to Copyright (C) 2019 The Knative Authors.
    8. [ ] Verify that all contributors are Google CLA signatories
    9. [ ] Verify that all contributors are CNCF CLA signatories
    10. [ ] Adopt Google CLA bot

jchesterpivotal commented 5 years ago

cc: @josephburnett @markusthoemmes (Autoscaler leads) @mattmoor @vaikas-google @evankanderson (TOC)

mattmoor commented 5 years ago

@jchesterpivotal Given its proximity, do you see this falling under the scope of the Autoscaling WG to maintain (and as needed release)?

jchesterpivotal commented 5 years ago

@mattmoor Yes, I see this as falling within Autoscaler WG's scope.

evankanderson commented 5 years ago

@rgregg who has been looking at our policy in terms of adopting / managing repos.

abrennan89 commented 5 years ago

Does this require changes to knative.dev/docs to include information about this?

cc @samodell in case this needs to be on your roadmap

jchesterpivotal commented 5 years ago

@abrennan89 I don't think an immediate change to docs would be necessary.

josephburnett commented 5 years ago

:+1: to accepting this into Knative. It's been useful already in talking about autoscaling behavior. E.g. cliff behavior when exiting panic mode.

image

And for considering the impact of changes. E.g. what happens if we increase scale-up-rate to 1000x?

image

josephburnett commented 5 years ago

Regarding supporting Skenario, after I have been replaced as Scaling WG Lead, I would like to dedicate my 20% Knative time to supporting this simulator. In particular, I would like to extend it to simulate the Kubernetes HPA. This will help to evaluate the KPA-HPA feature gap. But it will also help with my day job of supporting the HPA. There are some complex changes coming down the pipeline (kep) and I would like to have tool support for thinking about them.

csantanapr commented 5 years ago

@josephburnett thanks for the screenshots, very useful. I was asking today what this tool looks like.

jchesterpivotal commented 5 years ago

A note for those following along at home: the process is pending the next quorate TOC meeting, which is expected to be on 11th July.

evankanderson commented 5 years ago

I'm in favor of adopting this, but I believe that Ryan has some mechanical changes to the acceptance process that he wanted to propose in the steering committee.

From the TOC perspective, it seems like we have at least 2-3 contributors, and may have more if adopted into the org. What would the OWNERS file look like, and what WG would discussion and participation happen in? (I think I know the answers, but these should probably be spelled out in the proposal.)

mattmoor commented 5 years ago

> Skenario is able to drive the 0.5 release of the KPA. Driving 0.6 and up is blocked pending further investigations of changes needed in the KPA to ease integration.

This and how we staff this are two of my major concerns (incl. things like test-infra support overhead). I'd like to better understand our long-term story for keeping this working, ideally with a relatively low overhead to ongoing work on the autoscaler itself.

Given the pace of change in the autoscaler, my concern is that 0.5 -> 0.7 is a huge delta that will grow faster than it closes, and that for this to provide maximum value it needs to track head.

We don't really have a "sandbox" or "incubation" designation, but we've discussed it in the past. I'd be inclined to support its inclusion with this sort of probationary status, to see if we can realistically track HEAD.

On the topic of release cadence, I am curious if/what artifacts we expect this repo to produce? Would we create binary releases of the simulator each cycle?

jchesterpivotal commented 5 years ago

@evankanderson

> From the TOC perspective, it seems like we have at least 2-3 contributors, and may have more if adopted into the org. What would the OWNERS file look like, and what WG would discussion and participation happen in? (I think I know the answers, but these should probably be spelled out in the proposal.)

Updated.

@mattmoor

> This and how we staff this are two of my major concerns (incl. things like test-infra support overhead). ... Given the pace of change in the autoscaler, my concern is that 0.5 -> 0.7 is a huge delta that will grow faster than it closes, and that for this to provide maximum value it needs to track head.

I'm concerned too. There's a chicken-and-egg situation for keeping Skenario in sync with the autoscaler. I fell behind because I was focused on development of Skenario-as-Skenario and needed a stable target. Assuming it's adopted, I expect that on catching up it will become easier to stay in sync because Skenario would be an at least optional check on activity.

> On the topic of release cadence, I am curious if/what artifacts we expect this repo to produce? Would we create binary releases of the simulator each cycle?

I can see compiled binaries becoming a possibility, co-released with the rest of Serving.

Another place I would expect it to become more prominent is when it becomes possible to run fully headlessly. It would then be a source of data for performance prediction that can be collected over a number of versions of Serving.

josephburnett commented 5 years ago

> what artifacts we expect this repo to produce

And I would expect to have some automated deployment to a demo site where people can play around with the simulation. So it should produce a container at least.

jchesterpivotal commented 5 years ago

Update: this was discussed in today's Technical Oversight Committee meeting.

No vote as yet. The Steering Committee will discuss what other requirements might be necessary before proceeding and loop back.

jchesterpivotal commented 4 years ago

Hi everyone,

After talking with @josephburnett, we've come to the view that Skenario ought to chart an independent destiny.

So, to avoid confusion, I'm withdrawing this proposal. The withdrawal is without prejudice and is not meant as a sign or symbol or anything of the sort. It's just a practical step towards where Skenario goes next.

I want to thank everyone who was involved in the discussions around the donation proposal and I'm pleased it acted as a small piece of our continued evolution as an open community.