Closed sanposhiho closed 6 months ago
in the http-wasm, we had to make the guest scoped to a request/response cycle, even if it is re-used after that. I think with some thinking, we can pin a guest to cycle state if that helps make things more efficient or less complex.
think with some thinking, we can pin a guest to cycle state if that helps make things more efficient or less complex.
I have a similar thought. https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/pull/6#discussion_r1207996051
The preemption is the only place where the same plugin is called for the same scheduling cycle in parallel. (talking with my memory, need to double-check) So, in other places, we should be able to basically use the same instance for the same scheduling cycle.
ok so if I understand, there are two concerns:
So, I think that to solve this, we need to:
PreFilter
, release back on Permit
)Is this correct?
Yes, correct. That is mostly the same as what I'm thinking.
pin an instance to the scheduling cycle. (take a module instance from pool on PreFilter, release back on Permit)
But, Permit may not be called actually, for example, when all Nodes are filtered out in Filter phase. So, we can simplify it by taking an instance from pool on PreFilter and then if PreFilter is called with a different Pod later, we can assume the past scheduling cycle was finished and release back an instance.
/assign
Also, pin an instance to the scheduling cycle allows us to implement cache for several things on the guest. The k8s resources shouldn't get changed during the scheduling cycle; we can cache them.
But, Permit may not be called actually, for example, when all Nodes are filtered out in Filter phase. So, we can simplify it by taking an instance from pool on PreFilter and then if PreFilter is called with a different Pod later, we can assume the past scheduling cycle was finished and release back an instance.
I don't fully understand how this will work, because something pinned to a scheduling cycle won't be in the pool anymore. It sounds like we are back to one-at-a-time use, not pooling.
Do you mean we have a shared instance with state about the pod it is working on, and when that doesn't match the pod arg, we know to invalidate data? Or do you mean to still have a pool, but also need a queue of in-flight instances, which we double-check later if orphaned due to not reaching Permit stage?
Forgot to consider the binding cycle in my previous comment. The cycle state is passed from the scheduling cycle to the binding cycle, so we need to use the same instance in the binding cycle as well.
You may have already understood something by looking at #21, but I explain about the implementation detail a bit here so that someone can look back this later.
Let's see through some scenarios:
For example, all Nodes are filtered out in Filter extension point.
Note: the binding cycle is executed in parallel.
4a. The binding cycle is started for PodA. In all extension points in the binding cycle, the wasm plugin uses the instance assigned to PodA. (Actually we haven't yet supported any extension points in the binding cycle though) 5a. The binding cycle is finished and PostBind is executed. The wasm plugin unassign the instance from PodA, and put this instance to the pool. 4b. (PodB's scheduling gets started.) 5b. PreFilter is executed for PodB. The wasm plugin does not unassign the instance from PodA because it knows that PodA is moved to the binding cycle and the instance is still in use. It does the same as (2) for PodB.
(After (3), 4a -> 5a and 4b -> 5b are executed in parallel.)
No much except the preemption. As said, in preemption, Filter
, AddPod
, and RemovePod
may be executed for the same Pod in parallel. So, we need to take a lock during Filter
, AddPod
, and RemovePod
so that the same instance won't be executed from the different goroutine at the same time.
thanks, looks like we have some test cases to write! thanks for saying what they should be. I can help with these once the code is in shape.
I think we can support preemption later, not that urgent, wdyt? @sanposhiho This is similar to our other plugin implementations like VolumeRestrictions.ReadWriteOncePod
, we support the preemption when graduating to GA.
A basic question about host-guest mode, if we have multi plugins, and each plugin has implemented different extension points
cc @codefromthecrypt @sanposhiho
I'm unclear about the current status on the preemption, but the current #21 has already supported the preemption by taking lock during Filter, AddPod, and RemovePod. So, that should be working with preemption now.
if we have multi plugins, and each plugin has implemented different extension points
You mean multi wasm plugins, right? We don't yet support them. (we should support them later though)
Basically, currently only one plugin, which hosts the guest instance, is used in the scheduler, and that plugin can load only one .wasm
.
I'm unclear about the current status on the preemption, but the current https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/pull/21 has already supported the preemption by taking lock during Filter, AddPod, and RemovePod. So, that should be working with preemption now.
Fine, I'll take a look.
You mean multi wasm plugins, right? We don't yet support them.
Good to start. If we have a vague design how it looks like, it would be great, just in case back and forth.
Yep, create the issue to remember that: https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension/issues/23
/close
cycle state is already supported
What would you like to be added:
Support CycleState in guest.
Why is this needed:
The cycle state is one of the important struct passed to each extension point, which is initialized at the beginning of each scheduling. And we use it as a way to pass something from one extension point to another. One common usecase is pre-calculate something in PreFilter or PreScore and use pre-calculation result in Filter or Score. (PreFilter/PreScore is called once per one scheduling, but Filter/Score is called for every potential Nodes. That's why we want to do precalculation instead of calculating every time Filter/Score is called) https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/#extension-points
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.