knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Session affinity / sticky sessions / cookie based traffic splitting #8160

Open des-esseintes opened 4 years ago

des-esseintes commented 4 years ago

Ask your question here:

Hello, wonderful Knative community! Thanks for all your effort, I'm a new user who's fascinated by the project you've created. However, I could not find an answer to the question: is it possible to use cookies when splitting traffic, to be sure that the same user will always be directed to the same revision? I know this feature exists in bare Istio, Gloo, ..., but I could not find any info on how to make use of it in Knative, or whether it is even possible. Thanks!

(I wasn't sure which area to choose, so I left it blank; sorry for the inconvenience)

vagababov commented 4 years ago

/cc @tcnghia @ZhiminXiang @nak3

ZhiminXiang commented 4 years ago

@des-esseintes currently Knative does not support session affinity / sticky sessions / cookie-based routing. To work around this, you could try to add your own resources (e.g. a VirtualService for Istio) to route traffic to revision services based on your requirements.
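
For readers looking for a starting point, here is a rough sketch of that workaround, assuming the net-istio ingress and the per-revision Kubernetes Services that Knative creates; the names, host, cookie, and weights below are purely illustrative, not an official recipe:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-cookie-routing   # hypothetical name
  namespace: default
spec:
  hosts:
    - myapp.default.example.com            # the Knative Service's external host
  gateways:
    - knative-serving/knative-ingress-gateway
  http:
    # Requests already carrying the (hypothetical) "rev" cookie stay on the
    # revision named in it.
    - match:
        - headers:
            cookie:
              regex: ".*rev=myapp-00002.*"
      route:
        - destination:
            host: myapp-00002.default.svc.cluster.local   # per-revision Service
    # All other requests are split between the per-revision Services.
    - route:
        - destination:
            host: myapp-00001.default.svc.cluster.local
          weight: 80
        - destination:
            host: myapp-00002.default.svc.cluster.local
          weight: 20

Note that routing directly at per-revision Services bypasses parts of Knative's data path, so treat this as a sketch of the idea rather than a drop-in configuration.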

des-esseintes commented 4 years ago

I see, thanks, @ZhiminXiang! Any plans to make this 'native' to Knative? :) Just wondering.

rhuss commented 3 years ago

I have the same question, because I just got the same question :)

I think revision stickiness is a very important use case: stick to a revision, once it has been selected, when doing a traffic split based on probability.

rhuss commented 3 years ago

Tag header-based routing could be half of the solution. If an HTTP response from a revision included its revision tag, then a client could do the correlation on its own:

  • Send first requests to the service URL without any extra info
  • Traffic split selects a revision by distribution
  • HTTP response from this revision contains the tag as specified in the traffic split
  • Client picks up that response header field and sends it back on all future requests.

That way, as soon as a revision has been selected, the client is able to stick to this revision.
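
For context, a hedged sketch of the kind of traffic block this idea builds on (service name, revision names, and image are placeholders). With the tag-header-based-routing feature flag enabled, a request carrying the header Knative-Serving-Tag: canary is pinned to the tagged revision; the missing half is that responses today do not tell the client which tag served them:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - image: example.com/myapp   # placeholder image
  traffic:
    - revisionName: myapp-00001
      tag: stable
      percent: 90
    - revisionName: myapp-00002
      tag: canary
      percent: 10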

tcnghia commented 3 years ago

That is an interesting proposal @rhuss. Adding a response header is probably easy, but it does require expanding our runtime contract.

cc @mattmoor @evankanderson @dprotaso

dprotaso commented 3 years ago

I was under the impression tag header-based routing only supports 'tags' that are explicitly marked in the Route.

mattmoor commented 3 years ago

This is similar to: https://github.com/knative/serving/issues/9039

dprotaso commented 3 years ago

Cross post my comment from #9039

Given the failure modes, I don't understand why you wouldn't want to move your session state persistence to some external service - e.g. memcached, Redis, Apache GemFire.

e.g. if you're using Spring, there's tooling that abstracts this: https://spring.io/projects/spring-session-data-redis https://spring.io/projects/spring-session-data-geode
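
In the same spirit, for a Node.js service a minimal sketch of keeping session state in Redis might look like the following (using the node-redis v4 client; the key scheme, TTL, and REDIS_URL variable are made up for illustration):

import { createClient } from "redis";

// Session data lives in Redis, so any revision (or pod) can serve the next
// request without caring which instance handled the previous one.
const redis = createClient({ url: process.env.REDIS_URL });

export async function saveSession(sessionId: string, data: unknown): Promise<void> {
  if (!redis.isOpen) await redis.connect();
  await redis.set(`session:${sessionId}`, JSON.stringify(data), { EX: 3600 }); // 1 hour TTL
}

export async function loadSession(sessionId: string): Promise<unknown> {
  if (!redis.isOpen) await redis.connect();
  const raw = await redis.get(`session:${sessionId}`);
  return raw ? JSON.parse(raw) : null;
}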

@des-esseintes can you clarify why you want to hit a specific revision? I'm assuming it has some in-memory session data.

des-esseintes commented 3 years ago

Hey @dprotaso, well, for me it seemed quite obvious, but I might be wrong here. Imagine we have a web app; I, as a user, open it in a browser and hit version 2 (the new one), then I click a link and get to version 1 because of the randomness. Then I press 'back' but do not see the same page, as version 1 is showing up again, etc. etc.

dprotaso commented 3 years ago

If you made small changes to your v1 app, would it be pushed as a v3?

I'm trying to determine the lifecycle of the web apps.

des-esseintes commented 3 years ago

@dprotaso Hmmm, I don't think I understand what you mean... My example was made up, kind of, but I do believe this feature would be handy in some cases. For example (a different one): https://cloud.google.com/appengine/docs/flexible/python/using-websockets-and-session-affinity#session_affinity

evankanderson commented 3 years ago

I've seen this request before and it's not unreasonable, but some of the outcomes are surprising no matter which design decision is chosen at this infrastructure level.

The general idea is that (for example) you might be serving JavaScript and HTML resource bundles that are matched, and you don't want to serve v237 javascript to a v238 client, or vice-versa.

  1. One way to handle this would be to provide the application with a hint as to any tags that are mapped to that revision, and then allow the app to return information to the client to allow subsequent fetches to map to the same tag. This has the disadvantage that it's possible for actual traffic splits to move substantially from the requested amount, if clients hold onto old tags for an extended period of time. (The % splits will only apply to new requests for an application which is doing this.)

  2. Another option would be to smuggle a "version selection hash" to the server (as a header which could be sent back to the client) which could be re-presented to ensure a consistent % hash allocation as the traffic percentage assignments change. The disadvantage here is that it's still possible to get the "broken bundle" problem when the traffic assignments change.

  3. A third option is to use a consistent hash on request properties (client IP, possibly a known cookie set by the routing layer, or some other header) which is used by the routing layer to determine the % hash allocation (as in 2). The difference with option 2 is that this method is automatic, rather than requiring the server to opt in by repeating some value back to the client.

Unfortunately, there's a tension between stickiness and matching the requested traffic assignment percentages, particularly during a rollout. It would be worth comparing the options for real-world applications and making a recommendation using the Feature Tracks process (it can be a short doc, but I'd focus on the tradeoffs and why choosing a particular one is best).

One additional consideration is that doing more careful assignment of requests to particular buckets may be limited on some of the network routing backends, so it's probably worth talking to the networking WG about what functionality can be enabled.
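
As an illustration of option 3 above, here is a rough sketch of hashing a stable request property into traffic buckets; the hash function, bucket count, and names are illustrative and not anything Knative does today:

import { createHash } from "crypto";

interface Split { revision: string; percent: number }

// Hash a stable request property (client IP, a routing cookie, ...) into one
// of 1000 buckets, then walk the cumulative percentages to pick a revision.
function pickRevision(stickyKey: string, splits: Split[]): string {
  const digest = createHash("sha256").update(stickyKey).digest();
  const bucket = digest.readUInt32BE(0) % 1000; // 0..999
  let cumulative = 0;
  for (const split of splits) {
    cumulative += split.percent * 10; // each percent owns 10 buckets
    if (bucket < cumulative) return split.revision;
  }
  return splits[splits.length - 1].revision;
}

// The same key always lands in the same bucket, so a client keeps hitting the
// same revision as long as the split (and thus the bucket boundaries) does not move.
pickRevision("203.0.113.7", [
  { revision: "myapp-00001", percent: 90 },
  { revision: "myapp-00002", percent: 10 },
]);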

evankanderson commented 3 years ago

/assign @nak3 @ZhiminXiang

rhuss commented 3 years ago

@dprotaso it's not really about state, but about always hitting the same version of your app once it has been selected by the first request. Think about canary releases, where you want a user that hits the canary to stay with that version. Regardless of state, hitting two different versions of your (web) app during a user interaction is definitely not what you want.

rhuss commented 3 years ago

Of course the client needs to decide what the first request is, and then send back a revision tag that is honoured by the router so that it hits the same revision (not the same pod, of course) as long as the revision tag is present in the header. Ideally (imo) the revision tag is picked up from the response header. Also, the client decides when that "session" is over (e.g. when the user logs out).

Of course this requires active support from the app itself (i.e. picking up the revision tag from the HTTP response), but it is an easy way to achieve "revision stickiness" (which may be spread over multiple pods).

I understand that there is a conflict between matching the traffic split rules and client-selected revision stickiness, but maybe the routing algorithm could take user-pinned revisions into account by counting those requests and distributing only 'fresh' requests according to the rules? (Saying that while not really knowing how the distribution works.)

It's tricky indeed.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

rhuss commented 3 years ago

I think this issue is still important /remove-lifecycle stale

evankanderson commented 3 years ago

It sounded in https://github.com/knative/serving/issues/8160#issuecomment-664954239 like we might have a solution based on header-based tag routing:

Tag header-based routing could be half of the solution. If an HTTP response from a revision included its revision tag, then a client could do the correlation on its own:

  • Send first requests to the service URL without any extra info
  • Traffic split selects a revision by distribution
  • HTTP response from this revision contains the tag as specified in the traffic split
  • Client picks up that response header field and sends it back on all future requests.

That way as soon as a revision has been selected, the client is able to stick to this revision.

Is this a matter of documenting this pattern at this point?

/kind documentation /kind enhancement /triage accepted

/good-first-issue

knative-prow-robot commented 3 years ago

@evankanderson: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

rhuss commented 3 years ago

Yep, I think having this documented, and maybe a sample that shows how a simple JavaScript-based web client could leverage this technique, would be very helpful.
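
To make the pattern concrete, here is a rough sketch of such a client, assuming the application (or a future Knative feature) echoes the serving tag in a response header; the X-Revision-Tag response header is hypothetical, while Knative-Serving-Tag is the request header used by tag header-based routing:

// Browser-side sketch: remember the tag of whichever revision answered the
// first request, and replay it on all subsequent requests.
let pinnedTag: string | null = null;

export async function stickyFetch(url: string, init: RequestInit = {}): Promise<Response> {
  const headers = new Headers(init.headers);
  if (pinnedTag) headers.set("Knative-Serving-Tag", pinnedTag);

  const response = await fetch(url, { ...init, headers });

  const tag = response.headers.get("X-Revision-Tag"); // hypothetical response header
  if (tag && !pinnedTag) pinnedTag = tag;
  return response;
}

// Resetting pinnedTag to null ends the "session" and lets the traffic split
// pick a revision again on the next request.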

github-actions[bot] commented 3 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

rhuss commented 2 years ago

/remove-lifecycle stale

abrennan89 commented 2 years ago

@evankanderson @rhuss is documentation still required for this? Would it be worth opening an issue in the docs repo and closing / linking to this one? Any idea who could provide information for this to base docs on, or can we get a volunteer SME to work on doc drafts?

evankanderson commented 2 years ago

I think a blog post and maybe some documentation updates would be appropriate here; this is a matter of using some of the existing features creatively and probably isn't obvious to most users.

I don't know if we have a category / queue for "technical blog post topics".

rhuss commented 2 years ago

+1 for a blog post at least, but it should include a simple example of such a client, like an expressjs app that does this round trip. So it's more than just writing; it also involves some coding. Maybe @csantanapr could help us here?
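
A minimal Express sketch of the server half of that round trip, assuming the revision's tag is made available to the container somehow (here via a hypothetical REVISION_TAG environment variable, since Knative does not inject it today):

import express from "express";

const app = express();
// Hypothetical: the operator (or a future Knative feature) tells the
// container which traffic tag it is serving under.
const tag = process.env.REVISION_TAG ?? "unknown";

app.use((_req, res, next) => {
  // Echo the tag so a client like the one sketched earlier can pin itself
  // to this revision on subsequent requests.
  res.set("X-Revision-Tag", tag); // illustrative header name
  next();
});

app.get("/", (_req, res) => {
  res.send(`served by tag ${tag}\n`);
});

app.listen(8080);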

salaboy commented 2 years ago

@rhuss @evankanderson @dprotaso I find this issue quite interesting... Is there any documentation I can look at where people documented some of the use cases we want to target? I do see this as very relevant for functions as well.

mattmoor commented 2 years ago

We should dedup this with #9039 to consolidate.

rhuss commented 2 years ago

@salaboy I'm afraid we haven't really documented the use cases, but one is straightforward: if you are doing a canary deployment, where you route 95% of your users to the existing version and 5% to a new (canary) version to check how users react, then you want all HTTP requests during a user's session (which spans multiple subsequent requests) to hit the same version that was initially selected, especially if the two versions are not compatible with each other. Actually, "revision stickiness" is needed for all applications where multiple requests make up a user interaction.

The question would be where we want to document how tag header-based routing can be leveraged to achieve this revision stickiness.

dprotaso commented 1 year ago

/unassign @tcnghia @ZhiminXiang @nak3 /triage needs-user-input

Following up, it would be good to get user input on:

Given the concerns mentioned in https://github.com/knative/serving/issues/9039#issuecomment-677848875 I think I would want to keep things sane and just support stickiness to Revisions (and not instances/Pods)

tardieu commented 1 year ago

I have started working on a possible implementation of revision and session affinity for Knative Serving:

This is very much a work in progress. Feedback welcome!

dprotaso commented 1 year ago

Hey @tardieu I posted some comments on the slides

tl;dr - your proposal segues into Actors & scalable stateful services. But I would consider that a non-goal for Knative.

Circling back, I think we should avoid pod/instance affinity as Matt has already pointed out problems with load balancing, autoscaler accounting etc. (https://github.com/knative/serving/issues/9039#issuecomment-677848875)

I think we should just stick to Revision stickiness for now.

tardieu commented 1 year ago

Thx for the comments. Revision affinity is the primary goal for this. I agree session affinity and container concurrency control do not play well together. Having said that, there are cases where what we want is affinity and not container concurrency control. Knative already supports disabling concurrency control.

tardieu commented 1 year ago

I should explain stateful (and actors) better. Today Knative, even Knative Functions, is stateful in the sense that I can run a function for 15 minutes (borrowing the limit from Lambda) and have state persisted for that long (most likely...). What I am proposing is to extend today's "the pod is likely to survive 15 minutes" feature from running a single function invocation to running a series of actor method invocations. This makes it possible, e.g., to add or refine the parameters of my invocation over time, in particular based on intermediate results. I am not proposing to bring long-lived state to Knative at all.

slinkydeveloper commented 1 year ago

Hi everyone, I have a use case where I could really use this feature. I have a push-based system which invokes Knative services, sending some state, and I would like to implement a local cache for that state in every Knative service. This state is inherently partitioned, meaning that I can get cache hits only if I hit the same containers.

To implement that, I had in mind something like cookie-based traffic splitting: I set some cookie based on the state partition id, and the routing layer takes care of routing the request, if possible, to the same container. Bear in mind, the important bit for me here is if possible, because all I need to implement is a cache, so I'm OK if the routing layer implements stickiness in a best-effort manner, perhaps giving up on it when concurrency control kicks in.

I talk about cookies here, but any header will do fine, as long as I can put an arbitrary string there identifying the partition id of my state.

dprotaso commented 1 year ago

@slinkydeveloper why not just use a real cache/data store?

tardieu commented 1 year ago

@dprotaso a distributed in-memory key-value map, for instance, can help alleviate the latency of a remote store pull (e.g. from S3), but it does not help with in-cluster network latency if the platform does not support affinity, as the cached data may reside in another pod/node. Beyond the latency of replicating/migrating the cached data intra-cluster, there is also a memory pressure/management issue.

tardieu commented 1 year ago

@slinkydeveloper so you are ok keeping track of a cookie client-side in this use case, something like:

In such a scenario, the client is basically in charge of persisting Knative's routing decision. This definitely facilitates the job for Knative but has several drawbacks, some of them discussed in the documentation of Envoy's sticky sessions extension:

tardieu commented 1 year ago

@dprotaso I also replied to your comments on the deck inline: https://docs.google.com/presentation/d/1eqxyuLlAO9fAl2WbzNFLTkQv8DKgCeOUL7bx2Hye8H8/edit?usp=sharing

tardieu commented 1 year ago

Blog on sticky revisions: https://medium.com/@tardieu/sticky-revisions-for-knative-services-bf170709fd6a

evankanderson commented 1 year ago

Just bringing this back to the issue; there's been a bit of engagement with @steren in the comments on the document: https://docs.google.com/document/d/17WrnCoZgpsNTkXDl9A8u5FAMhOgd4JALV83OuQ07T80/edit#

In particular, Cloud Run is implementing this today (with somewhat different semantics) using a vendor-specific annotation. It would probably make sense to see whether there's enough similarity between this proposal and the Cloud Run implementation to have a common mechanism (and what the semantics of that mechanism are).

Steren, do you have any timelines you'd be willing to share?

steren commented 1 year ago

Cloud Run session affinity is aiming at General Availability in Jan 2023.

Because session affinity is not part of Knative (I opened a proposal in Aug 2020), Cloud Run is today using a vendor extension (an annotation prefixed with run.googleapis.com).

Today, Cloud Run session affinity is in Preview and only provides affinity to instances; revision-level affinity is rolling out and is a GA blocker. See the thread @evankanderson linked to for a high-level explanation of how revision-level affinity has been implemented. I am following up internally to see if more details can be shared, but since the implementation is different, my summary in the doc is probably enough.

If there is consensus on the attribute name to enable session affinity for Knative, we're open to delaying the GA launch of the feature in Cloud Run so that it launches with the proper attribute for consistency (worst case, it can be added later).

tardieu commented 1 year ago

@steren This is very useful feedback. Thanks!

best-effort session affinity:

I have no objection and in fact I would prefer implementing “best-effort session affinity” before revision affinity as I also consider the latter more complex.

Could you share some stats supporting the use case for best-effort session affinity? The main argument against the proposal so far has been that there is little demand hence value for it.

Do you see demand for and do you plan to support stronger forms of session affinity?

revision affinity:

I agree that the Cloud Run semantics combine traffic splitting and revision affinity in a compelling manner. I am curious about guarantees. When going from a 20/80/0 split to a 0/80/20 split, for instance, do you guarantee (and how do you guarantee) that the 80% of shards assigned to the middle revision are unchanged? Are you computing diffs (previous vs. current) or do you rely on a purely stateless solution (some form of consistent hashing)? When a new traffic split is enacted, do you guarantee there can be no back and forth during the transition (e.g. if the new configuration is propagated asynchronously)?

The current proposal is biased toward minimizing the size of the PR. In particular, it does not require modifications to the traffic splitting code but simply overrides it. If there is a consensus that a bigger change, à la Cloud Run, is worth the larger footprint, I'd be happy to revise the proposal accordingly.

Knative / Cloud Run alignment:

For full interop, we should consider not only service/revision configurations but also cookies. Are the Cloud Run affinity cookies documented anywhere? Maybe you consider the content of the cookie private, maybe it is encrypted, but I would expect the name of the cookie to be stable and documented so clients can filter it appropriately.

steren commented 1 year ago

Could you share some stats supporting the use case for best-effort session affinity?

At the moment, Cloud Run only offers best-effort instance-level session affinity (revision-level affinity is coming soon).

The feature has been enabled by 2,000 Cloud Run services. We have not heard negative feedback (apart from the request for revision-level affinity). For a Preview feature, this usage number is enough for us to validate that it can go GA.

The main argument against the proposal so far has been that there is little demand hence value for it.

The main driver for session affinity was to better support WebSockets and to maximize cache hits in case of in-memory caching.

quentin-cha commented 1 year ago

After some discussion with Steren, we decided to share a public version of our design doc for session affinity with traffic splitting support. Hopefully this answers some of the questions about session affinity, particularly around revision affinity.

Objective

Allow session affinity to be compatible with traffic splitting.

Background

Cloud Run session affinity is a mechanism that allows a user to request that a sequence of requests be routed to the same container instance. Cloud Run also offers traffic splitting, which allows users to specify which revisions should receive traffic and the traffic percentage received by each revision. When enabled at the same time, the ability to maintain session affinity is heavily impacted by the traffic splitting configuration. Given that traffic splitting performs random uniform assignment on incoming requests, when traffic splitting is enabled alongside session affinity it is possible that subsequent requests with session affinity cookies attached will be assigned to a different revision of the service, making it impossible to respect session affinity. This document proposes a design for traffic splitting to accommodate session affinity while still allowing traffic migration to take place.

Design

The design proposed here is to create a request-to-shard assignment and store this assignment in the session affinity cookie. This information, when attached to an incoming request, is used by traffic splitting to make an informed decision to preserve the assignment, allowing session affinity to continue. As for traffic splitting, we assign shards to revisions in accordance with the traffic splitting percentages instead of assigning requests to revisions directly.

Control Plane

When the traffic splitting configuration is created, a fixed number of shards is assigned to each revision according to the configuration. When the traffic splitting configuration is updated, the minimum number of shards is reassigned to match the new configuration, in order to minimize traffic disruption.

Here is an example of the configuration where the max number of shards is 1000:

| traffic split (rev 2 : rev 1) | revision 2 shards | revision 1 shards | shards migrated | traffic disruption |
| --- | --- | --- | --- | --- |
| 10% : 90% | [0 - 100) | [100 - 1000) | [0 - 100) | 10% |
| 20% : 80% | [0 - 100), [900 - 1000) | [100 - 900) | [900 - 1000) | 10% |
| 40% : 60% | [0 - 100), [700 - 1000) | [100 - 700) | [700 - 900) | 20% |
| 75% : 25% | [0 - 100), [350 - 1000) | [100 - 350) | [350 - 700) | 35% |
| 99% : 1% | [0 - 100), [110 - 1000) | [100 - 110) | [110 - 350) | 24% |
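
A rough sketch of the "reassign as few shards as possible" bookkeeping illustrated by the table (the shard count, data structures, and names are illustrative; this is not the Cloud Run implementation):

const TOTAL_SHARDS = 1000;

function reassignShards(
  current: Map<number, string>,        // shard index -> revision name
  targetPercent: Map<string, number>,  // revision name -> percent (sums to 100)
): Map<number, string> {
  const next = new Map(current);

  // How many shards each revision should own under the new split.
  const target = new Map<string, number>();
  for (const [rev, pct] of targetPercent) {
    target.set(rev, Math.round((pct / 100) * TOTAL_SHARDS));
  }

  // How many shards each revision owns right now.
  const owned = new Map<string, number>();
  for (const rev of next.values()) owned.set(rev, (owned.get(rev) ?? 0) + 1);

  // Take shards away from revisions that are above their new target...
  const surplus: number[] = [];
  for (const [shard, rev] of next) {
    if ((owned.get(rev) ?? 0) > (target.get(rev) ?? 0)) {
      owned.set(rev, (owned.get(rev) ?? 0) - 1);
      surplus.push(shard);
    }
  }

  // ...and hand them to revisions that are below their target. Shards that are
  // not touched keep their revision, so their sessions are undisturbed.
  for (const [rev, want] of target) {
    while ((owned.get(rev) ?? 0) < want && surplus.length > 0) {
      const shard = surplus.pop()!;
      next.set(shard, rev);
      owned.set(rev, (owned.get(rev) ?? 0) + 1);
    }
  }
  return next;
}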

Data Plane

When traffic splitting occurs at the data plane, the shard assignment of the request is stored as part of the request state, and when the session affinity cookie is generated for the response, this assignment is written into the cookie. Revision resolution is done by looking at the shard assigned to the request. When the next request with this cookie attached is received by the data plane, traffic splitting uses this shard assignment instead of generating a new one.

All requests in a sequence will be assigned to the same shard and thus resolve to the same revision. The shards can move to different revisions as the user changes the traffic split proportions. This aligns with typical traffic splitting use cases where the user performs a blue-green deployment and gradually migrates traffic from the old revision to the new revision. In the case where shards migrate to a different revision, session affinity will not be maintained, as requests will be served by a different instance running a different revision. However, after the migration has completed, session affinity will resume.
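
A data-plane sketch of the cookie handling described above (the cookie name and helper shapes are invented for illustration):

const AFFINITY_COOKIE = "ka-shard"; // hypothetical cookie name

function resolveShard(cookies: Record<string, string>): number {
  const fromCookie = cookies[AFFINITY_COOKIE];
  if (fromCookie !== undefined) return Number(fromCookie); // reuse the stored assignment
  return Math.floor(Math.random() * 1000);                 // fresh request: pick a shard
}

function route(cookies: Record<string, string>, shardToRevision: Map<number, string>) {
  const shard = resolveShard(cookies);
  const revision = shardToRevision.get(shard)!; // resolve the revision via the shard map
  return {
    revision,
    // Returned on the response so the next request comes back with the same shard.
    setCookie: `${AFFINITY_COOKIE}=${shard}; Path=/; HttpOnly`,
  };
}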

Caveats

Regardless of which approach we take, we will likely impact the distribution of requests to the revisions. In an extreme example where a single client using session affinity is responsible for 90% of all requests, 90% of the actual traffic will be routed to a given revision regardless of the configured traffic splitting proportions.
Another caveat arises if traffic splitting is enabled between revisions where only some revisions have session affinity enabled. Since every request that doesn't have a session affinity cookie attached is subject to a random traffic splitting assignment - including new requests and requests previously assigned to a revision without session affinity - those requests will eventually be assigned a shard that maps to a revision with session affinity enabled, and they will stay with that assignment. The result is that requests gradually shift towards revisions with session affinity enabled without the user changing the traffic splitting distribution.

Alternatives Considered

An alternative considered was to route the request to the original revision regardless of the traffic split configuration. This might not be what the user wants, because even when the user configures 100% of the traffic to go to the new revision, session affinity requests would continue to go to the original revision. Taking this approach would likely give users the impression that traffic splitting is not working as intended.

tardieu commented 1 year ago

Recovering from covid.... I have not been able to make progress on this lately.

Any thoughts about attribute names? I don't have a strong opinion about this. The current implementation uses annotations of the form activator.knative.dev/sticky-*. But this is just temporary.

mbaynton commented 10 months ago

What's the status of this document? https://docs.google.com/document/d/17WrnCoZgpsNTkXDl9A8u5FAMhOgd4JALV83OuQ07T80/edit

Is it far enough along that it makes sense to work on implementation?

dprotaso commented 3 months ago

https://docs.google.com/document/d/17WrnCoZgpsNTkXDl9A8u5FAMhOgd4JALV83OuQ07T80/edit

It's abandoned - I'll update the doc.

Related: Gateway API is defining session persistence now in this GEP - it would be good to see if the use cases above work with the proposed implementation: https://github.com/kubernetes-sigs/gateway-api/pull/2634

mcandeia commented 2 months ago

@des-esseintes currently Knative does not support session affinity / sticky sessions / cookie-based routing. To work around this, you could try to add your own resources (e.g. a VirtualService for Istio) to route traffic to revision services based on your requirements.

I'm not sure if my use case is the one you describe, but I just want to use an Istio DestinationRule to stick to a specific pod based on a cookie value. It's not about revision stickiness but pod stickiness, using load balancer settings.

similar to this:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: previews-stickiness
  namespace: sites-play
spec:
  host: previews.deco.site
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpCookie:
          name: deco_env
          ttl: 0s

I already have a DomainMapping for previews.deco.site; I just want stickiness by the given cookie value. Is there a possibility? I tried several different approaches but couldn't get it working.

maragunde93 commented 1 month ago

Hey folks, I don't think sticky sessions are possible even using Istio destination rules.

I have also been trying to make this work; I couldn't with the ksvc, but it worked perfectly with a Deployment. Today I think I found out why this happens.

If you check this you will see the request flow for Knative.

As you can see in the request flow diagram (image omitted):

The Istio VirtualService will point to the public Service that Knative creates. This Service's endpoints are the activator pods, not the actual application pods (since those scale to 0); even when the app pods are running, the public Service keeps pointing to the activator, as far as I have seen.

So when you set the DestinationRule to be sticky to a pod, what happens is that the request becomes sticky to one of the IPs in the Knative public Service, i.e. sticky to one of the activator pods instead of the application pod.

I don't think there is a workaround for this; if you point the VirtualService and DestinationRule to the Knative private Service, the app won't scale.

My recommendation is to use a normal Deployment if you need to have sticky sessions.

In case you want to verify this, get the Services' endpoints with:

kubectl get endpoints -n <app-namespace> -oyaml