kmesh-net / kmesh

High Performance ServiceMesh Data Plane Based on Programmable Kernel
https://kmesh.net
Apache License 2.0

RDS Not Updating After Long Periods of Kmesh Inactivity #964

Open PerforMance308 opened 2 days ago

PerforMance308 commented 2 days ago

What happened:

After KMesh has been running for a long period without any operations, RDS stops updating. The following observations were made:

  1. KMesh is left running overnight without any operations.
  2. The next day, modifications to an existing VirtualService do not take effect. For example, changing the match prefix in a VirtualService from /test to /test-echo is not applied.
  3. Restarting KMesh resolves the issue temporarily.
  4. Debug logs show that only CDS and EDS updates are received; RDS is never updated.

What you expected to happen:

Modifications to VirtualService should be applied without needing to restart KMesh.

How to reproduce it (as minimally and precisely as possible):

  1. Start KMesh and configure a VirtualService with match prefix /test.
  2. Allow KMesh to run overnight without any operations.
  3. Modify the VirtualService to match prefix /test-echo.
  4. Observe that the update does not take effect.

Anything else we need to know?:

Environment:

hzxuzhonghu commented 2 days ago

Can you set the kmesh log level to debug and paste the logs here?

PerforMance308 commented 1 day ago

I manually added log statements to each of the handle<xDS>Response functions:

[two screenshots]

The same was done for the handleEdsResponse and handleLdsResponse functions.

From the log, we can see that only the CDS and EDS handlers printed anything.

After restarting kmesh, however, all xDS types are updated again.
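The per-type logging described above can be sketched as a small standalone program. This is a minimal sketch, not kmesh code: responseCounter, observe and shortName are hypothetical names, and only the xDS v3 type URLs are standard. It logs every response and counts it by type, so a type that has gone silent (RDS in this report) shows up as a zero count.

```go
package main

import (
	"fmt"
	"log"
)

// Standard xDS v3 type URLs for the four resource types discussed here.
const (
	clusterType  = "type.googleapis.com/envoy.config.cluster.v3.Cluster"
	endpointType = "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment"
	listenerType = "type.googleapis.com/envoy.config.listener.v3.Listener"
	routeType    = "type.googleapis.com/envoy.config.route.v3.RouteConfiguration"
)

// shortName maps a type URL to the CDS/EDS/LDS/RDS abbreviation.
func shortName(typeURL string) string {
	switch typeURL {
	case clusterType:
		return "CDS"
	case endpointType:
		return "EDS"
	case listenerType:
		return "LDS"
	case routeType:
		return "RDS"
	default:
		return "unknown"
	}
}

// responseCounter is a hypothetical debugging aid that logs and counts
// every received response per xDS type.
type responseCounter struct {
	counts map[string]int
}

func newResponseCounter() *responseCounter {
	return &responseCounter{counts: make(map[string]int)}
}

// observe would be called at the top of each response handler.
func (c *responseCounter) observe(typeURL string, version string) {
	name := shortName(typeURL)
	c.counts[name]++
	log.Printf("received %s response, version=%s", name, version)
}

func main() {
	c := newResponseCounter()
	// Simulated stream after the idle period: only CDS and EDS arrive.
	for _, t := range []string{clusterType, endpointType, clusterType, endpointType} {
		c.observe(t, "v1")
	}
	// Prints "2 2 0": RDS never shows up.
	fmt.Println(c.counts["CDS"], c.counts["EDS"], c.counts["RDS"])
}
```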

hzxuzhonghu commented 1 day ago

Which istiod version are you running? @lec-bit and I fixed a similar bug in v0.5: https://github.com/kmesh-net/kmesh/pull/890

PerforMance308 commented 22 hours ago

istio v1.19

PerforMance308 commented 15 hours ago

log.txt

After line 283 of this log file, I updated the VirtualService, changing only the match prefix. The subsequent output shows that no RDS update was received.

hzxuzhonghu commented 10 hours ago

I suspect this is caused directly by this part of the code:

https://github.com/kmesh-net/kmesh/blob/e5b802841dfe2e1c5c755ad9b19f0fff6f35dd23/pkg/controller/ads/ads_processor.go#L283-L287

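We do not clean up p.Cache.routeNames even when the xDS connection is re-established, so after a reconnect this route-name check may find the old and new names equal (only the VirtualService match prefix was updated here, not the route names), and kmesh does not send a new RDS request on the new connection.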
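The suspected client-side failure mode can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the code at the linked lines: routeCache, shouldRequestRoutes, sameNames and onReconnect are hypothetical names, and routeCache.routeNames only stands in for p.Cache.routeNames. It assumes the check skips the RDS request when the route names extracted from LDS are unchanged, and shows that clearing the cache on reconnect (one possible fix) makes the client re-subscribe. The route name "80" is just an example value.

```go
package main

import (
	"fmt"
	"sort"
)

// routeCache is a hypothetical stand-in for the route names the client
// has already requested over RDS (the role of p.Cache.routeNames).
type routeCache struct {
	routeNames []string
}

// sameNames reports whether two sorted slices hold the same names.
func sameNames(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// shouldRequestRoutes models the assumed check: an RDS request is sent
// only when the set of route names from LDS has changed.
func (c *routeCache) shouldRequestRoutes(names []string) bool {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	if sameNames(c.routeNames, sorted) {
		return false // unchanged names: no RDS request is sent
	}
	c.routeNames = sorted
	return true
}

// onReconnect is a hypothetical fix: forget what was subscribed on the
// old stream, since the server keeps no state across connections.
func (c *routeCache) onReconnect() {
	c.routeNames = nil
}

func main() {
	c := &routeCache{}
	fmt.Println(c.shouldRequestRoutes([]string{"80"})) // true: first subscription

	// The stream reconnects without onReconnect being called; LDS returns
	// the same route name because only the match prefix changed.
	fmt.Println(c.shouldRequestRoutes([]string{"80"})) // false: RDS never re-requested

	// With the cache cleared on reconnect, the request is sent again.
	c.onReconnect()
	fmt.Println(c.shouldRequestRoutes([]string{"80"})) // true
}
```

Clearing the cache is only one option; another would be to unconditionally send the RDS request whenever a new stream is opened.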

On the istiod side, a new xDS connection shares no state with the previous one, so istiod does not know which route names the client had subscribed to. It therefore considers no routes subscribed on the new connection and has nothing to push.
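The server side of this argument can be modeled the same way. In this hypothetical sketch, connection, subscribe and routesToPush are illustrative names, not istiod's API; the only assumption taken from the comment above is that subscriptions live per stream. A fresh stream on which the client never sends an RDS request therefore receives no route pushes, however the routes change.

```go
package main

import "fmt"

// connection is a hypothetical model of per-stream state on the control
// plane: subscriptions exist only for the lifetime of one stream.
type connection struct {
	watchedRoutes map[string]bool
}

func newConnection() *connection {
	return &connection{watchedRoutes: make(map[string]bool)}
}

// subscribe records the route names carried in an RDS request.
func (c *connection) subscribe(names ...string) {
	for _, n := range names {
		c.watchedRoutes[n] = true
	}
}

// routesToPush filters changed routes down to those this stream watches.
func (c *connection) routesToPush(changed []string) []string {
	var out []string
	for _, n := range changed {
		if c.watchedRoutes[n] {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	old := newConnection()
	old.subscribe("80")
	fmt.Println(old.routesToPush([]string{"80"})) // [80]

	// After a reconnect the server sees a brand-new stream. If the client
	// skips the RDS request, nothing is subscribed and nothing is pushed.
	fresh := newConnection()
	fmt.Println(len(fresh.routesToPush([]string{"80"}))) // 0
}
```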