Was reviewing code to try to understand where the 'allowWatchBookmarks' calls would have come from. Probably they come from one of my calls to
There is only one call to the first while there are 2 or 3 calls to the second.
The call to UntilWithSync is in the code that waits for container status to show up.
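For context, the wait is basically the standard client-go list/watch pattern; a trimmed-down sketch of roughly what my code does (the helper name, field selector, and condition are simplified for illustration):

```go
package waitstatus

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	watchtools "k8s.io/client-go/tools/watch"
)

// waitForEphemeralContainerStatus blocks until the named ephemeral container
// reports any status on the pod, or the context is cancelled. (Sketch only;
// the real code checks more conditions.)
func waitForEphemeralContainerStatus(ctx context.Context, client kubernetes.Interface, ns, podName, containerName string) error {
	fieldSelector := fields.OneTermEqualSelector("metadata.name", podName).String()
	lw := &cache.ListWatch{
		ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
			options.FieldSelector = fieldSelector
			return client.CoreV1().Pods(ns).List(ctx, options)
		},
		WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
			options.FieldSelector = fieldSelector
			return client.CoreV1().Pods(ns).Watch(ctx, options)
		},
	}
	// UntilWithSync runs an informer under the hood; I believe its watch
	// requests are where the allowWatchBookmarks calls come from.
	_, err := watchtools.UntilWithSync(ctx, lw, &corev1.Pod{}, nil, func(ev watch.Event) (bool, error) {
		pod, ok := ev.Object.(*corev1.Pod)
		if !ok {
			return false, nil
		}
		for _, s := range pod.Status.EphemeralContainerStatuses {
			if s.Name == containerName {
				return true, nil // some status is now visible for the container
			}
		}
		return false, nil
	})
	return err
}
```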
Commented out the code calling UntilWithSync. Before there were always two error messages logged back to back. Afterwards just one. Still the sync didn't happen like I expected.
Fwiw, confirmed no problem when not using vcluster.
@dee0sap sorry for the late reply and thanks for creating this issue! It seems our logic is a little outdated there and I'll try to reproduce this and fix it
I think the problem is that adding 2 ephemeral containers at once, or too quickly, runs into issues. I will create a fix for this and it would be great if you could then test it with the latest v0.21.0-beta
Thanks @FabianKramm
I have doubts that the problem is with the 2nd ephemeral container being added too quickly, but maybe it could be due to the 1st one being added too quickly after pod creation.
I say this because one of the things I did when trying to debug was to set a breakpoint in my code just before the point where it would add the second ephemeral container and then, when the breakpoint was hit, I waited 1 minute, as measured by the timer on my cell phone, before allowing execution to continue. This didn't make a difference. (I mentioned this in the original description.)
However... I didn't do anything similar between the point where the pod is initially created and the point where the first ephemeral container is added. And that is a significant difference between what happens when testing my code and what the shell script I shared in the description does, i.e. the shell script that did not replicate the problem.
So I'll find time today to do another test, one where I insert a long pause between the initial creation of the pod and the addition of the first ephemeral container, and I'll share whatever the results are here.
And of course when you have a candidate fix ready I'll give it a shot as well.
Hey @FabianKramm
So for testing purposes, I updated my code, and test code, so that
Same result... in the vcluster both ephemeral containers get added, but on the host cluster only the first is added. I don't notice any problem with the first ephemeral container; it seems to run fine on the host.
@dee0sap thanks for the additional information, we released v0.21.0-beta.2 that should include a fix for this, would you mind testing that version and seeing if it solves the problem?
Hey @FabianKramm Seems fixed in version 0.21.0-beta.2. However, I did run into a couple of problems moving from 0.19.6 to the new version:
Init manifests aren't applied before any pods are created. I need to make sure the default PriorityClass that is defined on the host is copied into the vcluster before any pods are created. With 0.19.6 I was able to do this via .init.manifests. I am trying to do the same thing with the new version via .experimental.deploy.vcluster.manifests (rough config sketch below). While the PriorityClass is copied over, it isn't copied before coredns is created. To get around this I am deleting the coredns pod in the vcluster right after vcluster creation and before doing anything else.
It seems vcluster is no longer able to read a values file through a symlink. My shell script for creating the vcluster invoked vcluster create like this:
vcluster create ... -f <(cat <<EOF...
The process substitution I was using yields a symlink. After switching to the new version, vcluster acted as if I hadn't passed a values file at all. I worked around this by first saving the config I am generating to a normal file and then passing the name of that file to vcluster.
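For reference on the first item, this is roughly the shape of what I'm putting in the new config (the PriorityClass here is just an illustrative stand-in for the host's default one):

```yaml
experimental:
  deploy:
    vcluster:
      manifests: |-
        apiVersion: scheduling.k8s.io/v1
        kind: PriorityClass
        metadata:
          name: host-default            # illustrative name
        value: 1000
        globalDefault: true
        description: "Copy of the host cluster's default PriorityClass"
```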
Please let me know if I should open up issues for either of these items.
@dee0sap thanks for the feedback and sorry for the late reply, I was on vacation. I can fix the first issue you mentioned, but for the second it would be great if you create a separate issue. Also glad to hear the problem is fixed!
What happened?
(From https://loft-sh.slack.com/archives/C01N273CF4P/p1727892709938959)
I have a scenario where a 2nd patch of a pod, one which adds a second ephemeral container, doesn't get synced to the host. However, this doesn't always happen. For example, the following shell script is not a problem:
However, in the kubectl plugin I am working on, the story is different. Some history on this effort of mine...
The plugin extends the kubectl debug command. Given the right set of flags it creates a sidecar ephemeral container in addition to the ephemeral container that kubectl debug code would normally create.
The normal flow of kubectl debug is something like:
When I started working on the plugin I was creating the sidecar ephemeral container in the overridable attach function.
Basically I was re-running the 'kubectl debug' code so that the sidecar would get created.
I initially had a problem where the sidecar would get added in the vcluster but never show up on the host (the same symptom I have now). I found that if I updated my override of the attach code so that, before attempting to create the sidecar, it waited for a container status to be visible for the debug container, then all was good.
Because of a change in requirements, I had to move the creation of the sidecar into step #3. This means that now the sidecar ephemeral container is created prior to the debug container. Because of my previous experience, I put in a wait for the sidecar's container status to show up prior to moving on to the creation of the debug container.
However, unlike before, where the second ephemeral container would show up on the host as long as I waited for some status to be visible on the first, the second ephemeral container (that is, the debug container now) never shows up on the host.
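For reference, each ephemeral container is added the way kubectl debug adds its debug container, with a strategic-merge patch against the pod's ephemeralcontainers subresource. A simplified sketch of that step (helper name and details are illustrative):

```go
package patchpod

import (
	"context"
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/strategicpatch"
	"k8s.io/client-go/kubernetes"
)

// addEphemeralContainer appends one ephemeral container to a pod by patching
// the "ephemeralcontainers" subresource. The plugin does this twice: once for
// the sidecar, then (after waiting for the sidecar's status) once for the
// debug container; the second patch is the one that never reaches the host.
func addEphemeralContainer(ctx context.Context, client kubernetes.Interface, ns string, pod *corev1.Pod, ec corev1.EphemeralContainer) (*corev1.Pod, error) {
	oldJS, err := json.Marshal(pod)
	if err != nil {
		return nil, err
	}
	modified := pod.DeepCopy()
	modified.Spec.EphemeralContainers = append(modified.Spec.EphemeralContainers, ec)
	newJS, err := json.Marshal(modified)
	if err != nil {
		return nil, err
	}
	patch, err := strategicpatch.CreateTwoWayMergePatch(oldJS, newJS, pod)
	if err != nil {
		return nil, err
	}
	return client.CoreV1().Pods(ns).Patch(ctx, pod.Name, types.StrategicMergePatchType,
		patch, metav1.PatchOptions{}, "ephemeralcontainers")
}
```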
Some things I tried, without luck, to get around this are:
While using 0.20.1 and replicating the failure case, I observed the following:
Before the 1st patch, the most recent log message was:
2024-10-03 23:35:13 INFO commandwriter/commandwriter.go:126 watch chan error: etcdserver: mvcc: required revision has been compacted {"component": "vcluster", "component": "apiserver", "location": "watcher.go:338"}
The resourceVersions were - host: 2677250701, vcluster: 468
After the first patch the most recent log messages were:
The resourceVersions were - host: 2677260040, vcluster: 496
The expected ephemeral container was visible both on the host and on the vcluster.
Right before the 2nd patch there were these log messages:
The resource versions were unchanged from before.
After the 2nd patch there were these log messages:
Resource version on host unchanged; resource version on vcluster: 524.
NOTE: The error log messages did not happen when the shell script ran.
Some minutes after the 2nd patch I used 'kubectl edit' to add an annotation to the target pod in the vcluster. The annotation was synced to the host cluster. However, the list of ephemeral containers remained out of sync: there were two of them on the vcluster but only one on the host.
What did you expect to happen?
I expected the 2nd ephemeral container to appear on the pod in the host cluster.
How can we reproduce it (as minimally and precisely as possible)?
Not sure. Obviously using 'kubectl debug' doesn't reproduce the situation. With instruction, perhaps I can collect more logs that would reveal a sequence of API calls that reproduces the problem.
Anything else we need to know?
Probably ;)
Host cluster Kubernetes version
vcluster version
VCluster Config