microsoft / mindaro

Bridge to Kubernetes - for Visual Studio and Visual Studio Code
MIT License

Challenges with Bridge to Kubernetes #63

Open tvvignesh opened 4 years ago

tvvignesh commented 4 years ago

Hi. I wanted to try Bridge to Kubernetes, and I am really interested in it since it solves some of my major challenges, but I am facing a few issues. I thought I would consolidate them into one issue here. Requesting your prioritization.

  1. Unable to use with Remote SSH

If I try to install in Remote SSH (Ubuntu 20.04 remote), the remote SSH connection disconnects repeatedly. It works fine if I uninstall/disable this extension.

As a temporary workaround, I installed VSCode on the remote machine, connected over RDP and tried it out there.

  2. Does not work when the kubeconfig has a proxy-url

kubectl v1.19.x introduced a proxy-url field in kubeconfig (https://github.com/kubernetes/client-go/issues/351), which is important if you connect to the K8s cluster via a bastion host. kubectl, helm canary and other tools work fine with this setting, but this extension does not: it cannot discover the services in my cluster because it ignores the setting and times out (I guess you are using an older version of the kubectl binary). I am not sure how else I would specify a proxy URL for my cluster, other than starting VSCode with the HTTPS_PROXY env var set. I guess just upgrading the kubectl binary to the latest version would make it work on your end.
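
As an illustration of the HTTPS_PROXY workaround, here is a minimal Python sketch that looks up `proxy-url` for the current context in an already-parsed kubeconfig and builds an environment with `HTTPS_PROXY` set. It assumes only the standard kubeconfig layout (`current-context`, `contexts`, `clusters`). The helper names and all sample values are hypothetical, and this is not a description of what the extension does internally.

```python
from typing import Optional


def proxy_url_for_current_context(kubeconfig: dict) -> Optional[str]:
    """Return the proxy-url of the cluster used by current-context, if any."""
    current = kubeconfig.get("current-context")
    # Map the context name to the cluster name it points at.
    cluster_name = next(
        (c["context"]["cluster"] for c in kubeconfig.get("contexts", [])
         if c["name"] == current),
        None,
    )
    for entry in kubeconfig.get("clusters", []):
        if entry["name"] == cluster_name:
            return entry["cluster"].get("proxy-url")
    return None


def env_with_proxy(base_env: dict, kubeconfig: dict) -> dict:
    """Copy base_env, adding HTTPS_PROXY when the kubeconfig names a proxy."""
    env = dict(base_env)
    proxy = proxy_url_for_current_context(kubeconfig)
    if proxy:
        env["HTTPS_PROXY"] = proxy
    return env


# Hypothetical kubeconfig, already parsed into a dict (for example by a YAML loader).
sample = {
    "current-context": "gke-private",
    "contexts": [
        {"name": "gke-private", "context": {"cluster": "gke", "user": "me"}},
    ],
    "clusters": [
        {"name": "gke", "cluster": {"server": "https://10.0.0.2",
                                    "proxy-url": "http://localhost:8888"}},
    ],
}
print(env_with_proxy({}, sample))
```

The resulting dict could be passed as the environment when launching the editor or a child process, so that tools which honor `HTTPS_PROXY` but not `proxy-url` still go through the bastion.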

  3. Using with service mesh setup

I use Linkerd as the service mesh and have sidecars in all my containers. Supporting sidecars would be really important so that I need not have a separate config for development and a separate config for production.

  4. Routing manager does not run as non-root or does not set security context

I have restricted privileges set in my cluster (same as this PSP: https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/policy/restricted-psp.yaml) and when I run bridge to kubernetes, the routing manager deployment fails with this error:

4

This is again critical for maintaining security while allowing Bridge to Kubernetes in development.
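
For context, a restricted policy like the one linked above generally expects containers to declare a non-root security context. The following Python sketch shows the kind of `securityContext` fields involved (`runAsNonRoot`, `runAsUser`, `allowPrivilegeEscalation`, `capabilities.drop`). The function name, the UID 1000, and the container name and image are hypothetical placeholders; whether the routing manager image can actually run as a non-root user is an assumption this sketch does not verify.

```python
import copy


def with_restricted_context(container: dict, uid: int = 1000) -> dict:
    """Return a copy of a container spec with a non-root securityContext."""
    if uid == 0:
        raise ValueError("uid 0 is root; a restricted policy would reject it")
    patched = copy.deepcopy(container)  # leave the caller's spec untouched
    ctx = patched.setdefault("securityContext", {})
    ctx.update({
        "runAsNonRoot": True,
        "runAsUser": uid,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    })
    return patched


# Hypothetical container entry; the name and image are placeholders.
container = {"name": "routing-manager",
             "image": "example.invalid/routing-manager:dev"}
print(with_restricted_context(container)["securityContext"])
```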

And I get these error logs when connecting with the extension:

2020-10-21T18:21:45.0224329Z | MindaroCli | TRACE | Starting EndpointManager...
2020-10-21T18:21:45.0834135Z | MindaroCli | TRACE | Waiting for EndpointManager to come up ...\n
2020-10-21T18:21:52.6449050Z | MindaroCli | TRACE | EndpointManager came up successfully.\n
2020-10-21T18:21:55.9776382Z | MindaroCli | ERROR | Dependency: Service Run - Port Forward <json>{"target":null,"success":false,"duration":null,"properties":{"requestId":"null","clientRequestId":"null","correlationRequestId":"null"}}</json>
2020-10-21T18:21:55.9785155Z | MindaroCli | ERROR | ServiceConnectCommand.ExecuteInnerAsync caught exception System.NullReferenceException: Object reference not set to an instance of an object.\n   at Microsoft.DevSpaces.Library.Connect.KubernetesRemoteEnvironmentManager.<>c.<_GetPodAndContainerFromServiceAsync>b__38_7(V1ContainerPort p)\n   at System.Linq.Enumerable.SelectListIterator`2.ToList()\n   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)\n   at Microsoft.DevSpaces.Library.Connect.KubernetesRemoteEnvironmentManager._GetPodAndContainerFromServiceAsync(String namespaceName, String serviceName, String containerName, CancellationToken cancellationToken)\n   at Microsoft.DevSpaces.Library.Connect.KubernetesRemoteEnvironmentManager.ResolveConnectionDetailsAndSaveContextAsync(RemoteContainerConnectionDetails remoteContainerConnectionDetails, CancellationToken cancellationToken)\n   at Microsoft.DevSpaces.Library.ManagementClients.ConnectManagementClient.<>c__DisplayClass18_0.<<StartRemoteAgentAsync>b__0>d.MoveNext()\n--- End of stack trace from previous location where exception was thrown ---\n   at Microsoft.DevSpaces.Library.ManagementClients.DevSpacesManagementClientExceptionStrategy.RunWithHandlingAsync[T](Func`1 func, FailureConfig failureConfig)\n   at Microsoft.DevSpaces.Library.ManagementClients.ConnectManagementClient.StartRemoteAgentAsync(IProgress`1 progress, CancellationToken cancellationToken)\n   at Microsoft.DevSpaces.Exe.Commands.Connect.ConnectCommand.ExecuteInnerAsync(IConnectManagementClient connectManagementClient, Action`1 workloadStartedHandler, CancellationToken cancellationToken, IRoutingManagementClient routingManagementClient)
2020-10-21T18:21:55.9790019Z | MindaroCli | ERROR | Connect operation failed.\n
2020-10-21T18:21:55.9795836Z | MindaroCli | TRACE | Stopping workload and cleaning up...\n
2020-10-21T18:21:55.9932861Z | MindaroCli | ERROR | Dependency: Service Run - Port Forward <json>{"target":null,"success":false,"duration":null,"properties":{"requestId":"null","clientRequestId":"null","correlationRequestId":"null"}}</json>
2020-10-21T18:21:55.9947392Z | MindaroCli | ERROR | Oops... An unexpected error has occurred.\n
2020-10-21T18:21:55.9952190Z | MindaroCli | ERROR | For diagnostic information, see logs at '/tmp/Bridge To Kubernetes'.\n
2020-10-21T18:21:55.9982099Z | MindaroCli | ERROR | Logging handled exception: System.NullReferenceException: {"ClassName":"System.NullReferenceException","Message":"Object reference not set to an instance of an object.","Data":null,"InnerException":null,"HelpURL":null,"StackTraceString":"   at Microsoft.DevSpaces.Library.Connect.KubernetesRemoteEnvironmentManager.<>c.<_GetPodAndContainerFromServiceAsync>b__38_7(V1ContainerPort p)\n   at System.Linq.Enumerable.SelectListIterator`2.ToList()\n   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)\n   at Microsoft.DevSpaces.Library.Connect.KubernetesRemoteEnvironmentManager._GetPodAndContainerFromServiceAsync(String namespaceName, String serviceName, String containerName, CancellationToken cancellationToken)\n   at Microsoft.DevSpaces.Library.Connect.KubernetesRemoteEnvironmentManager.ResolveConnectionDetailsAndSaveContextAsync(RemoteContainerConnectionDetails remoteContainerConnectionDetails, CancellationToken cancellationToken)\n   at Microsoft.DevSpaces.Library.ManagementClients.ConnectManagementClient.<>c__DisplayClass18_0.<<StartRemoteAgentAsync>b__0>d.MoveNext()\n--- End of stack trace from previous location where exception was thrown ---\n   at Microsoft.DevSpaces.Library.ManagementClients.DevSpacesManagementClientExceptionStrategy.RunWithHandlingAsync[T](Func`1 func, FailureConfig failureConfig)\n   at Microsoft.DevSpaces.Library.ManagementClients.ConnectManagementClient.StartRemoteAgentAsync(IProgress`1 progress, CancellationToken cancellationToken)\n   at Microsoft.DevSpaces.Exe.Commands.Connect.ConnectCommand.ExecuteInnerAsync(IConnectManagementClient connectManagementClient, Action`1 workloadStartedHandler, CancellationToken cancellationToken, IRoutingManagementClient routingManagementClient)\n   at Microsoft.DevSpaces.Exe.Commands.Connect.ConnectCommand.ExecuteInnerAsync(IConnectManagementClient connectManagementClient, Action`1 workloadStartedHandler, CancellationToken cancellationToken, 
IRoutingManagementClient routingManagementClient)\n   at Microsoft.DevSpaces.Exe.Commands.Connect.ConnectCommand.ExecuteAsync()\n   at Microsoft.DevSpaces.Exe.DevSpacesCliApp.RunCommandAsync(String[] args, CancellationToken cancellationToken)\n   at Microsoft.DevSpaces.Exe.DevSpacesCliApp.ExecuteAsync(String[] args, CancellationToken cancellationToken)","RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2147467261,"Source":"Microsoft.DevSpaces.Library","WatsonBuckets":null}
2020-10-21T18:21:55.9992596Z | MindaroCli | TRACE | Event: Command.End <json>{"properties":{"arguments":"connect --service tc-svc-account-svc --env /tmp/tmp-3175603yadn4ifjl9h9.env --script /tmp/tmp-3175603yadn4ifjl9h9.env.cmd --control-port 51792 --ppid 3175560 --elevation-requests [{\"requesttype\":\"edithostsfile\"}] --routing vignesh-1740 --local-port 3050","result":"Failed"},"metrics":{"duration":11888.0}}</json>
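
Logs in this `timestamp | component | level | message` layout can be filtered mechanically when triaging. Below is a minimal Python sketch that parses such lines and keeps only the `ERROR` entries. The field layout is inferred from the excerpt above, the names `LogEntry`, `parse_log` and `errors_only` are hypothetical, and lines that do not match the layout are assumed to be continuations of the previous message.

```python
import re
from typing import Iterable, List, NamedTuple

# Layout inferred from the excerpt: "<timestamp>Z | <component> | <level> | <message>"
LINE_RE = re.compile(r"^(\S+Z) \| (\w+) \| (\w+) \| (.*)$")


class LogEntry(NamedTuple):
    timestamp: str
    component: str
    level: str
    message: str


def parse_log(lines: Iterable[str]) -> List[LogEntry]:
    """Parse log lines; non-matching lines extend the previous message."""
    entries: List[LogEntry] = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            entries.append(LogEntry(*match.groups()))
        elif entries:
            last = entries[-1]
            entries[-1] = last._replace(message=last.message + "\n" + line)
    return entries


def errors_only(entries: List[LogEntry]) -> List[LogEntry]:
    """Keep only entries logged at ERROR level."""
    return [e for e in entries if e.level == "ERROR"]


# Three lines copied from the log above (raw strings keep the literal "\n").
sample = [
    r"2020-10-21T18:21:45.0224329Z | MindaroCli | TRACE | Starting EndpointManager...",
    r"2020-10-21T18:21:55.9790019Z | MindaroCli | ERROR | Connect operation failed.\n",
    r"2020-10-21T18:21:55.9795836Z | MindaroCli | TRACE | Stopping workload and cleaning up...\n",
]
for entry in errors_only(parse_log(sample)):
    print(entry.timestamp, entry.message)
```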

  5. Very high resource usage

I am not sure why, but the moment I install the Bridge to Kubernetes extension, I see my CPU and memory usage spike and a lot of processes running in the task manager. I also see a lot of command prompts being opened, though I am not sure this extension is causing that. A lot of ssh processes are left orphaned when this extension is installed.

With extension:

2

Without extension:

3

Environment details

Host: Windows 10 Home Insiders
Remote: Ubuntu 20.04

Trying with GKE private cluster via Bastion

VSCode (Windows):

Version: 1.50.1 (user setup)
Commit: d2e414d9e4239a252d1ab117bd7067f125afd80a
Date: 2020-10-13T15:06:15.712Z
Electron: 9.2.1
Chrome: 83.0.4103.122
Node.js: 12.14.1
V8: 8.3.110.13-electron.0
OS: Windows_NT x64 10.0.20201

VSCode (Linux):

Version: 1.50.1
Commit: d2e414d9e4239a252d1ab117bd7067f125afd80a
Date: 2020-10-13T14:44:48.716Z
Electron: 9.2.1
Chrome: 83.0.4103.122
Node.js: 12.14.1
V8: 8.3.110.13-electron.0
OS: Linux x64 5.4.0-1025-gcp

Kindly let me know if you need any other info.

CC: @rakeshvanga @greenie-msft

aslatter commented 4 years ago

If I try to install in Remote SSH (Ubuntu 20.04 remote), the remote SSH connection disconnects repeatedly. It works fine if I uninstall/disable this extension.

I'm able to reproduce this on Windows 10 using "Remote SSH" to connect to an Ubuntu VM.

After installing the "Bridge To Kubernetes" VS Code extension everything was fine for a while, but as soon as the "Updating dependencies" status message hit 100%, VS Code Remoting disconnected and was then only ever able to remain connected for short periods before disconnecting and reconnecting again.

daniv-msft commented 4 years ago

Thank you @tvvignesh for this very complete status! We really appreciate you taking the time to report all this.

  1. Unable to use with Remote SSH

We know Remote SSH is important for multiple users, and supporting this is high in our backlog. Unfortunately I don't have a good workaround to offer (apart from using a full Windows/Mac/Linux machine), but this is definitely an important feature for us. Thank you @aslatter for reporting this as well!

  2. Does not work when the kubeconfig has a proxy-url

I'm logging a bug on our side to investigate this. Our code relies both on kubectl and the C# KubernetesClient, and the proxy-url flag might not be supported there yet.

  3. Using with service mesh setup

We know this one is important, and we expect that we might work with some service meshes/sidecars, but not all of them. This is also something high in our backlog (at least to have a clear status to communicate regarding which implementations we support).

  4. Routing manager does not run as non-root or does not set security context

@pragyamehta Looking at our deployment spec for the routing manager, I can't find us stating anywhere the permission we want to use. I imagine that by default, it will run as root which is why the security context mentioned above blocks us. Would you see any issues with us specifying a lower set of permissions for the routing manager?

@tvvignesh Regarding the null reference you mention on _GetPodAndContainerFromServiceAsync: this should now be fixed if you upgrade to the latest VS Code version.

  5. Very high resource usage

I'm logging a bug for this one as well, as I wasn't aware of it. Thank you for reporting it!

tvvignesh commented 4 years ago

Looking forward to it. Once all these challenges are solved, I can move my workflow completely to this extension on a remote cluster and that would be awesome.

tvvignesh commented 3 years ago

@daniv-msft Tried installing again to see if there were any updates in remote ssh. And I got this error when using the Commands in VSCode. Just adding it to this thread.

e1

daniv-msft commented 3 years ago

Thank you @tvvignesh for reporting this issue! This seems to me like an independent issue, where for some reason the Bridge to Kubernetes extension wouldn't activate, and thus we wouldn't register the commands you see here. Another possibility is that we did activate, but crashed at the beginning of our activation logic and never reached the point where we register our commands.

Could you please answer these questions to help us investigate?

  1. Does this issue reproduce consistently when closing/reopening VS Code?
  2. In VS Code, when you reproduce the issue, could you please click Help > Toggle Developer Tools, select the Console tab and look for any errors logged by VS Code when trying to activate our extension?
  3. In VS Code, open the Output panel. Do you see the Bridge to Kubernetes logs? If not, then we likely didn't even start activation at all.

Regarding Remote SSH, we started to work on it. The issue we encounter presently is that Bridge to Kubernetes requires elevated permissions and that we need them on the remote component, not on the local component where the UI is. We are investigating ways to get rid of the elevated permissions requirement, or potentially other ways to request them in the Remote SSH scenarios.

tvvignesh commented 3 years ago

Hi. I have it installed in Remote SSH since I wanted to test it there. I am not sure if that is the issue here; maybe it is, since you are still working on it.

Does this issue reproduce consistently when closing/reopening VS Code?

Yes, it does, every time I select the Configure or Open Menu option.

In VS Code, when you reproduce the issue, could you please click Help > Toggle Developer Tools, select the Console tab and look for any errors logged by VS Code when trying to activate our extension?

These are the errors logged when selecting the options (I don't see any errors right after install):

e1

In VS Code, open the Output panel. Do you see the Bridge to Kubernetes logs? If not, then we likely didn't even start activation at all.

I don't see Bridge to Kubernetes here (I remember it used to show up in older versions).

Regarding Remote SSH, we started to work on it

Sure. Great to hear that.

daniv-msft commented 3 years ago

Thank you for your quick reply! We're trying to reproduce this issue on our side with Remote SSH and will keep you updated.

daniv-msft commented 3 years ago

@tvvignesh We tried to reproduce this on our side, but unfortunately couldn't reproduce the issue. :( A good suggestion from @rakeshvanga: is it possible that the Bridge to Kubernetes extension is installed only locally, and not in the WSL section? That could explain why the extension seems installed but isn't activated.

image

tvvignesh commented 3 years ago

@daniv-msft I don't have WSL enabled/installed in my machine since I don't use it. Hence I don't even have the WSL section in VSCode. The extension is installed remotely in SSH as you see from the image below.

e1 e1

Anything else you would like me to look at?

tvvignesh commented 3 years ago

Also, while not related to this, I had one more suggestion. The Bridge to Kubernetes extension depends on the Azure Kubernetes Service extension, which therefore cannot be uninstalled. For users who are not on Azure (like me), this dependency seems unnecessary. Though not a priority, decoupling it would be nice, similar to what Azure Kubernetes tools does: it installs providers only on demand.

rakeshvanga commented 3 years ago

@tvvignesh I've tried to repro this scenario with a remote SSH Linux machine and was not able to repro the command-not-found issue. Can you try a couple of troubleshooting steps so we can understand what is going on:

a. Remove the extension and install it again.

b. If you still face the issue, please try running Kubernetes extension commands like Kubernetes: Show Cluster Info, to make sure that extension has been activated and is not blocking the Bridge to Kubernetes extension.

daniv-msft commented 3 years ago

@tvvignesh Also, regarding the AKS dependency: thank you for reporting this. This is a legacy dependency we had when we were AKS-only. This will be removed the next time we release, as we're now open to all Kubernetes clusters.

tvvignesh commented 3 years ago

@rakeshvanga @daniv-msft I removed all the deps as you suggested (command below) and reloaded VSCode (I use Insiders, so the extensions were in .vscode-server-insiders/ for me). I then installed Bridge to Kubernetes again, and all the extensions got installed (image attached below). I ran Show Cluster Info and the info came up (I am testing locally with a Kind cluster right now).

rm -rf mindaro-dev.file-downloader-1.0.6/ mindaro.mindaro-1.0.120201118/ ms-kubernetes-tools.vscode-aks-tools-0.0.8/ ms-kubernetes-tools.vscode-kubernetes-tools-1.2.1/ ms-vscode.azure-account-0.9.4/

e1 e1

And still I have the same error:

e1

tvvignesh commented 3 years ago

@rakeshvanga @daniv-msft I tested again, this time with VSCode instead of Insiders, in Remote SSH as before. Now the menu works, but I am facing the same problem I mentioned initially in the issue:

If I try to install in Remote SSH (Ubuntu 20.04 remote), the remote SSH connection disconnects repeatedly. It works fine if I uninstall/disable this extension.

Looks like things break with Insiders.

rakeshvanga commented 3 years ago

@tvvignesh Thanks for trying the troubleshooting steps. If you encounter it only in VS Code Insiders, I believe the issue might be on the VS Code side, but I can check whether the next version of VS Code is breaking us in any way.

Regarding the SSH connection disconnecting repeatedly: I've seen this issue, but once I upgraded my node to a machine with 4 vCPUs and 16 GiB of RAM, I no longer saw it. Is there a way for you to increase the size of the remote SSH machine?

Meanwhile, I can investigate the issue for the remote ssh being disconnected after enabling Bridge To Kubernetes extension.

tvvignesh commented 3 years ago

@rakeshvanga Actually, my remote VM is even bigger: 32 GB RAM, 8 cores.

e1

Regarding not working in Insiders, are you by any chance using an absolute path to ~/.vscode-server? I remember it worked in insiders in one of the older versions though.

tvvignesh commented 3 years ago

Btw, just in case you want to try, this is my VSCode insiders version (latest):

Version: 1.52.0-insider (user setup)
Commit: 5e350b1b79675cecdff224eb00f7bf62ae8789fc
Date: 2020-12-04T10:15:27.849Z
Electron: 9.3.5
Chrome: 83.0.4103.122
Node.js: 12.14.1
V8: 8.3.110.13-electron.0
OS: Windows_NT x64 10.0.20201

rakeshvanga commented 3 years ago

@tvvignesh I've tried on the same version of VS Code Insiders and was able to repro. It seems the error is caused by the dependency we had on the AKS Kubernetes extension. We've now removed that dependency, and the next version of the Bridge To Kubernetes extension will not have this issue. Thanks for reporting the issue.

image

tvvignesh commented 3 years ago

@rakeshvanga Awesome to hear that. Will try it out when it comes. Btw, the SSH issue is not sorted out yet, right?

rakeshvanga commented 3 years ago

@tvvignesh It is not sorted out yet, and we are actively working on it.

rakeshvanga commented 3 years ago

@tvvignesh The actual cause of BridgeToKubernetes not working in VSCode Insiders 1.52 has been fixed on the VSCode side in this issue: https://github.com/microsoft/vscode/issues/111913. Please download today's Insiders build and you shouldn't be able to repro the issue anymore. Also, it was not specific to Remote SSH; activation failed in other scenarios as well.

tvvignesh commented 3 years ago

@rakeshvanga Interesting. I just tried it out after updating, and you are right: the issue with the menu no longer occurs in Insiders. I now have a consistent experience in Insiders and VSCode, where in both the remote SSH connection disconnects repeatedly. I have uninstalled Bridge to Kubernetes for now.

tvvignesh commented 3 years ago

Interestingly enough, Bridge to Kubernetes makes my CPU run at 90% because of issue 5 (which might be related to issue 1), and I can hear my laptop fans working at full speed 😂

I uninstalled Bridge to Kubernetes and connected over Remote SSH again, and the fans are back to normal 🤣

rakeshvanga commented 3 years ago

@tvvignesh Thanks for trying it out on the new VS Code Insiders build and verifying the fix. Yes, we are tracking this issue and investigating why CPU usage is high while using the extension. I will update the thread once a fix for this issue has been released.

rakeshvanga commented 3 years ago

@tvvignesh We have released a feature 'Use Kubernetes service environment variables' to enable debugging on Remote SSH machines and on WSL. Can you give this a try and let me know if you still encounter issues?

Thanks.

tvvignesh commented 3 years ago

@rakeshvanga Sure. Will try it out this weekend and get back to you with updates on Monday 👍

tvvignesh commented 3 years ago

Oops. Forgot about this. Will update you tomorrow.

tvvignesh commented 3 years ago

@rakeshvanga Sorry for the very late update. I tested it out just today. It did not work as I expected, but many of the problems above, like the SSH connection getting disconnected and high memory usage when using Bridge to Kubernetes, are solved now.

Since I use bastions, tunnels and remote SSH, my workflow might be quite different, but nevertheless, I just added a few screenshots to this doc: https://docs.google.com/document/d/1mO1EXW8s2ciRvQUzj5qdghK_bG2BwQ7ZjSlLW9cs8Fo/edit?usp=sharing

rakeshvanga commented 3 years ago

@tvvignesh Sure no problem. Thanks for trying out the feature and providing the feedback.

Miles-Garnsey commented 3 years ago

Has anyone addressed the problems with needing root in the containers? It was raised nearly a year ago and I can't see any answers on this.

It is interfering with the securityContext in operation on my cluster.

antonwnk commented 2 years ago

I have to start by extending my thanks for this great project!

Writing on this old issue because (after not the most thorough of searches) I could not find a better place where this is explicitly referenced:

Regarding Remote SSH, we started to work on it. The issue we encounter presently is that Bridge to Kubernetes requires elevated permissions and that we need them on the remote component, not on the local component where the UI is. We are investigating ways to get rid of the elevated permissions requirement, or potentially other ways to request them in the Remote SSH scenarios.

@daniv-msft Is there any place I could track the status of this? It seems that getting elevated permissions is still not a thing with Remote-SSH VSCode -- are there maybe any alternatives in the works? The CLI version should also solve this, right?

As I understand it, what is currently supported for using Bridge on Remote-SSH is telling it to use the service IPs from the pod's environment, which also means modifying your own code to use them. That is less than ideal.

Thanks!
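
To make the "modify your own code" part concrete, here is a minimal Python sketch of resolving a dependency through the service environment variables that Kubernetes injects into pods (`<NAME>_SERVICE_HOST` and `<NAME>_SERVICE_PORT`, with the service name upper-cased and dashes turned into underscores), falling back to the plain service name otherwise. The function name, the service name `orders-api`, the default port and the sample address are hypothetical, and the sketch assumes Bridge populates the same variable names locally when this feature is enabled.

```python
import os
from typing import Mapping, Optional, Tuple


def service_endpoint(name: str,
                     env: Optional[Mapping[str, str]] = None,
                     default_port: int = 80) -> Tuple[str, int]:
    """Resolve (host, port) for a service from Kubernetes-style env vars."""
    env = os.environ if env is None else env
    prefix = name.upper().replace("-", "_")
    host = env.get(f"{prefix}_SERVICE_HOST")
    port = env.get(f"{prefix}_SERVICE_PORT")
    if host and port:
        return host, int(port)
    # No env vars present: fall back to the service name, which resolves
    # through cluster DNS when the code runs inside the cluster.
    return name, default_port


# Hypothetical environment for a service named "orders-api".
fake_env = {"ORDERS_API_SERVICE_HOST": "127.1.1.2",
            "ORDERS_API_SERVICE_PORT": "8080"}
host, port = service_endpoint("orders-api", fake_env)
print(f"http://{host}:{port}")
```

With a helper like this, the same code path works both in the cluster and under Bridge, at the cost of the source change discussed in this thread.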

rakeshvanga commented 2 years ago

@antonwnk Thanks for using Bridge To Kubernetes. For using Bridge to Kubernetes on remote-ssh did you see this comment: https://github.com/microsoft/mindaro/issues/63#issuecomment-791099790 from above?

It requires a small code change in your sources, but all the functionality should work in remote-ssh scenarios. Can you try it and let me know if you find any issues?

Thanks!

antonwnk commented 2 years ago

Hey @rakeshvanga , thanks for taking the time. Indeed it works with the way you show in the comment above. I agree, the code changes are not major, but I am trying to make it work without changing the sources at all. Any ideas?

I tried running the bridge CLI executable with sudo, but it won't finish connecting. Here are the logs: bridge-library-logs.txt

Alternatively, I'm thinking of maybe running bridge and my application code inside a Docker container and setting up a reverse proxy automatically to map the environment variables back to their Kubernetes FQDNs.

rakeshvanga commented 2 years ago

Hi @antonwnk, I went over the logs and the errors are expected: since there is no UI, bridge cannot move forward. Even when you try with a container, bridge would work using the environment variables scenario. There is no way to open up a UI when running with containers.