Bridge to kubernetes fails with error "Failed to get routing manager deployment status" when using isolation mode.

microsoft / mindaro

Bridge to Kubernetes - for Visual Studio and Visual Studio Code

MIT License

Bridge to kubernetes fails with error "Failed to get routing manager deployment status" when using isolation mode. #136

Closed by l0stmylife 3 years ago

l0stmylife commented 3 years ago

Describe the bug: Bridge to Kubernetes fails with error "Failed to get routing manager deployment status" when using isolation mode.

To Reproduce:
1) Edit the profile for Bridge to Kubernetes and check "Enable routing isolation".
2) Edit the csproj USER file and set the correct RoutingHeader and LaunchUrl.
3) Edit the profile again to ensure the correct settings have been locked in.
4) Run the microservice with the Bridge to Kubernetes launch profile.
5) Receive the error after a couple of minutes.

Expected behavior: Bridge forwards traffic to our microservice.

Logs: Attached logs from the following directories:
%TEMP%/Bridge to Kubernetes Temp.zip
%temp%\Microsoft.VisualStudio.Kubernetes.Debugging
routing manager logs: routingmanagerlogs.txt

Environment Details:
Client's version: Visual Studio 2019 Professional v16.9.2
Operating System: Windows 10 Pro

Additional context: Non-isolation mode works correctly.

daniv-msft commented 3 years ago

Thank you for reporting this issue, and sorry you encountered it! To help us understand what happened, could you please:

Reproduce the issue.
In a command prompt, run kubectl get pods to get the name of the pod corresponding to the routing manager (should be similar to routingmanager-deployment-774b56c749-577vg).
Get the logs from this pod and store them in a file: kubectl logs routingmanager-deployment-774b56c749-577vg > routingmanagerlogs.txt
Attach this file to this issue.

Thanks!
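The log-collection steps above can be sketched as a small shell helper. This is only a sketch under stated assumptions: the routing manager pod is assumed to run in the current namespace and to have a name starting with routingmanager-deployment, as in the example above. The function names are hypothetical and not part of Bridge to Kubernetes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: find the routing manager pod and save its logs.
# Assumption: the pod name starts with "routingmanager-deployment".

# Print the name of the first pod whose name starts with the given prefix.
find_routing_manager_pod() {
  local prefix="${1:-routingmanager-deployment}"
  kubectl get pods --no-headers -o custom-columns=NAME:.metadata.name \
    | grep "^${prefix}" | head -n 1
}

# Write the routing manager pod's logs to the given file.
save_routing_manager_logs() {
  local out="${1:-routingmanagerlogs.txt}"
  local pod
  pod="$(find_routing_manager_pod)"
  if [ -z "$pod" ]; then
    echo "no routing manager pod found in the current namespace" >&2
    return 1
  fi
  kubectl logs "$pod" > "$out"
}
```

After sourcing the file and reproducing the issue, running save_routing_manager_logs should produce routingmanagerlogs.txt in the current directory, ready to attach.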

l0stmylife commented 3 years ago

Hi Daniv, I have updated the report with the routing manager logs. Looking over the logs myself, there seems to be a problem with too many ingresses.

There are in fact a lot of old Bridge to Kubernetes ingresses that seem not to have been deleted.

daniv-msft commented 3 years ago

Thanks for providing the routingmanager logs! The issue you saw in logs related to Ingresses was due to our latest release yesterday. We released a mitigation as this impacted other users as well (https://github.com/microsoft/mindaro/issues/137). However, I suspect that the initial issue you encountered might have been different, as it appeared before the release. Could you please reproduce the issue and retrieve the logs for the routing manager again?

Thanks for your help understanding this!

l0stmylife commented 3 years ago

Hi Daniv.

It seems the latest update fixed the issue: Bridge is working correctly again and we can no longer reproduce the problem.

We still have a lot of leftover ingress resources, and they don't seem to disappear after ending a debugging session. Is it safe for me to manually delete these?

daniv-msft commented 3 years ago

Thanks for your reply! We are indeed aware of a bug where cloned ingresses remain in the cluster when the routing manager encounters a crash. It's safe to remove these ingresses when you don't debug anymore. We are working on improving our cleanup mechanism to fix this.
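For readers doing this manual cleanup, one possible approach is sketched below. The thread does not say how the cloned ingresses are named or labeled, so the label selector is left as a required argument and the one in the usage note is purely hypothetical; inspect the real labels first. The function names are also hypothetical. The delete helper defaults to a client-side dry run so nothing is removed until "none" is passed explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for cleaning up leftover cloned ingresses.
# Assumption: the leftovers can be told apart by a label that you have
# identified yourself; no label is hard-coded here.

# List ingresses in the current namespace together with their labels.
list_ingresses_with_labels() {
  kubectl get ingress --show-labels
}

# Delete ingresses matching a label selector.
# The second argument is the kubectl dry-run mode: "client" (default,
# preview only) or "none" (actually delete).
delete_ingresses_by_label() {
  local selector="$1"
  local dry_run="${2:-client}"
  if [ -z "$selector" ]; then
    echo "usage: delete_ingresses_by_label <label-selector> [client|none]" >&2
    return 2
  fi
  kubectl delete ingress -l "$selector" --dry-run="$dry_run"
}
```

A cautious sequence would be to run list_ingresses_with_labels, pick out the label that marks the leftovers, preview with delete_ingresses_by_label 'example.com/cloned=true' (a made-up selector), and only then repeat the call with none as the second argument, once no debugging session is active.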

l0stmylife commented 3 years ago

Hello @daniv-msft we are once again experiencing this issue.

I'm attaching two log folders from two engineers and the pod logs as well.

routingmanagerlogs.txt B2K Logs.zip B2K logs.zip

amsoedal commented 3 years ago

@l0stmylife thank you for reporting this. We'll need the subject matter expert to have a look when she gets online later today. One of us will reply on this thread as soon as we have an update. Thanks!

daniv-msft commented 3 years ago

@l0stmylife We have released the fix for this. As this is a component on our side, you won't need to update your binaries to get the fix. Could you please validate it works for you now? Apologies for the issue.

l0stmylife commented 3 years ago

@daniv-msft Hey Daniv, it seems this issue is still happening.

daniv-msft commented 3 years ago

Thanks @l0stmylife, this one is trickier than we expected. We're looking into this, and will report on this issue when we have a fix. Adding @pragyamehta who is looking into this on our side.

pragyamehta commented 3 years ago

Hi @l0stmylife, can you try the scenario again?

l0stmylife commented 3 years ago

Hi @pragyamehta, the issue seems resolved on our end now.