microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps
MIT License

Blazor Server and SignalR deployed on Azure Container App fails with WebSocket connection error #252

Open apsthisdev opened 2 years ago

apsthisdev commented 2 years ago

Issue description

Unable to configure Blazor Server app deployed on the Azure Container App service

Steps to reproduce

  1. Create a standard Blazor Server app in VS2022 with Docker integration.
  2. Configure the Dockerfile.
  3. Create and configure an Azure SignalR Service for the Blazor Server app (Add Connected Service).
  4. Deploy the Blazor Server app to Azure Container Apps.
  5. The deployment succeeds, and I can load the Blazor Server app from the managed Azure container environment.
  6. A WebSocket connection error occurs.

In the Azure Container Apps environment, I get the following error:

image

Expected behavior

  1. Blazor Server deployed and running on Azure Container Apps without any errors
  2. A clear, concise description of how to configure Blazor Server and SignalR with Azure Container Apps

dariagrigoriu commented 2 years ago

Please clarify the value of scale settings, specifically minReplicas and maxReplicas.

apsthisdev commented 2 years ago

@dariagrigoriu here is my config

image

image

ivarne commented 2 years ago

I’ve been having a similar issue (not using SignalR service, as I’m creating an admin tool with just a few users who don't use it every day). I guessed it was an issue with scaling to 0, and when I set minReplicas to 1 everything seems to work.

I guess the issue is that the scaling logic looks at when the last connection was opened, and a long-running WebSocket for Blazor makes the autoscaler think the container is inactive. If I could configure Container Apps to wait a few hours before scaling down, I think that could solve the issue.

anthonychu commented 2 years ago

It looks like this happens when the client connects to a different Container Apps replica than the one that holds the Blazor circuit it needs to connect to. Container Apps doesn't support sticky sessions today, so if the revision scales above 1 replica, this error will occur whenever the client connects to a different replica. For now, set both minReplicas and maxReplicas to 1.
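
As a rough illustration of the workaround above, the replica limits live in the `scale` section of the container app template. This is a minimal Bicep sketch, not taken from the thread: the resource name, parameters, target port, and API version are all assumptions.

```bicep
// Hypothetical parameters; none of these names come from the thread.
param location string = resourceGroup().location
param environmentId string
param containerImage string

resource blazorApp 'Microsoft.App/containerApps@2023-05-01' = {
  name: 'blazor-server-app' // assumed name
  location: location
  properties: {
    environmentId: environmentId
    configuration: {
      ingress: {
        external: true
        targetPort: 80 // assumed container port
      }
    }
    template: {
      containers: [
        {
          name: 'blazor-server-app'
          image: containerImage
        }
      ]
      scale: {
        // Pin the revision to exactly one replica so every client
        // reaches the replica that holds its Blazor circuit.
        minReplicas: 1
        maxReplicas: 1
      }
    }
  }
}
```

With both limits at 1 the app never scales out, so this only suits small workloads.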

apsthisdev commented 2 years ago

@anthonychu , thanks for the input. I have a few questions related to Blazor Server, Azure SignalR, and Azure Container Apps. Can you please provide your comments? We are planning a production app with this config, and I want to ensure it works in principle.

Is this understanding correct ?

"client connects to a different Container Apps replica than the one that holds the Blazor circuit it needs to connect to": Why would this issue arise in my case? There is a Blazor Server web app with a dedicated Azure Container App and a dedicated SignalR service. This config is not shared with any other app.

Sticky sessions not supported? But if I use Azure SignalR, client affinity should not be required, because the client is immediately redirected to the Azure SignalR Service when it connects. That's what the Microsoft docs recommend. What am I missing here?

What are the recommended settings for Azure SignalR with Blazor Server deployed on a container app?

image

edmundmunday commented 2 years ago

@ameyasubhedar - have you tried forcing the Container App to use the HTTP/1.1 transport protocol? The default configuration in Container Apps is "auto", which seems to push it to HTTP/2. This seems to break WebSocket implementations (I suspect because of something to do with https://developer.mozilla.org/en-US/docs/Web/HTTP/Protocol_upgrade_mechanism ).

In your template, under ingress, force the transport to HTTP (https://developer.mozilla.org/en-US/docs/Web/HTTP/Protocol_upgrade_mechanism); this resolved the issue for us.
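
The transport setting described above sits under `ingress` in the container app configuration. Here is a minimal Bicep sketch, assuming a hypothetical app name, parameters, target port, and API version (none of which appear in the thread). Setting `transport` to `'http'` selects HTTP/1 instead of the default `'auto'`.

```bicep
// Hypothetical parameters; none of these names come from the thread.
param location string = resourceGroup().location
param environmentId string
param containerImage string

resource blazorApp 'Microsoft.App/containerApps@2023-05-01' = {
  name: 'blazor-server-app' // assumed name
  location: location
  properties: {
    environmentId: environmentId
    configuration: {
      ingress: {
        external: true
        targetPort: 80 // assumed container port
        // 'auto' is the default; 'http' forces HTTP/1 so the
        // WebSocket upgrade handshake is not negotiated over HTTP/2.
        transport: 'http'
      }
    }
    template: {
      containers: [
        {
          name: 'blazor-server-app'
          image: containerImage
        }
      ]
    }
  }
}
```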

bradygaster commented 1 year ago

Could you possibly scale it to 1/1, so it's ALWAYS running one instance and see if that fixes it? I believe this is related to data protection, and how when you're in ACA/AKS/anything-horizontally-scaling, you'll probably need to manage your keys securely. There are a few ways to resolve this. Are you able to change the configuration around somewhat, starting with "forcing it to 1 instance" to see if the issue is resolved, then moving into using some distributed data protection?

dss539 commented 1 year ago

Could you possibly scale it to 1/1, so it's ALWAYS running one instance and see if that fixes it? I believe this is related to data protection, and how when you're in ACA/AKS/anything-horizontally-scaling, you'll probably need to manage your keys securely. There are a few ways to resolve this. Are you able to change the configuration around somewhat, starting with "forcing it to 1 instance" to see if the issue is resolved, then moving into using some distributed data protection?

1/1 min/max worked around the Blazor SignalR issues for me. As you can guess, that's not a great solution, but it is a workaround for small scale apps. Since I'm merely evaluating the ACA service right now, I haven't put in any effort to set up Azure SignalR to see if that can help things.

This WebSocket problem should be very easy to repro by deploying nearly any Blazor Server app in ACA with min replicas set to 2+. I'd be glad to stand up a simple demo for you if it would help.

bradygaster commented 1 year ago

Yeah that's an "unfortunate fix," I admit. I do have a solution for you that'd enable you to scale out your front end without concern or issue, though. Do you already use Key Vault and Azure Storage in your topology? If so it should be simple, otherwise you'd probably need to add those to your topology to try out the solution I'm proposing.

dss539 commented 1 year ago

Yeah that's an "unfortunate fix," I admit. I do have a solution for you that'd enable you to scale out your front end without concern or issue, though. Do you already use Key Vault and Azure Storage in your topology? If so it should be simple, otherwise you'd probably need to add those to your topology to try out the solution I'm proposing.

Thanks for your offer. Since I'm just experimenting with ACA for the moment, I don't need a better workaround. Out of curiosity, I would still be interested to know the basic approach you envision, but I won't be implementing it.

yoDon commented 1 year ago

Do you already use Key Vault and Azure Storage in your topology? If so it should be simple, otherwise you'd probably need to add those to your topology to try out the solution I'm proposing.

@bradygaster any chance you could share your thoughts on using SignalR with low-usage containers? I'm sure I'm not the only one who will find this thread and hope to know more.

bradygaster commented 1 year ago

Not sure what you mean, @yoDon. What sort of guidance?

yoDon commented 1 year ago

@bradygaster you were talking a couple messages up about a way to host SignalR in low-usage Azure Container Apps by doing something with Key Vault and Azure Storage. That sounded like a good tip, and I for one couldn't figure out how to use Key Vault and Azure Storage to help run SignalR in a low container usage context.


bradygaster commented 1 year ago

@yoDon my comment was less about the SignalR part. ASP.NET's data protection was my concern. With data protection in a containerized, dynamically-scaled environment, you need to put your key somewhere all the servers/containers can see it - hence, blob storage. To be secure with those keys, you need to encrypt them at rest, hence Key Vault. Using the two together in a scaled-out environment is advisable, since so much sits on top of data protection.

cwe1ss commented 1 year ago

@yoDon my comment was less about the SignalR part. ASP.NET's data protection was my concern. With data protection in a containerized, dynamically-scaled environment, you need to put your key somewhere all the servers/containers can see it - hence, blob storage. To be secure with those keys, you need to encrypt them at rest, hence Key Vault. Using the two together in a scaled-out environment is advisable, since so much sits on top of data protection.

fyi: As setting this up requires quite a few steps, I have implemented this in my Azure Container Apps-based microservice template: https://github.com/cwe1ss/msa-template

Feel free to start a discussion or raise an issue in my repo if you have any further specific questions.

yoDon commented 1 year ago

@cwe1ss (and possibly @bradygaster) I suspect I'm currently deploying a lower-complexity infrastructure than you are, but so far in my research I've had reasonable success with the following GitHub-centric approach to ACA CI/CD secrets management (which presumably could be replicated in a Jenkins, CircleCI, Azure DevOps, or similar pipeline without much additional complexity by someone who knows and uses those products):

 - And potentially also pass the secret contents into another bicep file referenced by the outer module

    module scaler 'child.bicep' = {
      ...
      params: {
        ...
        someConnectionString: someConnectionString
        ...
      }
    }

 - For Local testing (eg. running in a Visual Studio debugger on your local machine not in the cloud) the relevant env vars can be populated using an appsettings.Development.json that you don't check into source control that contains something like

    {
      ...
      "SomeConnectionString": "...",
      ...
    }


From what I've been able to tell, Azure's Bicep handling and GitHub's Actions handling are sufficiently GH-secret-aware and Bicep-secure-param-aware that this approach keeps the secrets out of the GitHub Actions build/deploy logs. It also keeps the secrets from being captured in the Azure Container App JSON template env section, which can be viewed in the Azure Portal or queried via the Azure CLI (the current `https://github.com/Azure-Samples/Orleans-Cluster-on-Azure-Container-Apps` sample has this problem of leaving some secrets visible in that template env descriptor).

As a quick aside to anyone who is trying to implement this approach, when the GH Action runs the bicep code and creates the container apps, the GH Action log of the bicep execution reports empty values for the values of the environment variables containing the secrets, for example the logs will report something like

    env: [
      ...
      {
        name: 'SOME_CONNECTION_STRING'
        value: ''
        secretRef: 'some-connection-string'
      }
      ...
    ]


When I first saw those empty values in the logs, I was concerned the method wasn't working, but I'm guessing it's just a sign of some point in the code where a secret value is being incorrectly masked as `''` rather than `'***'`, or it might just be some other subtlety of bicep syntax that I'm not personally aware of, but either way the reported empty values don't seem to be a sign of a problem.
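
For readers trying to picture the pattern, here is a minimal Bicep sketch of a secure parameter flowing into a container app secret and then into an environment variable. The names `someConnectionString`, `some-connection-string`, and `SOME_CONNECTION_STRING` come from the fragments above; the resource name, other parameters, target port, and API version are assumptions, and this is not necessarily the exact template used here.

```bicep
// Marked secure so the value is not recorded in deployment history or logs.
@secure()
param someConnectionString string

// Hypothetical parameters; not from the thread.
param location string = resourceGroup().location
param environmentId string
param containerImage string

resource app 'Microsoft.App/containerApps@2023-05-01' = {
  name: 'sample-app' // assumed name
  location: location
  properties: {
    environmentId: environmentId
    configuration: {
      ingress: {
        external: true
        targetPort: 80 // assumed container port
      }
      // The secret value is stored once, at the app level.
      secrets: [
        {
          name: 'some-connection-string'
          value: someConnectionString
        }
      ]
    }
    template: {
      containers: [
        {
          name: 'sample-app'
          image: containerImage
          env: [
            {
              // The env entry references the secret by name and
              // carries no literal value of its own.
              name: 'SOME_CONNECTION_STRING'
              secretRef: 'some-connection-string'
            }
          ]
        }
      ]
    }
  }
}
```

In this sketch the env entry holds only a `secretRef`, so the connection string itself never appears in the template's env section.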

One possible objection I can see to this approach I'm taking, beyond the obvious GH-dependence, is that it does make the secrets available to the processes running inside the containers in the form of environment variables. Some folks feel strongly in favor of providing secrets as env vars, others dislike that approach. 

If there are other kinds of issues you see with this approach, either generally or in the context of a larger, more complex infrastructure deployment, I'm super interested to dig deeper into this as I'm just starting to wrap my head around ACA architectures and ACA deployments and secret management in a CI/CD pipeline is obviously important stuff to get right.

cwe1ss commented 1 year ago

I personally like to have my secrets as close to the target as possible. Moving them from GitHub to Azure via parameters and relying on the log parser to catch all cases would be too risky for me.

What kind of secrets do you have?

If it's for accessing services in Azure (like SQL DB, Key Vault, Storage, ...), the best way would be to use managed identities and RBAC-based role assignments. This way you don't need any secrets. There are many examples of this in my template, even for running SQL migrations during deployment.

If it's for Azure services that don't support managed identities, or if you can't use managed identities for some reason, you could set the secrets within your Bicep templates by referencing existing resources. You can see this in my template, where I need to pass an Azure Service Bus connection string to a Dapr component (which doesn't support managed identities yet):

https://github.com/cwe1ss/msa-template/blob/355f6b1dd3a9dc6dd6c1d4ddf02f3df858c99ec4/infrastructure/environment/app-environment.bicep#L60-L63

https://github.com/cwe1ss/msa-template/blob/355f6b1dd3a9dc6dd6c1d4ddf02f3df858c99ec4/infrastructure/environment/app-environment.bicep#L103-L108

If it's secrets for 3rd party systems, I'd store them in Key Vault / App Configuration. With ASP.NET Core's IConfiguration system, you'd still get all the benefits of using some other source locally, etc.

dss539 commented 1 year ago

It is extremely unclear to me why this discussion is happening here. It appears that people have come in and completely hijacked this github issue to talk about their favorite topic (secrets management) which has nothing to do with Blazor, SignalR, or websockets. However, I am fully aware that I may just be dense as a rock, so I am not understanding the connection.

Can someone please explain how this discussion on secrets has any bearing whatsoever on Blazor, SignalR, or websockets in ACA?

cwe1ss commented 1 year ago

Yes, this went off-topic. Sorry. I was trying to help with the previously mentioned setup of Data Protection.

dss539 commented 1 year ago

Yes, this went off-topic. Sorry. I was trying to help with the previously mentioned setup of Data Protection.

No worries; it happens sometimes. At least I know I'm not totally crazy. 😄

yoDon commented 1 year ago

Back to the SignalR connection error: I originally found this issue while trying to fix a connection error involving SignalR and Orleans on ACA. Everything worked great running locally, but I was hitting connection errors when I deployed to ACA. In case it helps anyone else who finds this thread, my problem turned out to be a result of hosting Orleans Dashboard in the silo, which meant I had port 8080 exposed in my Azure Container App (so I could access the Orleans Dashboard). SignalR defaults to port 80/443, so external clients couldn't reach the SignalR hub that was hosted in the silo. I'm sure I could have fixed it by configuring the ports for the Dashboard and SignalR while keeping both hosted in the silo, but instead I simply moved the SignalR hub to another container.

marinasundstrom commented 1 year ago

A bit late, but checking this helps for WebSockets:

Screenshot 2023-07-23 at 23 32 06

roklenardic commented 4 months ago

While session affinity does solve the issue, it defeats the purpose of using ACA. If you turn on session affinity, horizontal scaling no longer works very well. Try some load testing and you will find that the first replicas get overloaded while the later ones do only a little work.

So I wonder if someone came across another solution that does not involve setting up a dedicated SignalR server?
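
For reference, the session affinity setting discussed in the last two comments can also be expressed in a template. This is a minimal Bicep sketch under stated assumptions: the app name, parameters, target port, replica limits, and API version are hypothetical, and it is not confirmed to match the option shown in the screenshot above.

```bicep
// Hypothetical parameters; none of these names come from the thread.
param location string = resourceGroup().location
param environmentId string
param containerImage string

resource blazorApp 'Microsoft.App/containerApps@2023-05-01' = {
  name: 'blazor-server-app' // assumed name
  location: location
  properties: {
    environmentId: environmentId
    configuration: {
      // Session affinity is only supported in single revision mode.
      activeRevisionsMode: 'Single'
      ingress: {
        external: true
        targetPort: 80 // assumed container port
        // Route each client back to the replica it first reached.
        stickySessions: {
          affinity: 'sticky'
        }
      }
    }
    template: {
      containers: [
        {
          name: 'blazor-server-app'
          image: containerImage
        }
      ]
      scale: {
        minReplicas: 1
        maxReplicas: 3 // assumed upper bound
      }
    }
  }
}
```

As noted above, affinity keeps long-lived clients on their original replica, so load may stay uneven even after new replicas are added.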