marcindulak opened this issue 1 year ago
Wow. I was just thinking how needed this is, and I'm happy to see the issue has already been opened. My case: I have AKS configured in Central US and East US 2. We just updated one of our Python calculation pipelines, which uses JAX, to support GPU. We were able to get capacity in East US 2 for NCASv3_T4 (we don't need a ton of memory for our calculations, so the larger sizes are overkill), but Microsoft support says there are no NCASv3_T4 nodes available in Central US.
Now we are faced with a decision: do we set up another difficult-to-maintain AKS cluster in South Central US, where we were able to get NCASv3_T4 capacity, or just let the calculations run slower on CPU nodes in Central US when our East US 2 cluster can't handle all the load?
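For reference, a minimal sketch of how we check which backend a pod actually picked up (the report_devices helper is illustrative, not part of our pipeline); JAX silently falls back to CPU when no GPU is visible, which is exactly the slow path described above:

```python
# Minimal sketch: report whether a GPU backend is available to JAX and
# list the devices a calculation would run on. Works with any recent
# jax + jaxlib install; falls back to CPU when no GPU is visible.
import jax

def report_devices():
    backend = jax.default_backend()  # "gpu", "tpu", or "cpu"
    print(f"JAX backend: {backend}")
    for d in jax.local_devices():
        print(f"  {d.platform}: {d.device_kind}")
    if backend != "gpu":
        print("No GPU visible -- calculations will run on CPU.")

if __name__ == "__main__":
    report_devices()
```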
I was hoping there was a third option: just deploy our containers to Azure Container Apps in South Central US to run on GPU-enabled hardware. Looking forward to how much this would save us in maintenance spend.
Is there anything on the roadmap for Azure Container Apps regarding GPU support? We love the service itself but are forced to switch to Azure Container Instances for now.
I was also interested in GPU support for Container Apps. I recently saw a blog post which says they support dedicated workloads with A100 GPUs, so I think this is supported now, although I haven't tested it directly myself:
"You can now create Azure Container Apps environments with NC A100 v4 GPU enabled compute in the West US 3 and North Europe Azure regions."
It also seems like they are trying to unify the Consumption and Workload modes so that the Workload mode can have a scale-to-zero experience. I will likely try it out soon.
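For anyone else trying it out, here is a minimal sketch, assuming the image ships nvidia-smi and the environment mounts the NVIDIA driver, to confirm a GPU is actually visible inside the container before running the workload:

```python
# Minimal sketch: check whether a GPU (e.g. an A100) is visible inside
# the container by querying the nvidia-smi CLI. Returns an empty list
# when no NVIDIA tooling is present or no GPU profile is attached.
import shutil
import subprocess

def visible_gpus() -> list[str]:
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA tooling in the image / no GPU attached
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]

if __name__ == "__main__":
    gpus = visible_gpus()
    print(gpus or "No GPU visible in this container.")
```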
Yes, Microsoft announced preview support for GPUs in Container Apps at Ignite 2023. The cluster sizes start from 24 nodes, only A100 GPUs are supported, and you have to apply to Microsoft if you want to be included in the preview. Hopefully they will open up the preview soon and offer smaller clusters!
Where does general availability of GPU support sit on the roadmap and overall priority list? It seems like it's been in preview in a limited number of regions for quite some time now. Even if it stays in preview, can the number of regions be expanded? Specifically, we are interested in a US East option.
Is your feature request related to a problem? Please describe.
GPU support for container-based workloads: predictions (inference) using small machine learning models, e.g. a few GPUs and a few dozen GB of GPU memory, or fractions of a GPU.
Describe the solution you'd like.
GPU support with auto-scaling, including scaling to zero.
Describe alternatives you've considered.
Additional context.
None