kyma-project / kyma

Kyma is an opinionated set of Kubernetes-based modular building blocks, including all necessary capabilities to develop and run enterprise-grade cloud-native applications.
https://kyma-project.io
Apache License 2.0

Enable consumption and configuration of specific hyperscaler resources [EPIC] #18195

Open varbanv opened 1 year ago

varbanv commented 1 year ago

Description

Provide a way for end users to consume and be charged for a pre-defined set of hyperscaler resources:

Making standard machine types configurable in worker pools is addressed in a separate story: https://github.com/kyma-project/kyma/issues/18709. Any additional node-specific settings are expected to be added to that concept as further options.

Context

Problem

Currently, Kyma is a layer on top of Kubernetes and, as such, provides a very limited set of infrastructure configuration options at provisioning time. However, customers looking to adopt Kyma who already use hyperscaler offerings often rely on specialized resources in their workloads (for example, faster storage, GPU nodes, or network-optimized nodes). The limited configuration options prevent those users from onboarding to Kyma without re-engineering their workloads.

Benefits

For customers:

For us:

Potential problems

Gathering of Resources to support

Billing requirements

Acceptance criteria

Tasks

Disper commented 1 year ago

marco-porru commented 1 year ago

The cKMS team is evaluating the usage of Kyma. The team needs "confidential computing capabilities". This kind of machine is certainly available on Azure and GCP.

marco-porru commented 11 months ago

SAP for Me would like to use m6g and m6in machine types

valentinvieriu commented 10 months ago

+1 for GPUs

marco-porru commented 10 months ago

+1 for GPUs

Thanks, Valentin, for reporting it. I think it's worth mentioning the context, so let me do it on your behalf for simplicity 😄: it's to make it possible for Core AI to run on Kyma (subject to future discussions and agreements).

varbanv commented 8 months ago

Had a preliminary workshop with @tobiscr and @PK85 and added a first set of tasks to work on.

marco-porru commented 7 months ago

+1 for GPUs from another team (ICN Munich)

marco-porru commented 7 months ago

Enable more private connectivity (e.g. via firewall), requested by at least 3 teams (e.g. S/4HANA ABAP Machines).

marco-porru commented 7 months ago

Enable the GCP Assured Workloads module (relevant for KSA), requested by the BTP email service.

abbi-gaurav commented 7 months ago

A customer is looking for very high IOPS storage. For example, enabling Azure ultra disks could help them: https://learn.microsoft.com/en-us/azure/virtual-machines/disks-enable-ultra-ssd?tabs=azure-portal

abbi-gaurav commented 6 months ago

At present, customers are able to use resources for which they are not charged such as

We should make customers aware that they might have to pay for these resources in the future, so that it does not come as a surprise to them.

@NHingerl, could you please help? IMHO, putting this info out might not need to wait until this epic is done.

lanthoor commented 5 months ago

SAP IPR would like to use g5 and r7i instance types along with other hyperscaler resources like ALB/NLB.

pthd commented 5 months ago

+1 for GPU support. AI scenarios require GPU-powered instances. More precisely, we want to leverage Transformer models, which run much faster on GPUs.

MarcusNotheis commented 5 months ago

We would be interested in OpenSearch consumption

marco-porru commented 4 months ago

GPUs for the Product Services team (already live), in particular GCP A100, H100, and H200 machines.

marco-porru commented 4 months ago

GPUs also requested by NGS (already live).

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs. Thank you for your contributions.

a-thaler commented 1 month ago

In the default worker pool, we will continue to support the current machine types only. With additional worker pools, we will support additional machine types. As soon as the worker pool feature is ready (https://github.com/kyma-project/kyma/issues/18709), we will start adding some compute-intensive types, followed by GPUs.

In parallel, we are already working on a concept to also emit non-billable metrics, bringing more transparency on what actually gets charged. That work is still in a conceptual phase, and we are still identifying what is achievable.

For the compute-intensive workloads, we are currently thinking of adding these (non-ARM-based):

cruschke commented 2 weeks ago

Signavio would be interested in c5.9xlarge or newer generations, and r6i.8xlarge and r7i.24xlarge.

mbhagdev commented 1 day ago

We use the g4dn.2xlarge from AWS and the g2-standard-8 from GCP in Gardener for our deployments. It would be nice if these were available in Kyma so that we could migrate to it.