Hi @philschmid, I'd like to use this Terraform code, but in my use case I need to deploy falcon40 as an async endpoint with a scaling policy based on the "HasBacklogWithoutCapacity" metric.
In the code implementation at main.tf:
To support my use case, I could replace predefined_metric_type with a variable whose default value is "SageMakerVariantInvocationsPerInstance" (I think this is the default in AWS as well).
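A minimal sketch of the change I have in mind, not the actual contents of main.tf. The variable name, the resource name "example", the resource_id, and the target_value are placeholders I made up for illustration:

```hcl
# Hypothetical variable: the name and description are assumptions, not taken from the repo.
variable "predefined_metric_type" {
  description = "Predefined metric type used by the target tracking scaling policy."
  type        = string
  default     = "SageMakerVariantInvocationsPerInstance"
}

# Illustrative policy only: the real resource in main.tf may be named and wired differently.
resource "aws_appautoscaling_policy" "example" {
  name               = "example-target-tracking"
  policy_type        = "TargetTrackingScaling"
  resource_id        = "endpoint/my-endpoint/variant/AllTraffic" # placeholder
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  service_namespace  = "sagemaker"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      # Previously a fixed string; now supplied by the caller, with the old value as default.
      predefined_metric_type = var.predefined_metric_type
    }
    target_value = 100 # placeholder
  }
}
```

One caveat, as far as I can tell: "HasBacklogWithoutCapacity" is published as a CloudWatch metric for async endpoints rather than as a predefined metric type, so a variable alone may not be enough, and it might need a customized_metric_specification or a step scaling policy instead.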
Would this be useful for your repo as well? If not, I'll fork it and make the change only in my own repo.
By the way, just wanted to point out that as an ML Engineer using HF and AWS, I see a lot of your content and find it very useful! Thanks for the effort and keep up the great work you are doing!