ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.89k stars 5.76k forks source link

[autoscaler][aws] cloudwatch alarm config placeholders are not replaced #46686

Open achsvg opened 3 months ago

achsvg commented 3 months ago

What happened + What you expected to happen

Placeholders like instance_id in the cloudwatch alarm config are not replaced.

I sent a PR here but looks like it fell through the cracks.

Versions / Dependencies

python 3.11.x ray 2.30.0

Reproduction script

[
    {
       "EvaluationPeriods":1,
       "ComparisonOperator":"GreaterThanThreshold",
       "AlarmActions":[
          "arn:aws:sns:us-west-2:xxx:yyy"
       ],
       "Namespace":"ray-CWAgent-{cluster_name}",
       "AlarmDescription":"Memory used exceeds 90 percent for 5 minutes",
       "Period":300,
       "Threshold":90.0,
       "AlarmName":"{cluster_name} high mem_used_percent {instance_id}",
       "Dimensions":[
          {
             "Name":"InstanceId",
             "Value":"{instance_id}"
          }
       ],
       "Statistic":"Average",
       "InsufficientDataActions":[

       ],
       "OKActions":[

       ],
       "ActionsEnabled":true,
       "MetricName":"mem_used_percent"
    }
]

Issue Severity

High: It blocks me from completing my task.

anyscalesam commented 2 months ago

thanks ... add some labels we'll take a look at it. feels pretty straightforward.

Zyiqin-Miranda commented 2 months ago

Hi @achsvg, could you also share your Ray cluster YAML config file?

Zyiqin-Miranda commented 1 month ago

Hi @achsvg, added my comment in this PR