philips-labs / terraform-aws-github-runner

Terraform module for scalable GitHub action runners on AWS
https://philips-labs.github.io/terraform-aws-github-runner/
MIT License

Monitoring, metrics and alerting #2025

Open ScottGuymer opened 2 years ago

ScottGuymer commented 2 years ago

We are seeing increasing interest and need for a solution to help monitor the resources deployed by this module.

There have already been a number of interactions around this

And some PRs

This is also something we are interested in at Philips, and we want to create a solution that is useful for the community where needed.

We will share more info on this as we align further.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.

razor54 commented 2 years ago

In addition to this, would CPU and memory monitoring and alerting through CloudWatch be of interest?

sgametrio commented 8 months ago

The possibility of having metrics (alerts too?) would be so cool! 👍

mackobi commented 4 months ago

Guys, do you expect to get this implemented in main soon?

dgokcin commented 1 month ago

@razor54 just being able to see memory metrics would be enough for me. ATM, I am having an issue with the runners hanging randomly, and the default metrics for EC2 instances are not helping me troubleshoot it. I tried configuring the CloudWatch agent by creating my own template file, but it did not work. Any tips?

{
    "agent": {
        "metrics_collection_interval": 5
    },
    "metrics": {
        "metrics_collected": {
            "swap": {
                "measurement": [
                    "swap_used_percent",
                    "swap_used",
                    "swap_free"
                ]
            },
            "mem": {
                "measurement": [
                    "mem_cached",
                    "mem_total",
                    "mem_used"
                ]
            }
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": ${logfiles}
            }
        }
    }
}
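
One cheap thing to rule out with a template like the one above is that it renders to invalid JSON. The following is only a rough sketch: it assumes the `${logfiles}` placeholder is replaced with a JSON array when the template is rendered, and the sample log file entry is hypothetical. It checks JSON validity only. It does not check whether the agent is running on the runner, or whether the instance role allows `cloudwatch:PutMetricData`, which the agent needs in order to publish metrics.

```python
import json
from string import Template

# The template as posted above, condensed. ${logfiles} is the only placeholder.
TEMPLATE = """{
    "agent": {"metrics_collection_interval": 5},
    "metrics": {
        "metrics_collected": {
            "swap": {"measurement": ["swap_used_percent", "swap_used", "swap_free"]},
            "mem": {"measurement": ["mem_cached", "mem_total", "mem_used"]}
        }
    },
    "logs": {"logs_collected": {"files": {"collect_list": ${logfiles}}}}
}"""


def render(template_text, logfiles):
    """Substitute ${logfiles} with a JSON-encoded list and parse the result.

    json.loads raises json.JSONDecodeError if the rendered text is not valid JSON.
    """
    rendered = Template(template_text).substitute(logfiles=json.dumps(logfiles))
    return json.loads(rendered)


# Hypothetical log file entry, used only to exercise the placeholder.
sample_logfiles = [
    {
        "file_path": "/var/log/messages",
        "log_group_name": "example-log-group",
        "log_stream_name": "{instance_id}",
    }
]

config = render(TEMPLATE, sample_logfiles)
metrics = config["metrics"]["metrics_collected"]

# List which metric sections survived rendering and what they collect.
for section in sorted(metrics):
    print(section, metrics[section]["measurement"])
```

If this parses cleanly, the template text itself is probably not the problem, and the next places to look would be the agent's own log on the instance and the IAM permissions of the runner role.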