[FEATURE] Ingest Maintainer last engaged date into Metrics cluster

Is your feature request related to a problem?

Coming from https://github.com/opensearch-project/opensearch-metrics/issues/57

As a prerequisite for https://github.com/opensearch-project/opensearch-metrics/issues/73 and https://github.com/opensearch-project/automation-app/issues/8, the Metrics cluster needs data on each maintainer: their repository, name, affiliation, the date they were last engaged, and their inactivity status.

What solution would you like?

An index named maintainer_engagement in the Metrics OpenSearch cluster, containing documents with the following structure:

```json
{
    "id": "8baa664c-dec0-4201-b4b9-9747c2e7ee45",
    "repository": "opensearch-metrics",
    "name": "Brandon Shien",
    "github_login": "bshien",
    "affiliation": "Amazon",
    "event_type": "issues",
    "event_action": "opened",
    "time_last_engaged": "2024-08-27T00:31:56Z",
    "inactive": false
}
```
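To make the schema concrete, here is a minimal Python sketch that models one document as a dataclass and pairs it with a possible index mapping. The names MaintainerEngagement and MAINTAINER_ENGAGEMENT_MAPPING, the use of uuid4 for `id`, and the field types are all assumptions, not part of the proposal. Keyword types are chosen so that fields such as repository and github_login can be used in exact-match filters and aggregations.

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping for the maintainer_engagement index: keyword fields for
# exact matching and aggregations, a date for time_last_engaged, a boolean flag.
MAINTAINER_ENGAGEMENT_MAPPING = {
    "mappings": {"properties": {
        "id": {"type": "keyword"},
        "repository": {"type": "keyword"},
        "name": {"type": "keyword"},
        "github_login": {"type": "keyword"},
        "affiliation": {"type": "keyword"},
        "event_type": {"type": "keyword"},
        "event_action": {"type": "keyword"},
        "time_last_engaged": {"type": "date"},
        "inactive": {"type": "boolean"},
    }}
}


@dataclass
class MaintainerEngagement:
    repository: str
    name: str
    github_login: str
    affiliation: str
    event_type: str
    event_action: str
    time_last_engaged: datetime  # expected to be timezone-aware
    inactive: bool
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # assumed: random UUID per document

    def to_document(self) -> dict:
        """Return the JSON-ready document in the shape shown in the proposal."""
        doc = asdict(self)
        # Serialize as UTC ISO-8601 with a trailing "Z", matching the example document.
        doc["time_last_engaged"] = self.time_last_engaged.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        return doc
```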

To create these documents, a lambda should use the github-activity-events index (from https://github.com/opensearch-project/opensearch-metrics/issues/76) to collect or compute the required fields for each document, then index the results into the maintainer_engagement index.

This lambda should:

- Scrape the MAINTAINERS.md file of each repository in the OpenSearch project and build a mapping from each repo to its list of maintainers. This yields the repository, name, github_login, and affiliation fields.
- Iterate through the mappings and run a top hits query against the github-activity-events index to get the latest document for each repo, maintainer, and event type.
- Use the created_at field of each GitHub event document as time_last_engaged.
- For each event type, determine whether the maintainer should be considered active or inactive based on time_last_engaged and how active the repo is.
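The first three steps can be sketched in Python as follows. The parser assumes the `| Maintainer | GitHub ID | Affiliation |` table under a "Current Maintainers" heading that many OpenSearch repositories use; layouts vary, so treat this as an assumption. The github-activity-events field names (`repository`, `sender`, `type`, `created_at`) and all function names are hypothetical. Rather than issuing a separate query per event type, this sketch sends one query per repo and maintainer, using a `terms` aggregation on the event type with a `top_hits` sub-aggregation (size 1, sorted by `created_at` descending) to pick the latest event in each bucket.

```python
import re
from typing import Dict, List

# Captures the link text of a Markdown link such as "[bshien](https://github.com/bshien)".
_LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")


def parse_maintainers(markdown: str) -> List[Dict[str, str]]:
    """Read name, github_login and affiliation from the 'Current Maintainers' table.

    Assumes the | Maintainer | GitHub ID | Affiliation | layout; rows under any other
    heading (for example an Emeritus section) are ignored.
    """
    maintainers = []
    in_current = False
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            in_current = "current maintainers" in line.lower()
            continue
        if not in_current or not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip malformed rows, the header row and the |---|---| separator row.
        if len(cells) < 3 or cells[0].lower() == "maintainer" or set(cells[0]) <= set("-: "):
            continue
        link = _LINK.search(cells[1])
        login = link.group(1) if link else cells[1]
        maintainers.append({"name": cells[0], "github_login": login, "affiliation": cells[2]})
    return maintainers


def build_last_engagement_query(repository: str, github_login: str) -> dict:
    """Query body: newest github-activity-events document per event type for one maintainer.

    The field names repository, sender, type and created_at are hypothetical.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the raw hits
        "query": {"bool": {"filter": [
            {"term": {"repository": repository}},
            {"term": {"sender": github_login}},
        ]}},
        "aggs": {"by_event_type": {
            "terms": {"field": "type", "size": 50},  # assumed upper bound on event types
            "aggs": {"latest": {"top_hits": {
                "size": 1,
                "sort": [{"created_at": {"order": "desc"}}],
            }}},
        }},
    }


def latest_per_event_type(response: dict) -> Dict[str, dict]:
    """Map each event type to the _source of its newest event (its created_at is time_last_engaged)."""
    buckets = response["aggregations"]["by_event_type"]["buckets"]
    return {b["key"]: b["latest"]["hits"]["hits"][0]["_source"] for b in buckets}
```

The query body can be passed to any OpenSearch client's search call. One query per repo and maintainer keeps the request count proportional to the number of maintainers rather than maintainers times event types.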

For the inactivity calculation, we can use a linear equation, y = m*x + b, where x is the total number of events in a repo and y is how long (in days) a maintainer can go without engaging before we flag them as inactive.

We can calculate the slope (m) and the y-intercept (b) from two points:
(number of events in the repo with the fewest events, upper-bound wait of 365 days)
(number of events in the repo with the most events, lower-bound wait of 90 days)
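Writing x_min and x_max for the event counts of the repos with the fewest and the most events, the line through (x_min, 365) and (x_max, 90) works out as below. The numbers in the worked example afterwards are purely illustrative, not measured values.

```latex
\begin{aligned}
m &= \frac{90 - 365}{x_{\max} - x_{\min}} = -\frac{275}{x_{\max} - x_{\min}},
\qquad b = 365 - m\,x_{\min}, \\
y(x) &= m\,x + b = 365 - 275\,\frac{x - x_{\min}}{x_{\max} - x_{\min}}.
\end{aligned}
```

For example, with x_min = 10 and x_max = 1010, m = -0.275 and b = 367.75, so a repo with 510 events gets a wait of 367.75 - 140.25 = 227.5 days. The formula is undefined when x_max = x_min, so that case needs a fallback.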

This gives an equation for how long to wait in each repo: we wait longer on repos that are less active and a shorter time on repos that are more active.
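A minimal Python sketch of the threshold and the per-event-type check, assuming time_last_engaged has already been parsed into a datetime. The function names are hypothetical, and so is the fallback of returning the 365-day wait when every repo has the same event count (the two points then coincide and the slope is undefined).

```python
from datetime import datetime, timedelta

# Bounds from the proposal: wait up to 365 days in the repo with the fewest
# events and 90 days in the repo with the most events.
MAX_WAIT_DAYS = 365.0
MIN_WAIT_DAYS = 90.0


def inactivity_threshold_days(repo_events: int, min_events: int, max_events: int) -> float:
    """Days of silence allowed before flagging a maintainer, via y = m*x + b."""
    if max_events == min_events:
        # Hypothetical fallback: the two points coincide, so use the longer (more lenient) wait.
        return MAX_WAIT_DAYS
    # Line through (min_events, MAX_WAIT_DAYS) and (max_events, MIN_WAIT_DAYS).
    m = (MIN_WAIT_DAYS - MAX_WAIT_DAYS) / (max_events - min_events)
    b = MAX_WAIT_DAYS - m * min_events
    return m * repo_events + b


def is_inactive(time_last_engaged: datetime, now: datetime, threshold_days: float) -> bool:
    """True if the maintainer's last engagement is older than the repo's threshold."""
    return now - time_last_engaged > timedelta(days=threshold_days)
```

Because min_events and max_events are taken over all repos, every repo's count lies between them and the threshold always falls within the 90 to 365 day range, so no clamping is needed.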

- Aggregate all event types into a single document that definitively states whether the maintainer is inactive.
- Index the per-event-type documents and the aggregate document into the maintainer_engagement index.
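One way to build the aggregate document is sketched below. The rule is an assumption: the maintainer counts as inactive only if their most recent engagement of any event type is older than the repo threshold. Since every per-type document in a repo uses the same threshold, this is equivalent to being flagged inactive for every event type. The function name and the "All" label for the aggregate event_type and event_action are hypothetical.

```python
import uuid
from datetime import datetime, timedelta
from typing import List


def aggregate_engagement(per_type_docs: List[dict], threshold_days: float, now: datetime) -> dict:
    """Collapse one maintainer's per-event-type documents for a repo into one summary.

    Expects time_last_engaged as datetime objects and at least one document.
    """
    if not per_type_docs:
        raise ValueError("need at least one per-event-type document")
    # The most recent engagement across all event types drives the overall status.
    latest = max(per_type_docs, key=lambda d: d["time_last_engaged"])
    summary = dict(latest)
    summary["id"] = str(uuid.uuid4())  # fresh id so it does not collide with the source document
    summary["event_type"] = "All"      # hypothetical label for the aggregate document
    summary["event_action"] = "All"
    summary["inactive"] = now - latest["time_last_engaged"] > timedelta(days=threshold_days)
    return summary
```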

Do you have any additional context?

https://github.com/opensearch-project/opensearch-metrics/issues/57
