opensearch-project / opensearch-metrics

OpenSearch Metrics
https://metrics.opensearch.org
Apache License 2.0

Build automation to verify/manage drift across repositories #64

Open: dblock opened this issue 3 years ago

dblock commented 3 years ago

Is your feature request related to a problem? Please describe.

We currently rely on opening GitHub issues and manual intervention to align repositories across this organization for such things as copyright notices. When these change, such as in https://github.com/opensearch-project/.github/pull/24, there are no guarantees that everybody applies the changes in a timely manner.

Describe the solution you'd like

A mechanism that evaluates drift across repositories for such things as common files and opens GitHub issues automatically. This mechanism would support common types of things: files must match 100% (e.g. code of conduct), files must contain text (e.g. copyright notice as is), files must be present (e.g. developer_guide), TOC must be present in README, etc.
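The four kinds of checks listed above could each be a small function that returns either nothing (compliant) or a drift message. The following is a minimal sketch that assumes each repository is already checked out locally. The function names, the file names in the example, and the TOC heuristic (at least two list items linking to in-page anchors) are hypothetical.

```python
import re
from pathlib import Path
from typing import Callable, List, Optional

# Each check returns None when the repo is compliant, or a short drift message.
Check = Callable[[Path], Optional[str]]


def must_match(repo: Path, rel: str, canonical: str) -> Optional[str]:
    """File must match the canonical copy 100% (e.g. a code of conduct)."""
    path = repo / rel
    if not path.is_file():
        return f"{rel} is missing"
    if path.read_text(encoding="utf-8") != canonical:
        return f"{rel} does not match the canonical copy"
    return None


def must_contain(repo: Path, rel: str, text: str) -> Optional[str]:
    """File must contain a given text verbatim (e.g. a copyright notice)."""
    path = repo / rel
    if not path.is_file():
        return f"{rel} is missing"
    if text not in path.read_text(encoding="utf-8"):
        return f"{rel} does not contain the required text"
    return None


def must_exist(repo: Path, rel: str) -> Optional[str]:
    """File must be present (e.g. a developer guide)."""
    return None if (repo / rel).is_file() else f"{rel} is missing"


# Assumed heuristic: a TOC is at least two list items linking to in-page anchors.
TOC_ITEM = re.compile(r"^\s*[-*]\s+\[[^\]]+\]\(#[^)]+\)", re.MULTILINE)


def readme_has_toc(repo: Path, rel: str = "README.md") -> Optional[str]:
    """README must contain a table of contents."""
    path = repo / rel
    if not path.is_file():
        return f"{rel} is missing"
    if len(TOC_ITEM.findall(path.read_text(encoding="utf-8"))) < 2:
        return f"{rel} has no table of contents"
    return None


def run_checks(repo: Path, checks: List[Check]) -> List[str]:
    """Run every check against one repo and collect the drift messages."""
    results = (check(repo) for check in checks)
    return [msg for msg in results if msg is not None]
```

Checks that need extra arguments can be bound with a lambda, e.g. `run_checks(repo, [lambda r: must_exist(r, "DEVELOPER_GUIDE.md"), readme_has_toc])`, so a per-repo job only needs the list of checks and a checkout.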

vchrombie commented 3 years ago

Hi @dblock, I had a solution in mind but forgot to post it here last month.

I will try to explain it step by step.

The implementation described below would go in the opensearch-project/.github repository.

  1. Make sure all the repos have the same content in the common files (code-of-conduct, license, opensearch.svg, etc.).
  2. Create a GitHub Action that is triggered by a push event touching any of these common files.
  3. Whenever it is triggered, it should open an issue in every repository, with the maintainers assigned or at least mentioned in a comment. We still need to evaluate whether create-an-issue-in-all-repos can be used.
  4. We can use JasonEtco/create-an-issue, but I can't think of how to assign different maintainers per repository. We could keep a YAML file listing each repository's name and its maintainers, plus a script that does the job, for example:

    - opensearch:
        name: "OpenSearch"
        maintainers:
          - name: "dblock"
          - name: "CEHENKLE"
    - opensearch-py:
        name: "opensearch-py"
        maintainers:
          - name: "rushiagr"
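A script for step 4 could turn that YAML file into one "create an issue" request per repository, with that repository's maintainers as assignees. This is a minimal sketch that assumes the file has already been parsed into Python (for example with PyYAML's `yaml.safe_load`). The function name and the issue title and body wording are hypothetical; `title`, `body`, `assignees`, and `labels` are fields of GitHub's REST endpoint `POST /repos/{owner}/{repo}/issues`.

```python
from typing import Dict, List, Optional, Tuple

# The maintainers file from step 4 after parsing: a list of one-key mappings.
MAINTAINERS = [
    {"opensearch": {"name": "OpenSearch",
                    "maintainers": [{"name": "dblock"}, {"name": "CEHENKLE"}]}},
    {"opensearch-py": {"name": "opensearch-py",
                       "maintainers": [{"name": "rushiagr"}]}},
]


def build_issue_requests(org: str, entries: List[Dict], changed_file: str,
                         labels: Optional[List[str]] = None) -> List[Tuple[str, Dict]]:
    """Return one (URL, JSON payload) pair per repository.

    Each pair is meant to be POSTed with an authenticated client; sending
    is left out so the mapping logic can be checked on its own.
    """
    pairs = []
    for entry in entries:
        for info in entry.values():
            url = f"https://api.github.com/repos/{org}/{info['name']}/issues"
            payload = {
                "title": f"Sync {changed_file} with opensearch-project/.github",
                "body": (f"{changed_file} changed in opensearch-project/.github; "
                         "please apply the same change here."),
                # Assign the maintainers listed for this repository.
                "assignees": [m["name"] for m in info.get("maintainers", [])],
                "labels": list(labels or []),
            }
            pairs.append((url, payload))
    return pairs
```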

The steps above target the first part of the request: "A mechanism that evaluates drift across repositories for such things as common files and opens GitHub issues automatically."

We can extend this a bit to target the second part (optional): "This mechanism would support common types of things: files must match 100% (e.g. code of conduct), files must contain text (e.g. copyright notice as is), files must be present (e.g. developer_guide), TOC must be present in README, etc."

The implementation described below would go in each individual repository (for example, opensearch-project/opensearch-py).

  1. In step 2 of the previous solution, configure the action to also add a unique label (for example, mismatch) when it opens the issue.
  2. Each individual repository (e.g. opensearch-project/opensearch-py) should have a GitHub Action that is triggered when an issue with that label (here, mismatch) is opened.
  3. The action should clone both repositories, opensearch-project/.github and opensearch-project/opensearch-py, and compare the files using the diff command.
  4. I think the required changes can be applied from the diff output itself (a unified diff can be applied with patch); see https://www.computerhope.com/unix/udiff.htm.
  5. Once the file is updated, we can open a PR against the same repository using peter-evans/create-pull-request.
  6. If possible, assign the maintainers or mention them in a comment.
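Steps 3 and 4 could be sketched with Python's standard `difflib` instead of the diff command, assuming both repositories are already cloned side by side. The helper names are hypothetical. For files that must match 100%, this sketch simply copies the canonical file over the repo's copy rather than applying a patch; the unified diff is still computed so it can go into the PR description.

```python
import difflib
import shutil
from pathlib import Path
from typing import List


def file_drift(canonical_root: Path, repo_root: Path, rel: str) -> List[str]:
    """Unified diff turning the repo's copy into the canonical one.

    Returns an empty list when the files are identical. A missing file in
    the repo is treated as empty, so the diff adds every line.
    """
    canonical = (canonical_root / rel).read_text(encoding="utf-8").splitlines(keepends=True)
    target = repo_root / rel
    current = target.read_text(encoding="utf-8").splitlines(keepends=True) if target.exists() else []
    return list(difflib.unified_diff(current, canonical, fromfile=f"a/{rel}", tofile=f"b/{rel}"))


def sync_files(canonical_root: Path, repo_root: Path, rels: List[str]) -> List[str]:
    """Overwrite drifted files with the canonical copy; return what changed."""
    changed = []
    for rel in rels:
        if file_drift(canonical_root, repo_root, rel):
            dest = repo_root / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(canonical_root / rel, dest)
            changed.append(rel)
    return changed
```

If `sync_files` returns a non-empty list, the workflow would have something to commit and could then hand off to peter-evans/create-pull-request as in step 5.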

Sorry, I haven't worked out the whole solution; these are rough ideas from when I was thinking about the issue, and there may be obstacles I have missed.

Please let me know what you all think. I'm willing to work on this if someone is willing to guide me and review the PRs along the way.

Thanks.

Best, Venu

dblock commented 3 years ago

The .github repo seems like the right place; however, it's also a template repo, and I don't think we want every repo created in the org to inherit the check jobs. My preference would be to put this in https://github.com/opensearch-project/project-meta, which was created for exactly the purpose of manipulating all repos.

I would just kick off the workflow on a cron schedule from project-meta, enumerate the repos like here, and spawn one matrix job per repo to perform the checks.
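The enumeration script linked above isn't preserved in this thread, so the following is an independent minimal sketch. It pages through GitHub's REST endpoint `GET /orgs/{org}/repos` (at most 100 results per page) and prints a JSON object that a workflow could feed into `strategy.matrix` via `fromJSON`. Skipping archived repos and the `repo` matrix key are assumptions; the HTTP fetcher is injectable so the paging logic can be tested offline.

```python
import json
import urllib.request
from typing import Callable, List

API = "https://api.github.com"
PER_PAGE = 100  # the maximum page size the endpoint accepts


def fetch_json(url: str, token: str = "") -> list:
    """GET a GitHub API URL and decode the JSON response."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_repos(org: str, fetch: Callable[[str], list] = fetch_json) -> List[str]:
    """Return the sorted names of all non-archived repos in an org."""
    names: List[str] = []
    page = 1
    while True:
        batch = fetch(f"{API}/orgs/{org}/repos?per_page={PER_PAGE}&page={page}")
        names.extend(r["name"] for r in batch if not r.get("archived", False))
        if len(batch) < PER_PAGE:  # a short (or empty) page is the last one
            break
        page += 1
    return sorted(names)


def matrix_json(repos: List[str]) -> str:
    """Serialize the repo list as a matrix definition with one job per repo."""
    return json.dumps({"repo": repos})


if __name__ == "__main__":
    print(matrix_json(list_repos("opensearch-project")))
```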

I don't think we need to care about details such as assigning maintainers; that seems involved. As long as we open issues in the right repos with an untriaged label, the team(s) will assign them. Generally, when we open issues that reference a parent issue, the assignee of the parent can ping owners in individual repos as needed.
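Following that approach, each failed check would become an issue with no assignees, an untriaged label, and a link back to the parent issue. Because the workflow runs on a cron, it should also avoid reopening the same issue every run. This is a minimal sketch of that decision: the `[Drift]` title prefix and the idea of matching against the titles of already-open issues are hypothetical choices, and fetching those titles is left to the caller.

```python
from typing import Dict, Iterable, Optional


def plan_drift_issue(repo: str, check: str, parent_issue_url: str,
                     open_titles: Iterable[str]) -> Optional[Dict]:
    """Return an issue payload for a failed check, or None if already reported."""
    title = f"[Drift] {check}"
    if title in set(open_titles):
        return None  # an earlier cron run already opened this issue
    return {
        "title": title,
        "body": (f"The `{check}` check failed for {repo}.\n\n"
                 f"Parent issue: {parent_issue_url}"),
        # No assignees: the owning team triages anything labeled untriaged.
        "labels": ["untriaged"],
    }
```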

Interesting that you mention https://github.com/JasonEtco/create-an-issue; I just made two PRs to it to support https://github.com/opensearch-project/opensearch-build/pull/531 (https://github.com/JasonEtco/create-an-issue/pull/112 and https://github.com/JasonEtco/create-an-issue/pull/111). The code is pretty straightforward, so we can keep extending it. At the very least, open feature requests in that repo so we don't lose track.

I think it'd be great to start with something super simple like "repo has a README" and "repo has the correct SVG in the README". We could start authoring these checks in dumb bash or, probably better, Python. Are you planning to take a stab at this? I would be more than happy to review PRs and help out!
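The two starter checks can even run without cloning anything: GitHub's REST endpoint `GET /repos/{owner}/{repo}/readme` returns the repository's README as JSON with base64-encoded `content`, or a 404 when there is none. This is a minimal sketch; treating "has the correct SVG" as "the README text contains a given reference string" (such as `opensearch.svg`) is a simplifying assumption, and the function names are hypothetical.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Optional


def fetch_readme(owner: str, repo: str) -> Optional[Dict]:
    """Fetch README metadata from the GitHub API; None if the repo has no README."""
    url = f"https://api.github.com/repos/{owner}/{repo}/readme"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def readme_checks(owner: str, repo: str, svg_ref: str,
                  fetch: Callable[[str, str], Optional[Dict]] = fetch_readme) -> List[str]:
    """Check 'repo has a README' and 'README references the expected SVG'."""
    data = fetch(owner, repo)
    if data is None:
        return ["repo has no README"]
    # b64decode discards the newlines GitHub inserts into the content field.
    text = base64.b64decode(data["content"]).decode("utf-8")
    if svg_ref not in text:
        return [f"README does not reference {svg_ref}"]
    return []
```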

vchrombie commented 1 year ago

"Are you planning to take a stab at this? I would be more than happy to review PRs and help out!"

I didn't have enough time back then, but I would like to take a shot now!

I will work on this if it is still open.

@dblock @bbarani

gaiksaya commented 2 months ago

Moving this to the metrics repo and categorizing it under project health.

dblock commented 2 months ago

[Catch All Triage - 1, 2, 3, 4, 5]

prudhvigodithi commented 1 month ago

@dblock, is this issue still valid? Do we need to create a tool similar to audit?

peterzhuamazon commented 1 week ago

We can use the automation app to handle a lot of these comparisons over time.