open-source-ideas / ideas

💡 Looking for inspiration for your next open source project? Or perhaps you've got a brilliant idea you can't wait to share with others? Open Source Ideas is a community built specifically for this! 👋

Repo template/boilerplate updater #262

Open KOLANICH opened 3 years ago

KOLANICH commented 3 years ago

Project description

I have a lot of Python repos. They follow the same template/boilerplate.

  1. It'd be nice if, on a template update, PRs were automatically sent to all the repos using it.
  2. It'd be nice to generate certain parts automatically.

Solution

  1. Repo detection. GH stores some internal metadata about the template a repo was generated from, but mostly for advertising purposes:

    • there is no way to get all repos generated from a certain template
    • there is no way to mark a repo as generated from a certain template

    So, the solution is to put a file with some dedicated extension and a unique token into each repo and find it using advanced code search. The token may be:

    • the template repo name. Pro: human-readable, very meaningful. Con: IDK how well it interacts with GH search.
    • hex of a hash of the repo name. Pro: guaranteed to interact well with GH search, since it is a single token. Con: not human-readable.
  2. Change detection. Proper non-cooperative change detection feels impossible without general-purpose AI, so it has to be cooperative and restricted. The template should be properly annotated in a machine-readable way. The algo should treat the template as a ... template and match the content of the repo against it. This tells the tool the semantics of each block. Each block is then processed according to its semantics: it is generated from scratch every time, the resulting repo is composed from the blocks, and a PR is sent to the repo if there are any diffs.
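
A minimal sketch of the hashed-token option from step 1. SHA-256, the function names and the default marker file name used in the query are assumptions made for illustration; the proposal only says "hex of a hash of repo name". The `filename:` qualifier belongs to GitHub's legacy code search syntax, and whether it fits the search backend in use would need checking.

```python
import hashlib


def template_token(template_repo: str) -> str:
    """Return a single hex token derived from the template's full name, e.g. 'owner/name'."""
    # SHA-256 is an assumption; any stable hash with a hex digest would do.
    return hashlib.sha256(template_repo.encode("utf-8")).hexdigest()


def search_query(template_repo: str, marker_filename: str = ".templateMarker") -> str:
    """Build a code-search query that looks for the token inside marker files."""
    # The marker file name is a placeholder; adjust it to the real marker.
    return f"{template_token(template_repo)} filename:{marker_filename}"
```

The token is written into the marker file of every repo instantiated from the template, so a single code-search query for it should list the repos to update.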

There are 4 types of blocks in each readme:

  1. Header - the project name. It should be underlined with enough = characters.
  2. Badges. The tool should detect which services are available. This can be done in 2 ways: for some services the tool just checks whether the service is available for the repo; for others, the presence of a manually added badge enables the service.
  3. Text - kept intact.
  4. Dependencies. The tool scans the manifests, detects dependencies and adds them to the list. If the user has added any text to a dep, it is kept. The ordering is also preserved. For each dep its repo is scanned and badges are added.

and 3 universal ones for files like package manifests:

  1. package name
  2. repo name.
  3. package short description
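
A rough sketch of how two of the readme block types could be regenerated: the header with its = underline, and the dependency list with user text and ordering preserved. The function names and the (name, note) representation are hypothetical. Entries that no longer appear in the manifests are left alone here, since the description does not say what should happen to them.

```python
def render_header(project_name: str) -> str:
    """Render the project name underlined with one '=' per character."""
    return f"{project_name}\n{'=' * len(project_name)}"


def merge_dependencies(
    existing: list[tuple[str, str]], detected: list[str]
) -> list[tuple[str, str]]:
    """Merge dependencies detected in the manifests into the readme list.

    existing: (name, user note) pairs in their current readme order.
    detected: dependency names found by scanning the manifests.
    """
    # Existing entries keep their position and any user-written note.
    merged = list(existing)
    known = {name for name, _ in merged}
    # Newly detected dependencies are appended, in manifest order, with no note.
    for name in detected:
        if name not in known:
            merged.append((name, ""))
            known.add(name)
    return merged
```

For example, with `[("requests", "pinned for now"), ("click", "")]` already in the readme and `["click", "rich", "requests"]` detected, only `("rich", "")` is appended and the note on `requests` survives.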

On each template update, a push-triggered GitHub Action is run that discovers repos with the template enabled, regenerates some of their parts, checks whether anything has changed and, if so, sends a PR. Which parts should be regenerated is configured in a config file, .github/templater.toml.
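
The "checks whether anything has changed" step could look roughly like this. It is only a sketch: the function names are made up, and it assumes the regenerated file contents are already available as strings. It uses `difflib.unified_diff` from the standard library and reports only the files that differ, so an empty result means no PR is needed.

```python
import difflib


def proposed_patch(current: str, regenerated: str, path: str) -> str:
    """Return a unified diff for one file, or an empty string if nothing changed."""
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


def changed_files(files: dict[str, tuple[str, str]]) -> dict[str, str]:
    """Map each changed path to its diff.

    files maps a path to a (current content, regenerated content) pair.
    """
    patches = {}
    for path, (current, regenerated) in files.items():
        patch = proposed_patch(current, regenerated, path)
        if patch:  # unchanged files produce an empty diff and are skipped
            patches[path] = patch
    return patches
```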

The template usage assumes the following workflow:

  1. software author creates the software
  2. software author creates a GitHub Action installing and using the software
  3. template author creates a repo template.
    • they create a file .github/.templateMarker;
    • the template must have a structure correctly recognized by the software
    • the template must contain GitHub Actions workflows:
      • validating the template on PRs and pushes. On successful pushes it should detect whether the repo is a template using the GH API and, if it is, search for repos inherited from this template and trigger them.
      • creating a PR in an own repo applying the most recent version of the template as a response to an external trigger. SOME SECURITY MAY BE NEEDED HERE.
  4. template instantiator creates a repo from that template and modifies it, keeping the structure assumed by the software.
  5. Template author modifies the template
  6. The actions are triggered in all the repos, causing PRs to be sent that modify the repo contents according to the latest version of the template.
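
One possible way to implement the "trigger them" step is GitHub's repository_dispatch event, sent via POST /repos/{owner}/{repo}/dispatches. This is an assumption: the proposal does not name a mechanism. The event type, payload fields and function name below are hypothetical. The sketch only builds the request and does not send it. Sending it needs a token with write access to each target repo, which is one place where the security question from step 3 comes up.

```python
import json
import urllib.request


def build_dispatch_request(
    repo_full_name: str, token: str, template_repo: str, template_commit: str
) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch request for one repo."""
    url = f"https://api.github.com/repos/{repo_full_name}/dispatches"
    body = {
        # Hypothetical event name; the receiving workflow must listen for it.
        "event_type": "template-updated",
        # Hypothetical payload telling the repo which template version to apply.
        "client_payload": {"template": template_repo, "commit": template_commit},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

The template's workflow would build one such request per discovered repo and send it with `urllib.request.urlopen`. The workflow in the receiving repo would then regenerate its files and open the PR against itself.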

Relevant Technology

Complexity and required time

Complexity

Categories

TheOtterlord commented 3 years ago

I just have a feeling that most of the time there will be merge conflicts that people can't be bothered to deal with. This is especially true if you have massively edited the original code

KOLANICH commented 3 years ago

@TheOtterlord

most of the time there will be merge conflicts that people can't be bothered to deal with. This is especially true if you have massively edited the original code

The software is meant to assume that those who use the templates preserve the structure the templates define, and that the users' custom changes must fit into the template. There are no merge conflicts by definition - the patch is meant to be generated on top of the most recent commit of the repo's main branch, so all the merges of the PRs are fast-forwards.