zeitgeistpm / zeitgeist

An evolving blockchain for prediction markets and futarchy.
https://zeitgeist.pm
GNU General Public License v3.0

Process automation (overarching) #870

Open sea212 opened 1 year ago

sea212 commented 1 year ago

We have to automate some processes to save time:

maltekliemann commented 1 year ago

@sea212 Regarding license headers: suppose that I'm working on a PR in 2022 and 2023. In the end, the PR is going to be squashed into a single commit in 2023. But some of the files have only been edited in 2022. Should they get the 2022 copyright or the 2023 copyright?

If you count the commit to main as "publication", then it should be 2023. But that makes keeping track of things a little harder, because you can't go by when a file was last edited. I guess the best we can do when checking licenses is assume that the current feature branch will get squashed into main. In most cases, that should work.

One of the cases where it doesn't work is when the PR is finished in 2022 but merged into main in 2023. I don't know, but with the copyright pattern that we're using, I think maybe just making it part of the review process to check the licenses is better than relying on some cobbled-together heuristic that's bound to fail in some weird special case.
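To make the "assume the branch gets squashed into main" heuristic concrete, here is a minimal sketch. It assumes a header of the form `Copyright 2022 Holder` or `Copyright 2021-2023 Holder` (the exact pattern is an assumption), and the function names are hypothetical. The list of changed files would come from something like `git diff --name-only main...HEAD`; here it is passed in as a mapping so the logic stays independent of git.

```python
import re

# Assumed header shape: "Copyright 2022 Holder" or "Copyright 2021-2023 Holder".
COPYRIGHT_RE = re.compile(r"Copyright (\d{4})(?:-(\d{4}))?")


def last_copyright_year(text):
    """Return the most recent year in the first copyright notice, or None."""
    match = COPYRIGHT_RE.search(text)
    if match is None:
        return None
    return int(match.group(2) or match.group(1))


def files_needing_update(changed_files, merge_year):
    """List changed files whose notice does not end in the expected merge year.

    changed_files maps path -> file contents for every file the branch touches.
    Files without any notice are reported as well.
    """
    return sorted(
        path
        for path, text in changed_files.items()
        if last_copyright_year(text) != merge_year
    )
```

The case described above still slips through: if the check runs in 2022 but the merge happens in 2023, `merge_year` is simply wrong, which is why a review step remains useful.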

maltekliemann commented 1 year ago

Here's another fun one: You have files, one with copyright 2021-2022, and one with 2022-2023. You move a function from the former file to the latter file. How is the license checker supposed to know to extend the copyright of the latter file to 2021-2023?

Furthermore, it's not even clear if the copyright should be extended. If the function I'm moving from file to file was added in 2022, then the copyright of the latter file should not be extended to 2021-2023.
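For illustration, the mechanical half of this is easy to script; the hard half is knowing which year to pass in. The sketch below assumes the same `Copyright YYYY` / `Copyright YYYY-YYYY` header shape as in the examples above, and `extend_notice` is a hypothetical helper name. The year the moved code was written has to be supplied by a human, because a checker cannot infer it from the file alone.

```python
import re

# Assumed header shape: "Copyright 2022 Holder" or "Copyright 2021-2023 Holder".
NOTICE_RE = re.compile(r"(Copyright )(\d{4})(?:-(\d{4}))?")


def extend_notice(line, year):
    """Widen the year range in a copyright line so that it covers `year`."""

    def widen(match):
        first = int(match.group(2))
        last = int(match.group(3) or match.group(2))
        first, last = min(first, year), max(last, year)
        years = str(first) if first == last else f"{first}-{last}"
        return match.group(1) + years

    # Only the first notice on the line is rewritten.
    return NOTICE_RE.sub(widen, line, count=1)


# The moved function dates from 2021, so the target file's range grows backwards.
print(extend_notice("// Copyright 2022-2023 Example Holder.", 2021))
# The moved function dates from 2022, so nothing changes.
print(extend_notice("// Copyright 2022-2023 Example Holder.", 2022))
```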

So I guess my verdict is that this task cannot be automated in a satisfactory way. I suggest doing one of the following:

sea212 commented 1 year ago

I am not sure whether it makes more sense to bump the copyright year to when the file was edited or to when it was pushed to main. Usually the file changes on the main branch when the PR is merged, and the branch where it was modified is then deleted, so I tend toward keeping track of when it was changed on main. However, I think a combined approach makes sense: we enforce updating copyright notices in the pull request, and in addition we create a changelog of the whole repository when a new year starts and update any copyright notices that are outdated. I think this is a good middle ground. Other projects like Substrate (copyright PR) even skip enforcing the update in a PR and just do it once per year for every file that changed.

In the latter example, when moving a function from one file to another, the copyright year should not extend into the past. The target file possibly did not even exist at that time, so it cannot have been protected by copyright then. That seems plausible to me for code from the same organization; for external code, however, the rule changes. The copyright (from another organization) might extend back to a time when our code did not exist, but this is enforced by the copyright notice itself: most of the time it requires that a copy of the notice be provided in the code that uses parts of the copyrighted file.

maltekliemann commented 1 year ago

So, to summarize, we will have two mechanisms:

The primary purpose of the yearly check is to catch errors, I guess?

sea212 commented 1 year ago

Yes, I proposed both the convention to check in the PR and the yearly update to reduce human error in case of manual copyright adjustment. However, if we can use a script to automate that in every PR we don't need the yearly check imo.

Maybe that is useful: https://github.com/FantasticFiasco/action-update-license-year

maltekliemann commented 1 year ago

> Yes, I proposed both the convention to check in the PR and the yearly update to reduce human error in case of manual copyright adjustment. However, if we can use a script to automate that in every PR we don't need the yearly check imo.

Okay, last comment then, just so we're on the same page. I've been wondering what tech to use for this. I guess it's either Rust (with AST analysis) or Python (there's no good Rust AST analysis tool for Python, so we'd be relying more on heuristics to properly find the license comment). I'm going with Python because it seems like a massive hassle to do this in Rust.

We can definitely use that to automate a PR for the new year, but the PR should be reviewed before being merged into main.

sea212 commented 1 year ago

I'd say we are free to use whatever is best suited for the task. I am pretty confident that there are already scripts out there that do this, so a little search before starting the implementation might save some time here. Just wondering how much time it takes to write this and how many years it will take until it pays off :smile: . I'd say it's also fine if the script just scans through every file with specific endings (like .rs) and checks if the current year can be found in it. We can then check if it was changed in the previous year (using GitHub compare, like https://github.com/zeitgeistpm/zeitgeist/compare/b6d3a3342d1687ae825e78f68c31625ae2a15723...75bcae6e21c15b6fbf91e3f8aee76adfe7f6bf9d). In that case, we just need one Unix-like OS command: find. In other words, I am also fine with a semi-automatic approach, keeping effort and gain in balance.
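A rough Python equivalent of that `find`-style scan could look like the sketch below. It only covers the first step (listing files whose header does not mention the given year); whether a listed file actually changed still has to be checked separately, for example with the compare view. The function name, the `header_lines` cutoff, and the assumption that the notice contains the word "Copyright" near the top of the file are all illustrative choices, not project conventions.

```python
from pathlib import Path


def files_missing_year(root, year, suffixes=(".rs",), header_lines=5):
    """Return relative paths of matching files whose header lacks `year`.

    Only the first `header_lines` lines are inspected, and a line counts as a
    notice if it contains both "Copyright" and the year as plain text.
    """
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            # zip stops after header_lines lines without reading the rest.
            head = [line for _, line in zip(range(header_lines), handle)]
        if not any("Copyright" in line and str(year) in line for line in head):
            missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # Example: list Rust files under the current directory without a 2023 notice.
    for name in files_missing_year(".", 2023):
        print(name)
```

This is a plain text match, so it stays a heuristic: it flags every file not touched this year as well, which is why the result is a list for a human to go through, not an automatic fix.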

Chralt98 commented 7 months ago

In terms of integration tests:

Useful integration tests: