opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.88k stars 1.84k forks source link

[PROPOSAL] Author CHANGELOG in each PR instead of collecting them in the last days before a release #1868

Closed dblock closed 1 year ago

dblock commented 2 years ago

What kind of business use case are you trying to solve? What are your requirements?

Release notes are of varying quality.

For example, https://github.com/opensearch-project/OpenSearch/pull/2489 is completely unreadable.

What is the problem? What is preventing you from meeting the requirements?

What are you proposing? What do you suggest we do to solve the problem or improve the existing situation?

  1. Replace authoring release notes at the end before a release with authoring release notes in each PR.
  2. Require that release notes be updated via something like danger-changelog with every change.
  3. Automate collection of release notes across all projects. https://github.com/opensearch-project/opensearch-build/issues/438 https://github.com/opensearch-project/opensearch-build/issues/698
dblock commented 2 years ago

cc: @elfisher

stockholmux commented 2 years ago

I see how this addresses Release notes are only available at release time, which I like, but I'm failing to understand how it address the other two points.

It seems like the output of danger-changelog is not too dissimilar to the existing release notes (last good example - subsequent 1.2.x are all log4j fixes), except without the context provided by the categories.

I will add that the release notes for a release do go through a human spot check. Often times that spot check will catch things like incorrect terminology, typos, or inconsistencies. Does the script driven approach allow for this?

dblock commented 2 years ago

It seems like the output of danger-changelog is not too dissimilar to the existing release notes (last good example - subsequent 1.2.x are all log4j fixes), except without the context provided by the categories.

danger-changelog just checks that the PR includes a well-formatted entry for the changelog, it doesn't actually write it. The change I am proposing is that humans are required to author the changelog line and it gets reviewed as part of the PR

I will add that the release notes for a release do go through a human spot check. Often times that spot check will catch things like incorrect terminology, typos, or inconsistencies. Does the script driven approach allow for this?

It doens't. I am proposing to remove any script-driven approach completely and move the authoring of release notes/changelog to the PRs, sorry if this wasn't clear. I've edited the issue and removed the CHANGELOG part, it just muddies the waters

elfisher commented 2 years ago

Would there be an expectation as part of the code review to review the release notes in the PR as well? That might help with quality.

dblock commented 2 years ago

Would there be an expectation as part of the code review to review the release notes in the PR as well? That might help with quality.

Yes.

peternied commented 2 years ago

How about a format of the CHANGELOG that allows for cross checked against commits since last release? This allows maintainers or contributors to add entries as needed or independently clean up the notes. This avoids adding a barrier to contributions by adding another criteria on the PR to merge.

As part of health of the repo a tool can verify how many 'undocumented' items are not in the CHANGELOG. This can be integrated into the release process during the post-code-freeze - prerelease time it can be required that maintainers drive that number to zero. This allows for time to group similar changes and have dialogs about terminology/presentation.

dblock commented 2 years ago

@peternied Why would we need that if every PR must have a CHANGELOG entry?

peternied commented 2 years ago

@dblock You are right, I am not thinking of chronological account of the changes in a codebase.

I want a reference of Most important -> Least important changes since last release. Here is a contrived example of what I'd prefer vs changelog.


Release Clatu Verata Nictu

Breaking

Release Clatu Verata Nictu

dblock commented 2 years ago

@peternied We would adopt a standard, such as https://keepachangelog.com/en/1.0.0/

peternied commented 2 years ago

@peternied We would adopt a standard, such as https://keepachangelog.com/en/1.0.0/

That standard format looks great, I am sold.

Types of changes

  • Added for new features.
  • Changed for changes in existing functionality.
  • Deprecated for soon-to-be removed features.
  • Removed for now removed features.
  • Fixed for any bug fixes.
  • Security in case of vulnerabilities.
getsaurabh02 commented 2 years ago

Thanks @dblock for the proposal. Another key area would be to standardize the various buckets or groups created for the release notes entries such as Enhancements, Bug fixes, Maintenance, Refactoring, Infrastructure etc. These are pretty insightful for readers to extract focussed information. Some of the recent release notes have these :

Do we expect the CHANGELOG to have these groups pre-defined, that will allow authors/reviewers to add/review the CHANGELOG entry, added under the right group, along with every PR?

dblock commented 2 years ago

Do we expect the CHANGELOG to have these groups pre-defined, that will allow authors/reviewers to add/review the CHANGELOG entry, added under the right group, along with every PR?

Yes, I expect those to be exactly from the list in https://keepachangelog.com/en/1.0.0/, at least to start.

kartg commented 2 years ago

Require that release notes be updated via something like danger-changelog with every change.

I'm guessing a "change" here could consist of multiple commits, as described by keepachangelog.org:

The purpose of a commit is to document a step in the evolution of the source code. Some projects clean up commits, some don't. The purpose of a changelog entry is to document the noteworthy difference, often across multiple commits, to communicate them clearly to end users.

In the case of multiple commits that gradually build up to a full feature, @dblock is there a good mechanism we can put in place to make sure the changelog is reliably updated?

Alternatively, we could implements Conventional Commits as a precommit hook. That would give us a mechanism, but it still requires a changelog to be built and curated at release time. Thoughts?

dblock commented 2 years ago

@kartg I think we do simple: every PR requires a CHANGELOG update. Humans ensure quality in code reviews. Multiple commits can be merged in CHANGELOG as we make progress.

kartg commented 2 years ago

Works for me. In that case, we should define what makes a good Changelog entry - it shouldn't end up being just a repeat of the commit message.

peternied commented 2 years ago

@kartg @Poojita-Raj @dblock @getsaurabh02 @elfisher @stockholmux I've opened https://github.com/opensearch-project/security/pull/1821 - would love your thoughts on how this looks

rursprung commented 2 years ago

@kartg I think we do simple: every PR requires a CHANGELOG update. Humans ensure quality in code reviews. Multiple commits can be merged in CHANGELOG as we make progress.

from personal experience (we use this format in-house) i can tell you that, while this is a nice idea, it does result in a lot of merge conflicts (as they're all congesting on the same lines in CHANGELOG.md). i don't have a good solution for this though.

i do however 100% vote to follow Keep a Changelog for the actual changelog! i've added some comments to the draft

cliu123 commented 2 years ago

There've been the release drafter to generate draft release notes capturing all merged PRs. The release notes are generated by CI pipeline on each PR merge and avaialble for publishing releases. Do we still need this manually documented changelog?

This is a good idea that makes changelog available before release! It'd be perfect if it's done automatially instead of manually. It'd be very easy to forget to update CHANGELOG.md on each PR especially for new contributors who are not familiar with our develpment process. In comparison, the release drafter run in CI automatically documents the changes, which makes more sense.

dblock commented 2 years ago

@cliu123 I think the automation you propose would produce complete unreadable garbage right now between backport PRs, DCO messages, etc, at least on OpenSearch. So it would solve having release notes continuously available, but it wouldn't help making these notes useful. I could be wrong though, seems pretty cheap to try and enable it on this repo, want to try it? also how does it work across 3 ongoing releases in various branches?

kotwanikunal commented 2 years ago

After a dive into the discussion and release-drafter, it looks like release-drafter does not directly support multiple drafts on multiple branches. There is a workaround available which can possibly support multiple draft notes. The caveat here is that we need a config for every version/branch in our case which is not scalable. We will have to either put in efforts to automate this using the script which could run with the version bump script but can still cause problems due to misconfigured target_commitish and the usage of GitHub UI for release publish.

In the case we decide to fix all the issues with release-drafter or if we decide to move forward with a changelog based approach, it would be a great addition to start labelling PRs to classify them into appropriate buckets. I have opened a PR for adding a label checker to PRs - #3618

I am happy to chat and would love to move this discussion forward.

dblock commented 2 years ago

@kotwanikunal This proposal just says that the CHANGELOG would be edited by humans with every PR, thus you don't have to build any automation at all on branches to collect release notes, solves the problem you describe above

kotwanikunal commented 2 years ago

@kotwanikunal This proposal just says that the CHANGELOG would be edited by humans with every PR, thus you don't have to build any automation at all on branches to collect release notes, solves the problem you describe above

@cliu123 I think the automation you propose would produce complete unreadable garbage right now between backport PRs, DCO messages, etc, at least on OpenSearch. So it would solve having release notes continuously available, but it wouldn't help making these notes useful. I could be wrong though, seems pretty cheap to try and enable it on this repo, want to try it? also how does it work across 3 ongoing releases in various branches?

@dblock I was referring to the mechanism used by plugins mentioned in the previous comment. The release-drafter approach does not make it any easier for us in terms of maintenance. Currently its difficult to compile these notes as they are and as a path forward, I was proposing labelling PRs to make it easier to classify even with the CHANGELOG. What do you think about that?

dblock commented 2 years ago

I don't see the point in labelling PRs if the said PR must also include a 1-line change in the appropriate section of the CHANGELOG in progress. Am I missing a scenario where the label would be useful?

kotwanikunal commented 2 years ago

The proposal I had was something on the lines of -

I know the proposal originally required the requester to add the details to the CHANGELOG manually but I was trying to ease out that experience by going a bit further. With the original proposal, it will work more on the lines of an enforcer action than a generator. That should be an easy thing to add in if we are planning to go forward that

dblock commented 2 years ago

I know the proposal originally required the requester to add the details to the CHANGELOG manually but I was trying to ease out that experience by going a bit further. With the original proposal, it will work more on the lines of an enforcer action than a generator. That should be an easy thing to add in if we are planning to go forward that

I think these two things are fairly separate. In my experience requiring humans to author a 1-liner in CHANGELOG is not a big deal, but yields the best outcome: readable changes that are attached to every PR and that exist at all times during development. Everything else I think is optional/nice to have/unnecessary.

Rishikesh1159 commented 2 years ago

Just wanted to document how I generated 2.1.0 release notes, so that this process might help in automation or for next release. Steps:

* `git checkout origin/2.1`
* `git log --pretty=format:"* %s" --since=3-23-2022 > release-notes/opensearch.release-notes-2.1.0.md`
* Manually organize file sections
  * Add Date / Version section (copy + modify from previous release)
  * Enhancements/Bug fixes/Maintenance
* Replace pull request ids with full urls via regex
  * Find: `\(#(\d+)\)$`
  * Replace: `([#$1]([https://github.com/opensearch-project/opensearch/pull/$1))`](https://github.com/opensearch-project/opensearch/pull/$1))%60)
* Commit with these steps / modified with additional details

-> In second line of above command block, when using --since the date should be selected carefully. You should select the date when opensearch version bumped from MajorVersion.x to new version, example (2.x version bump to 2.1)

-> I had to manually go through each commit and sort it out to one of the sections mentioned here.

-> Next I had to check which commits are already present in previous release version and remove them from current release notes manually. This step can be done before sorting out each commit to appropriate section.

->Make sure no breaking changes are included if it is a minor release versions.

kotwanikunal commented 2 years ago

@dblock I have worked on adding the workflow as a part of this PR - #4085 The solution as it is currently implemented in the PR will not solve for backported PRs which will need additional work. (Maybe something on the lines of https://github.com/dangoslen/dependabot-changelog-helper) This however does solve for dependabot changes automatically and enforcing PRs for a manual changelog. Before we move ahead with the additional effort for backporting, I wanted to open this up for further discussion.

kotwanikunal commented 2 years ago

4085 - change has been merged in and backported to 2.x as well.

Keeping the issue open to address any gaps that come up during the transition.

kotwanikunal commented 2 years ago

The changes have started showing results for the major unreleased branch and dependabot PRs. There are still some gaps around backporting that I have noticed and also called out as a part of #4085 -

A possible solution for the above problem is -

Also, there is another discussion around missing process for feature branches here - #4477

@dblock

dblock commented 2 years ago

Thanks, I think fixing backport to properly merge changelog is the way to go.

kotwanikunal commented 2 years ago

Raised a PR to add support for skipping possible conflicting files while backporting changes: https://github.com/VachaShah/backport/pull/4

kotwanikunal commented 2 years ago

Another update: I was able to fork https://github.com/dangoslen/dependabot-changelog-helper and add in additional features to backport the actual changelog line.

It works exactly as I had described above -

Git: https://github.com/kotwanikunal/changelog-helper

I am waiting for the merge and release on https://github.com/VachaShah/backport/pull/4. We can then proceed with the calls for changelog helper.

dblock commented 2 years ago

@kotwanikunal is there a feature here you can try upstreaming into https://github.com/dangoslen/dependabot-changelog-helper? certainly worth your time I think in order to avoid maintaining a fork

kotwanikunal commented 2 years ago

@kotwanikunal is there a feature here you can try upstreaming into https://github.com/dangoslen/dependabot-changelog-helper? certainly worth your time I think in order to avoid maintaining a fork

The current solution on my fork might be too specific to upstream. I'll try generalizing the approach and upstreaming it.

peternied commented 2 years ago

IMO files_to_skip is a great feature for the upstream version. At the very least by creating a PR it create the opportunity for the maintainer to make a decision.

kotwanikunal commented 2 years ago

IMO files_to_skip is a great feature for the upstream version. At the very least by creating a PR it create the opportunity for the maintainer to make a decision.

Thanks. That's the change that I have created for the backport GHA and I agree, that's upstreamable as is. I am just waiting to test it out.

I was discussing about an additional one that I have forked (linked here) to enable line based cherry-picking of changelog using diffs from the original PR.

rursprung commented 2 years ago

i just noticed that the CHANGELOG.md of OpenSearch on main contains 3x the Dependencies heading for the Unreleased block. has this something to do with the automated dependabot functionality (i.e. a bug there) or has this just been missed by chance?

see here: https://github.com/opensearch-project/OpenSearch/blob/77cff55bd431c0e2f8fdea5a40c83f8e8bddeca1/CHANGELOG.md?plain=1#L6 https://github.com/opensearch-project/OpenSearch/blob/77cff55bd431c0e2f8fdea5a40c83f8e8bddeca1/CHANGELOG.md?plain=1#L36 https://github.com/opensearch-project/OpenSearch/blob/77cff55bd431c0e2f8fdea5a40c83f8e8bddeca1/CHANGELOG.md?plain=1#L44

rursprung commented 2 years ago

something else i just noticed after contributing the first time since this rule has been established: it's rather impractical to add a reference to the PR in the changelog as that means that you first have to push your commit(s), create a (draft) PR, then update the changelog, commit & push that again (if you do it in the unclean way you just have an "update changelog" commit added, otherwise you need to briefly do some git-rebase voodoo to have a nice history). this adds an IMHO unnecessary step.

i think this could also be left away as the CHANGELOG.md is usually viewed somewhere where you can also run git-blame on it (GitHub's web-UI shows this; so does any modern IDE and on the console that of course works as well) and through that you can find the rest of the commit and from there the PR (when on GitHub).

WDYT?

rursprung commented 2 years ago

and one more thing: i just created https://github.com/opensearch-project/OpenSearch/pull/4723 which has no user-facing impact (purely internal to the build system, even w/o any impact there). IMHO this shouldn't be listed in the CHANGELOG at all (users don't care about this and reading about it just costs them time and confuses them until they find out that it has no impact for them). however, the current verification mechanism forces me to nevertheless list it there.

i'd suggest to relax this a bit (e.g. the ruling could be "all things relevant to users are documented in the CHANGELOG.md" and a reviewer can then always overrule the automated check?)

note: i write "user" but don't mean an end-user (who's calling the REST API or using OpenSearch Dashboards) but rather an operator of the software. maybe "operator" would be the better word? for commercial software i'd just write "customer", but obviously that wouldn't work here :)

dblock commented 2 years ago

something else i just noticed after contributing the first time since this rule has been established: it's rather impractical to add a reference to the PR in the changelog as that means that you first have to push your commit(s), create a (draft) PR, then update the changelog, commit & push that again (if you do it in the unclean way you just have an "update changelog" commit added, otherwise you need to briefly do some git-rebase voodoo to have a nice history). this adds an IMHO unnecessary step.

I generally amend my commits and force push. It's a small price to pay to help users locate the change by clicking on it IMO.

i think this could also be left away as the CHANGELOG.md is usually viewed somewhere where you can also run git-blame on it (GitHub's web-UI shows this; so does any modern IDE and on the console that of course works as well) and through that you can find the rest of the commit and from there the PR (when on GitHub).

Git blame is very technical and lacks a humanly readable explanation that is always in the body of the PR. So personally I am not for any changes from what we're trying to do now, even if it adds a tiny bit of work for every developer.

kotwanikunal commented 2 years ago

i just noticed that the CHANGELOG.md of OpenSearch on main contains 3x the Dependencies heading for the Unreleased block. has this something to do with the automated dependabot functionality (i.e. a bug there) or has this just been missed by chance?

see here:

https://github.com/opensearch-project/OpenSearch/blob/77cff55bd431c0e2f8fdea5a40c83f8e8bddeca1/CHANGELOG.md?plain=1#L6

https://github.com/opensearch-project/OpenSearch/blob/77cff55bd431c0e2f8fdea5a40c83f8e8bddeca1/CHANGELOG.md?plain=1#L36

https://github.com/opensearch-project/OpenSearch/blob/77cff55bd431c0e2f8fdea5a40c83f8e8bddeca1/CHANGELOG.md?plain=1#L44

This seemed to be an issue because we were using spaces along with heading within CHANGELOG.md which was creating new entries for the unreleased version. I have fixed that with #4740

dblock commented 2 years ago

FYI, see https://github.com/opensearch-project/OpenSearch/issues/4769 for an ask to revert all this. Please add your comments. My summary is:

The release notes had several fatal flaws: 1) users/contributors didn't know what changed half way during development, 2) the data was wrong too often, 3) having to gather release notes in the end delayed the release by at least a day. Automation of release notes on every commit (asked in #4769) can solve some of (3) and possibly (1) if you do it on all branches. But I think it will be garbage data because our PRs/commits are filled with backports and other information that's very hard to read for users.

The CHANGELOG approach has flaws too: 1) developers have to make changes in every PR and push twice/amend 2) there are merge conflicts on backports.

Having proposed CHANGELOG in this issue, I think 1) is a small amount of work and a developer can be expected to have to describe their change in a reasonable 1-liner. And 2) is being worked on in backport automation.

seraphjiang commented 2 years ago

1) users/contributors didn't know what changed half way during development

could we use release issue to track to avoid merge issue?

2) the data was wrong too often

we don't have to collect all commit, we could define the rule for commit message/label

3) other information that's very hard to read for users.

again, we should improve our commit history, insist highest bar on it. not ignore it

4) 1) is a small amount of work and a developer can be expected to have to describe their change in a reasonable 1-liner.

I disagree

when many develop work on same project, they have to rebase multiple times if they didn't get their PR merged after they got Approval. It will cause developer to redo 1) rebase fix change log 2) run tests, 3) collect approval again.

2) is being worked on in backport automation.

I disagree what I heard and observe from OSD cases, 90% backport case has conflict. backport automation will fail, need manual merge

my asking is could we invest on automation release note other than introduce this manual process? I'm also not asking for investing on automation changelog

could we use meta release issue to track change, that way we don't introduce a global lock file changelog before automation is ready

kotwanikunal commented 2 years ago

Reading through #4769 and this, here are my thoughts and what is currently baking -

  1. On PR conflicts:

    • Agreeably, there have been conflicts for CHANGELOG.md for PRs since we started with this effort
    • Within the OpenSearch repository, do not need re-approvals in case of new commits on a PR, hence the effort is not as extensive with fixing the changelog
  2. On backporting:

could we invest on automation release note other than introduce this manual process?

I would love to have some alternative to achieve this and can put in the effort if you have suggestions around such a framework. I agree with @dblock that, as a release manager, collecting notes is painful and developer/contributor notes are the best approach to a human-readable changelog. I am also all ears on alternatives that folks have to suggest.

seraphjiang commented 2 years ago

I propose using change log issue to replace existing change log file approach.

Keep the same format as changelog file.

When PR got merged, maintainer and author will decide if it is critical change need to add to change log.

When backport to release branch, it should update item in changelog issue to right version section

With this approach we still has place to take release note but without worry merge conflict hell anymore.

Any concern on this approach

anasalkouz commented 2 years ago

Here are my thoughts regarding the changelog file: I do agree to keep the changelog file approach and try to avoid any human interaction during the process and I don't expect merging a conflict in the changelog file should take more than few mins to resolve. That being said, I think we need to invest more time to make it more easier for developers and remove an churn by 1) doing some automation improvements like what @kotwanikunal suggested to reduce number of conflicts. and 2) Make merging changes to the changelog optional and it's up to the author and reviewers to decide, this will help incase we have PRs with feature to announce like code refactor adding unit test ...etc.

andrross commented 2 years ago

I think the current process of requiring every commit to produce a line in the changelog is a big part of the problem here. We are essentially just duplicating the one-liner header from the commit message into the changelog, filed under the appropriate category. If this is the desired goal, then I think it can be achieved by requiring discipline around the git commit message along with git tooling (this is how the linux kernel does it, for example).

@seraphjiang proposed solution suggests this is not the goal, as he states "When PR got merged, maintainer and author will decide if it is critical change need to add to change log." Getting clarity on this should help us find a solution. @dblock What is the ideal solution here? Should there be a 1-to-1 relationship between git commits and changelog entries, or should this be a manually curated list of changes?

Bukhtawar commented 2 years ago

I think its also something that we can templatise or follow best practises https://github.com/torvalds/subsurface-for-dirk/blob/a48494d2fbed58c751e9b7e8fbff88582f9b2d02/README#L88 while writing a commit message. The maintainer can then use his/her discretion to choose what qualifies for a change log and appropriately ensure it gets published

seraphjiang commented 2 years ago

Thanks @anasalkouz @andrross and @Bukhtawar @dblock for discussion

It is true, change log conflict take mins to resolve, however it then it introduce new commit, and need to re-run CI which takes hours to finish. If this happen again and again when multiple people is committing, developer will be very frustrated.

Think about long term productivity when more and more contribute join the effort (they don't even has permission to re-run failed flaky test). The changelog file process increases the their time and cost to contribute which leads to frustration.

Because majority of contributors don't has permission to backlog, maintainer will have to resolve merge conflict for the backport change(which is not related to actual code itself)

Thinking out of box, the goal is to address release note issue, changelog if just one option. We knew we need 1) automation and 2) best practices and discipline, template to enforce good commit message, no matter we has changelog or not.

I'll try an experiment for release note automation and share what I learn from it.

seraphjiang commented 2 years ago

I have tested release note automation

it is pretty neat, see my experiment below

https://github.com/seraphjiang/dashboards-anywhere/releases/tag/v0.9.1

image