orhun / git-cliff

A highly customizable Changelog Generator that follows Conventional Commit specifications ⛰️
https://git-cliff.org
Apache License 2.0

Multiline matches with CR (^M) characters #423

Open wookayin opened 6 months ago

wookayin commented 6 months ago

Describe the bug

I think only the title of a git commit message should be considered, but when a commit contains mixed CR/LF line endings for some reason, extraction of the commit title breaks, resulting in multi-line messages (with CR characters) being printed in the release notes.

To reproduce

git clone https://github.com/neovim/neovim
git cliff -c c.toml 72a6643b1~1..ca5de93

c.toml:

[git]
conventional_commits = true
filter_unconventional = false
filter_commits = false
commit_parsers = [
    { message = "^.*", group = "Others" },
]

[changelog]
body = """
{% for group, commits in commits | group_by(attribute="group") %}
    ### {{ group | upper_first }}
    {% for commit in commits%}\
       - <<< {{ commit.id }} >>> {{ commit.message | upper_first }}
    {% endfor %}\
{% endfor %}\n
"""

Output:


### Others
- <<< 72a6643b1380cdf6f1153d70eeaffb90bdca30d6 >>> Docs #24061

- nvim requires rpc responses in reverse order. https://github.com/neovim/neovim/issues/19932
- NVIM_APPNAME: UIs normally should NOT set this.^M
^M
ref #23520^M
fix #24050^M
fix #23660^M
fix #23353^M
fix #23337^M
fix #22213^M
fix #19161^M
fix #18088^M
fix #20693
- <<< ca5de9306c00d07cce1daef1f0038c937098bc66 >>> Inlay hints #23984

A strange commit, 72a6643b1380cdf6f1153d70eeaffb90bdca30d6, has a commit message in which CR and LF are mixed.

Expected behavior

Only the first line is considered. Maybe we should normalize CR, LF into LF.
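
To make the proposed normalization concrete, here is a minimal sketch in plain Rust (standard library only). It is not git-cliff's code: the function names `normalize_newlines` and `commit_title` and the sample message are hypothetical, chosen only for illustration.

```rust
/// Convert CRLF and lone CR line endings to LF.
fn normalize_newlines(message: &str) -> String {
    // Replace CRLF first so that it does not turn into two LFs.
    message.replace("\r\n", "\n").replace('\r', "\n")
}

/// Return the first line (the commit title) of the normalized message.
fn commit_title(message: &str) -> String {
    normalize_newlines(message)
        .lines()
        .next()
        .unwrap_or("")
        .trim()
        .to_string()
}

fn main() {
    // Hypothetical message mixing LF and lone CR separators.
    let message = "docs #24061\n\n- some note.\r\rref #23520\rfix #24050";
    println!("{:?}", normalize_newlines(message));
    println!("{}", commit_title(message));
}
```

Replacing CRLF before lone CR matters: doing it the other way around would turn every CRLF into two line breaks.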

Screenshots / Logs

N/A

Software information

Additional context

This repro is a simplification of https://github.com/neovim/neovim/pull/26818 where git cliff --config scripts/cliff.toml v0.9.0..HEAD produces some strange multi-line release note items.

orhun commented 6 months ago

Hello, thanks for reporting this!

This is because the commit in question is not conventional and it is not filtered out. That is why git-cliff uses that commit in the changelog as-is.

To skip those commits:

filter_unconventional = true

Or if you only want the first line to appear in the changelog:

- <<< {{ commit.id }} >>> {{ commit.message | split(pat="\n") | first | upper_first | trim }}

You can also use commit preprocessors/postprocessors to process the commit/changelog.
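
For readers unfamiliar with Tera filters, the `split(pat="\n") | first | upper_first | trim` chain above can be approximated in plain Rust as follows. This is a sketch of the filters' apparent behavior, not Tera's implementation; `upper_first` and `title_via_split` are hypothetical helper names.

```rust
/// Uppercase the first character, similar to Tera's `upper_first` filter.
fn upper_first(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) => c.to_uppercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    }
}

/// Approximate `split(pat="\n") | first | upper_first | trim`.
fn title_via_split(message: &str) -> String {
    // Keep only the text before the first LF.
    let first = message.split('\n').next().unwrap_or("");
    upper_first(first).trim().to_string()
}

fn main() {
    // LF-separated body: only the title survives.
    println!("{:?}", title_via_split("docs #24061\n\nbody text"));
    // Lone-CR separators are not split, so the body stays attached.
    println!("{:?}", title_via_split("docs #24061\r\rbody text"));
}
```

One caveat this sketch makes visible: splitting on `"\n"` alone does nothing for a message whose lines are separated only by CR, so in that case the line endings would have to be normalized first.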

When it comes to the actual question, I agree that we should normalize CR, LF into LF. That can also be done with preprocessors, though. I'm not sure if git-cliff should manipulate the commit message internally in this case.

wookayin commented 6 months ago

Q: Is the regex ^foo matched against each of the lines or against the very first few characters only? How is a multi-line string (or '\n') handled in regex matching? I thought it should be the latter, but I still don't understand why this multi-line string appears. Other commits also have a body that follows the title, but their bodies don't appear. What makes the difference?

orhun commented 6 months ago

Q: Is the regex ^foo matched against each of the lines or against the very first few characters only?

It is matched only against the first line, since the regex is not configured as multi-line. I also think that it should support multi-line matching, but I'm not sure how to achieve that with serde_regex:

    /// Regex for matching the commit message.
    #[serde(with = "serde_regex", default)]
    pub message:       Option<Regex>,

Feel free to open a tracking issue about this!
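
The difference between the two anchoring modes discussed here can be illustrated without the regex crate. The sketch below emulates a pattern like `^foo` with `starts_with`; the function names are hypothetical. In the regex crate, multi-line mode is enabled with the inline `(?m)` flag, but whether embedding that flag in the `message` pattern works in git-cliff is an assumption the thread does not confirm.

```rust
/// `^prefix` without the multi-line flag: anchored at the start of the whole text.
fn matches_at_text_start(text: &str, prefix: &str) -> bool {
    text.starts_with(prefix)
}

/// `^prefix` in multi-line mode: anchored at the start of any LF-separated line.
fn matches_at_any_line_start(text: &str, prefix: &str) -> bool {
    text.split('\n').any(|line| line.starts_with(prefix))
}

fn main() {
    let message = "fix: a\nfeat: b";
    println!("{}", matches_at_text_start(message, "feat"));
    println!("{}", matches_at_any_line_start(message, "feat"));
    // A lone CR is not treated as a line break in this sketch.
    println!("{}", matches_at_any_line_start("fix: a\rfeat: b", "feat"));
}
```

The sketch treats only LF as a line separator, so a prefix that follows a lone CR is not seen as starting a line.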

Other commits also have a body that follows the title, but their bodies don't appear. What makes the difference?

I hope this answers your question 🐻