jorisroovers / gitlint

Linting for your git commit messages
http://jorisroovers.github.io/gitlint
MIT License
826 stars 100 forks source link

Insufficient functionality for ignoring length of specific body lines #255

Open scop opened 2 years ago

scop commented 2 years ago

In a nutshell, my use case is avoiding triggering line too long errors for certain lines in the commit body. I can accomplish that, but not in a way that it would not cause other problems. Some simplified examples follow, using lines starting with https:// as the example one I'd like to exempt from line length checks.

Try 1: ignore-body-lines

$ cat test-msg 
Title goes here

More info:
https://github.com/jorisroovers/gitlint/commit/3c017995633f602125f7caaa91e96bb767ca5953
$ cat .gitlint 
[ignore-body-lines]
regex = ^https://
$ cat test-msg | gitlint
3: B5 Body message is too short (10<20): "More info:"

This is an instance of the problem documented with ignore-body-lines (it mentions line number confusion only though which is much less problematic than resulting in the B5 false positive IMHO). Anyway, for this reason, ignore-body-lines does not fit the bill.

Try 2: ignore-body

$ cat test-msg 
Title goes here

More info:
https://github.com/jorisroovers/gitlint/commit/3c017995633f602125f7caaa91e96bb767ca5953

This is a very very very very very very very very very very very very very very very very long line whose line I do _not_ want to ignore.
$ cat .gitlint
[ignore-by-body]
regex = ^https://
ignore = body-max-line-length,body-min-length
$ cat test-msg | gitlint
# (no output)

To overcome the problem with ignore-body-lines line matches being entirely ignored and causing at least the B5 false positive, here's an approach using ignore-by-body with body-min-length in ignore. This sidesteps that problem, but causes another one: body-max-line-length must be in ignore in order for the line I want to be exempt for that check, but it causes line length checks to be skipped for all lines in the body. I'd like the "This is a very very ..." line to be flagged as too long (but I can see why it is currently not).

Did I miss something?

I'm wondering if I missed something that could be used to accomplish what I'm looking for without side effects.

Appendix

Aside, perhaps body lines without whitespace should be exempt for the line length checks by default. This would not help with my problem completely as it has a few more variables, but would make sense I think.

scop commented 2 years ago

(The commit URL used as an example above does not have anything to do with the actual problem, it's just a random one I grabbed from this project for illustration purposes.)

sigmavirus24 commented 2 years ago

I think a better example of what you're trying to show is:

> cat test-msg 
Title goes here

More info:
https://github.com/jorisroovers/gitlint/commit/3c017995633f602125f7caaa91e96bb767ca5953

This is a very very very very very very very very very very very very very very very very long line whose line I do _not_ want to ignore.

> cat .gitlint
[ignore-by-body]
regex = ^https://
ignore = body-min-length

> cat test-msg | gitlint  
4: B1 Line exceeds max length (87>80): "https://github.com/jorisroovers/gitlint/commit/3c017995633f602125f7caaa91e96bb767ca5953"
6: B1 Line exceeds max length (137>80): "This is a very very very very very very very very very very very very very very very very long line whose line I do _not_ want to ignore."

Because you're not explicitly ignoring that last line with ignore=body-max-length.

I suspect the idealized config that would work here would be:

> cat .gitlint
[body-max-line-length]
line-length=120
ignore-regex=^https?://

That said, I did find that this will give you what you want:

[ignore-body-lines]
regex = ^https?://

[ignore-by-body]
regex = ^More info:$
ignore = body-min-length
scop commented 2 years ago

I think a better example of what you're trying to show is: [...]

Maybe, maybe not :) My example illustrates the problem of all long lines being ignored, yours illustrates none of them being ignored, with slightly different configs. But we are talking about the same thing, that's what's important here.

That said, I did find that this will give you what you want:

Sure, for that specific, simplified example. (It would fail to flag a commit whose entire body is More info: (without an URL etc) as invalid though.) But perhaps I should just give out the complete case, through the gitlint config in place, it's not that much more complex:

[general]
ignore = body-is-missing
ignore-fixup-commits = false
ignore-revert-commits = false
ignore-squash-commits = false
ignore-stdin = true
contrib = contrib-title-conventional-commits

In addition to the above base config, I'd like to ignore line lengths for lines starting with:

...without causing side effects hitting or missing other rules for other lines (I'm fine with ignoring everything on those long lines, if ignoring only the length isn't available). So in regex terms:

^(Co-authored-by:|((Refs|See) )?https?://\S+$)

Enumerating what are the accepted too short bodies to be ignored (like "More info:") as those special cases is not possible/feasible.

jorisroovers commented 2 years ago

Ok, let me see if I understand this correctly.

You’d like to have something like this:

# For matching lines, ignore ONLY the specified rules, but apply everything else
[ignore-body-lines]
regex=^Co-Authored-By
ignore=body-max-line-length

# When not specifiying the 'ignore' option, we're really just saying ignore all rules
[ignore-body-lines]
regex=^Co-Authored-By
ignore=all

# This has the same behavior as setting ignore=all
[ignore-body-lines]
regex=^Co-Authored-By

I can see this would be useful - it would require code changes to change how gitlint applies and implements the ignore rules though. @scop Please confirm or clarify :-)


Proposed implementation (gitlint internals)

Under-the-hood, the ignore-body-lines rule is really a ConfigurationRule. https://github.com/jorisroovers/gitlint/blob/85b817aadef01540478666d86e1a81a56b967ef8/gitlint-core/gitlint/rules.py#L399-L401

ConfigurationRules are applied once per commit, before all other rules:

https://github.com/jorisroovers/gitlint/blob/85b817aadef01540478666d86e1a81a56b967ef8/gitlint-core/gitlint/lint.py#L74-L76

I think to support this, we'd need to change ConfigurationRules to have a new target attribute, being either Commit or Line (with Commit being default for backward compatibility).

We’d then apply the ConfigurationRules with target Commit as we do today, but then additionally apply all ConfigurationRules with target Line on a per line basis.

So this code, would need to change a bit:

https://github.com/jorisroovers/gitlint/blob/85b817aadef01540478666d86e1a81a56b967ef8/gitlint-core/gitlint/lint.py#L74-L76

Like so:

# We use a new `self.commit_configuration_rules` property that filters all configuration rules with target==Commit
for rule in self.commit_configuration_rules:
    rule.apply(self.config, commit)

And then lower down, right before this code

https://github.com/jorisroovers/gitlint/blob/85b817aadef01540478666d86e1a81a56b967ef8/gitlint-core/gitlint/lint.py#L85-L89

We’d add something like this:

# We use a new `self.line_configuration_rules` property that filters all configuration rules with target==Line
self._apply_line_rules(commit.message.body, commit, self.line_configuration_rules, 1))

In addition, ignore-body-lines would then need to be converted to a ConfigurationRule with target set to Line and support for the new ignore rule option.

I think this could work, and might even allow us to fix the line numbering discrepancy issue. I'd need to implement a prototype to make sure.

scop commented 2 years ago

I think the suggested feature would work great for my purposes. Thanks for considering and working on it!

jorisroovers commented 2 years ago

So I put together a quick-and-dirty prototype that seems to work.

@scop I’d appreciate if you can give this a try and see if this works for you:

#  git checkout instructions
git clone https://github.com/jorisroovers/gitlint.git
git checkout issues/255
python -m venv .venv
. .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt -r test-requirements.txt -r doc-requirements.txt
cd gitlint-core
python setup.py develop
cd ..
python setup.py develop
# gitlint should work at this point
gitlint --version

# test commit message and gitlint file
echo -e "Commit Title\n\nFoo\nThis \t line contains violations and it's also really long but that's on purpose  \nBar\n" > /tmp/test.txt
echo -e "[ignore-body-lines]\nregex=^This\nignore=body-trailing-whitespace,body-hard-tab\n" > /tmp/gitlint

# Testing the feature:
cat /tmp/test.txt  | gitlint -C /tmp/gitlint
# Output: notice how the rules B2 (body-trailing-whitespace) and B3 (body-hard-tab) are ignored
# But the rule B1 is still applied
# The line number is also fixed
4: B1 Line exceeds max length (81>80): "This     line contains violations and it's also really long but that's on purpose  "

Notes

Notgnoshi commented 2 years ago

An alternative approach I've used for this problem is to define a custom rule

import re
from gitlint.rules import BodyMaxLineLength

class BodyMaxLineLengthWithExceptions(BodyMaxLineLength):
    name = "body-max-line-length-with-exceptions"
    id = "UC1"

    def validate(self, line, commit):
        if line.startswith(" " * 4):
            return None

        ret = re.match(r"^\[\d+\] ", line)
        if ret is not None:
            return None

        return super().validate(line, commit)

The intent of the leading space exception was for block quotes, which are normally shell session, code snippets, or log snippets.

The intent of the footnote-style lines was for URLs, which is what I expect prompted this issue. The reason we chose footnotes was that you could put them inline with your commit message without impacting readability.

The downside of this approach is that, while it works great in CI pipelines, it forces you to disseminate both the custom rules, and the config that disables the default body-max-line-length for local use by developers, which is what prompted #188.

scop commented 2 years ago

Sorry for taking ages to get back to this. I tried out the issues/255 branch briefly, and did not manage to find any problems with it :+1:

But it just occurred to me that a general purpose "ignore output matching a given regex" would work for this purpose just fine, and I guess (100% unverified) it could be less intrusive to implement. It wouldn't be as elegant as more targeted approaches, but would serve as a brutally efficient last resort -- not only for this particular one, but for any message one wants to ignore for whatever reason.

So a given regex in this rule, say [ignore-output-lines] would be matched against the entire output of gitlint, and filtered out + no effect on the exit status. So for example, this config:

[ignore-output-lines]
regex = ^\d+:\s+B1\s[^:]:[\s"]*https?://.*

...could be used to ignore this:

3: B1 Line exceeds max length (125>80): "https://aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

A downside would be dependency on gitlint's output formatting, but I wouldn't personally mind that, as long as it's stable.