Feature request: add support for configurable warning/error threshold settings

squizlabs / PHP_CodeSniffer

PHP_CodeSniffer tokenizes PHP files and detects violations of a defined set of coding standards.

BSD 3-Clause "New" or "Revised" License

10.66k stars 1.48k forks source link

Feature request: add support for configurable warning/error threshold settings #2094

Open jrfnl opened 6 years ago

jrfnl commented 6 years ago

Use case:

When introducing PHPCS into the workflow of a large legacy project - hint: WordPress -, after the initial fixer run, there will often be non-auto-fixable errors and warnings left.

The dev team may not be in a position to address those straight away due to other priorities, but would want to prevent new errors/warning from being introduced.

To that end, introducing PHPCS into the build testing/CI process would be preferable, but as long as there are still errors/warnings to address, this would cause all builds to fail.

At this moment, you can use ignore_warnings_on_exit, ignore_errors_on_exit or the -n option or set up a really complicated ruleset which changes the severity of the remaining errors/warnings and then use --severity=#/ --error-severity=#/ --warning-severity=# to get the build to pass.

None of these options however will prevent new issues from being introduced as either new warnings/errors will not fail the build or new warnings/errors of a certain type will not fail the build.

Another option would be to run phpcbf as part of the build/CI process and fail the build if new fixes are being made, but again, that would not prevent new non-auto-fixable errors/warnings from being introduced.

Proposed new feature:

During a discussion earlier this week with @moorscode, the following was suggested as a possible solution:

We'd like to propose adding two new command-line options:

--warning-threshold=#
--error-threshold=#

These could also be set from within a custom ruleset using the <arg name="" value=""/> syntax.

The default thresholds would be 0.

When determining the exit code for a run, the threshold numbers provided could then be subtracted from the errors and warnings found during a run and the exit code determined based on the new values.

It would, of course, still be the responsibility of the dev team to update (lower) these numbers each time fixes are made, but at least this would allow a ruleset to be enforced.

@gsherwood How does this sound to you ? If you are open to the idea, I'd be happy to work on a patch for this.

/cc @pento

pento commented 6 years ago

Thanks for the ping, @jrfnl! This would be super useful for WordPress, as we work our way through the non-auto-fixable issues.

I also like the idea of the custom ruleset syntax, to make it easily version-controllable.

I think it would be useful to be able to break down the number of issues by file, so that a phpcs run just on changed files could be introduced to the existing precommit script: it would only take a few seconds to run (compared to a full run, which currently takes about 90 seconds on WordPress), and if the numbers reduce for a file, the script could automatically update the custom ruleset file. This would allow us to introduce automated maintenance of the custom ruleset file, without burdening contributors with extra steps before committing.

gsherwood commented 6 years ago

It would, of course, still be the responsibility of the dev team to update (lower) these numbers each time fixes are made, but at least this would allow a ruleset to be enforced.

I feel like this is the major issue with this feature. When documenting it, it's always good to give a reason why you'd ever use it in the first place. To say that it is a set of thresholds that you need to manually adjust seems a little strange. So I'm not really happy with the idea of hacking this in because I don't see how it is generically useful.

This feels like the kind of thing that should be checked in the build pipeline. Is it possible to check the return value of PHPCS and apply your own thresholds, and pass/fail the build accordingly?

jrfnl commented 6 years ago

Is it possible to check the return value of PHPCS and apply your own thresholds, and pass/fail the build accordingly?

I suppose that would be possible if it was just a feature request for one project, but I know of numerous other projects, aside from WP, which would also benefit from this feature.

It's sort of a choice between manually maintaining the threshold numbers or manually maintaining the CS for the whole codebase as it cannot be enforced. Manually maintaining the threshold numbers to me feels like the less manual labour intensive option, especially as that could possibly be automated for big projects.

moorscode commented 6 years ago

This functionality is something that is already implemented in other standards-tools, for example: https://eslint.org/docs/user-guide/command-line-interface#handling-warnings

We use this in many projects and has helped us to make sure no new code is accepted which introduces new warnings or errors.

moorscode commented 6 years ago

For any project that has not been started with PHPCS enabled, this feature would make it 1000x easier to start implementing fixes and problems.

This would enable a controllable flow for new contributors to help out clean up the code by just reducing the number of warnings or errors on an existing repository. Especially for projects where one person has started it and wasn't aware that PHPCS is something they could be using.

Making sure that no additional warnings are introduced is not a 100% guarantee that the code does improve, but you would need to solve a problem before introducing a new one.

Goals could also be easier to set, in 3 months we want to have reduced the number of warnings and/or errors to be down 50%/100 items, etc.

gsherwood commented 6 years ago

This functionality is something that is already implemented in other standards-tools, for example: https://eslint.org/docs/user-guide/command-line-interface#handling-warnings

I don't think that's a good example. That one is doing the opposite - producing an error exit status if it would have normally produced success.

If you have other examples it would actually be really helpful.

This would enable a controllable flow for new contributors to help out clean up the code by just reducing the number of warnings or errors on an existing repository. Especially for projects where one person has started it and wasn't aware that PHPCS is something they could be using.

This is my problem with the concept, and why I'm interested to see how other tools handle it (assuming they do). Is it okay that a developer manually fixes 10 errors in one file and commits 10 brand new errors in another? Your total error count is the same (no alerts) but is the code base improving?

Goals could also be easier to set, in 3 months we want to have reduced the number of warnings and/or errors to be down 50%/100 items, etc.

That's the sort of goal that I think is worth tracking. But you don't track that by pretending the run was a success. You let the run fail (maybe don't fail the build) and track that number. If it is trending down, great. If it gets low enough, maybe you can make a big push and turn build failures back on for PHPCS.

I'd prefer to see this working in real build systems for real projects before committing to including a feature into PHPCS that I have to maintain. What happens if the WP project decides that, after 6 months, manually maintaining these limits is not the way to do things?

pento commented 6 years ago

Psalm is a good example of a tool that allows suppressing errors. Vimeo built this feature specifically to help with fixing up an old code base: they could start applying rules to all new code, without being overwhelmed by exisiting issues that needed to be addressed. They recently went into more details on how this could be applied to WordPress.

Is it okay that a developer manually fixes 10 errors in one file and commits 10 brand new errors in another? Your total error count is the same (no alerts) but is the code base improving?

This is also where it would be helpful to be able to break the issue count down by file, rather than just using the total: a more fine-grained count reduces the likelihood of this occurring. As files reach 0 of any given issue type, it becomes impossible to introduce new problems, making it less of an overwhelming task to take on.

That's the sort of goal that I think is worth tracking. But you don't track that by pretending the run was a success. You let the run fail (maybe don't fail the build) and track that number.

I think it's possible to do both: a CI job that fails if the numbers increase, but include an allow_failures job that doesn't use the limits file. As part of the precommit process, we would automatically update the limits file with the reduced numbers for any changed files, to avoid subsequent commits being able to inadvertently introduce other issues.

I'd prefer to see this working in real build systems for real projects before committing to including a feature into PHPCS that I have to maintain. What happens if the WP project decides that, after 6 months, manually maintaining these limits is not the way to do things?

This is a feature that's really only useful for large existing codebases to be able to introduce enforced coding standards, particularly those that have an overwhelming number of issues that need manual intervention. In 6 months, however, I would hope that we would have eliminated our remaining ~4000 issues, and we wouldn't need to use this feature anymore. 🙂

I agree that it's a niche feature, I'd certainly appreciate if it were available, but I understand if it doesn't fit into your longterm goals for PHPCS.

If you're after a specific commitment, here's mine: my job is to work on WordPress full time, and I've been tasked with getting the WordPress codebase to 0 coding standards issues. This isn't a thing I can drop when I'm bored, it's what I'm being paid to do. 🙂

For building a "real build system" example, I'm not familiar with the PHPCS codebase, so I can't estimate the effort required to build a proof of concept PR to add the feature, but if @jrfnl has the time to build a working example, then I'm able to build the corresponding WordPress build system changes to work with it. Do you think that would be a reasonable example?

gsherwood commented 6 years ago

Psalm is a good example of a tool that allows suppressing errors.

PHPCS also has similar error suppression features. The Psalm ones you linked to are not like what is being described in this issue. I'm about to step into meetings so I haven't got time to look up the docs and see if there are others, but I'd be interested to know how they achieve "no new errors".

In 6 months, however, I would hope that we would have eliminated our remaining ~4000 issues, and we wouldn't need to use this feature anymore. 🙂

If you really think that, I'm not sure why I'd build something into PHPCS. If I accept this feature, I'm committing to support it long after you've forgotten about it.

jrfnl commented 6 years ago

@gsherwood

If it helps: I'll try and find the time to build a proof of concept with @pento over the next few weeks. The basic setting shouldn't be hard, but the modularization based on files is something I need to think about a little more.

If you really think that, I'm not sure why I'd build something into PHPCS. If I accept this feature, I'm committing to support it long after you've forgotten about it.

I don't think this is a feature which would only be used temporarily. Coding standards evolve.

While at this moment for WP it is a case of introducing enforcement, I'm also involved with several projects which have always (or at least for as long as I remember) enforced coding standards, but are running into trouble upgrading to the latest CS version.

Trying to bring the numbers down while enforcing the old version of the standards is like swimming against the tide as every time some manual fixes have been merged, some new (auto-fixable) issues have been introduced. Being able to enforce the new standards while allowing a limited set of errors and warnings (for specific files) would really help.

I imagine the same will apply to WP in the future.

I also wonder how other big projects like Joomla or Drupal handle this. @pfrenssen @photodude Care to give an opinion ?

gsherwood commented 6 years ago

Being able to enforce the new standards while allowing a limited set of errors and warnings (for specific files) would really help.

I'd love a feature to do that. I just don't see how using thresholds does it. I think it masks the fact that new errors are being introduced while other ones are being fixed.

What I don't know how to achieve in PHPCS directly is a way to allow existing errors to remain while new errors are banned. The only way I can think of doing that is to use a pre/post-commit or CI runner to check the old and new versions of a file and try to match up the errors and confirm no new ones. Or at least enforce that a file must not be committed with any more errors than it had before, so things are always improving or holding steady.

But this is not something PHPCS could do directly, which is why I'm questioning the benefit of having this in PHPCS at all. If feels like it belongs in a CI tool or something.

pfrenssen commented 6 years ago

Sorry for the late response.

For Drupal we are facing a similar struggle. We have a codebase that has evolved over almost 2 decades and we started an initiative a while ago to make our entire core code fully standards compliant.

We took a different approach, instead of gradually improving the standards and keeping track of the number of remaining violations, we are tackling each standard one by one and are maintaining an ever stricter ruleset. We found that this approach has some advantages:

Fixing one rule at a time allows the people involved to focus on one particular task, rather than having to consider the entire coding standards when looking at a piece of legacy code. For example if they are fixing whitespace around assignment operators, then this is the only thing being changed.
Fixing one thing at a time makes it much easier to review the resulting patches.
Once a certain standard has been fixed, our entire core codebase is now compliant with this particular rule. Since most new code that is being written starts out with copy/pasting a different class we managed to practically eliminate the problem of coding standards violations creeping into patches by copying non-compliant code.
We have a clear view on the work that remains to be done until we are fully compliant.
We found that these coding standards tickets are great for mentoring new developers at code sprints.

We also encountered a disadvantage with this approach: some rules can have a profound impact on the code base by the sheer number of lines that are being touched. Fixing all at once would result in a patch that is huge and very hard to commit since it takes a lot of time to review the patch, and in the meantime the code base doesn't stand still, and these patched need to be rerolled endlessly. We solved this by doing this work in smaller sections. Another solution is to wait for a point in time (such as the preparation of a release candidate) when the regular flow of commits to the project slows down.

This approach has some challenges but we found it to work well. We are steadily progressing on the compliance, and we have 80% of the work done by now. What really helps it that the rulesets are very flexible.

An example: we have a rule Drupal.Commenting.FunctionComment that checks if documentation for functions is correctly formatted. This rule is very complex and consists of a large number of individual sniffs. It is clearly impossible to fix all the documentation for all the functions at once, since this would be a mammoth task which would take months.

So we do this piece by piece, here is the relevant section from our ruleset, we have excluded all individual sniffs that still need to be done.

  <rule ref="Drupal.Commenting.FunctionComment">
    <exclude name="Drupal.Commenting.FunctionComment.IncorrectTypeHint"/>
    <exclude name="Drupal.Commenting.FunctionComment.InvalidNoReturn"/>
    <exclude name="Drupal.Commenting.FunctionComment.InvalidTypeHint"/>
    <exclude name="Drupal.Commenting.FunctionComment.Missing"/>
    <exclude name="Drupal.Commenting.FunctionComment.MissingParamComment"/>
    <exclude name="Drupal.Commenting.FunctionComment.MissingParamType"/>
    <exclude name="Drupal.Commenting.FunctionComment.MissingReturnComment"/>
    <exclude name="Drupal.Commenting.FunctionComment.MissingReturnType"/>
    <exclude name="Drupal.Commenting.FunctionComment.ParamCommentFullStop"/>
    <exclude name="Drupal.Commenting.FunctionComment.ParamMissingDefinition"/>
    <exclude name="Drupal.Commenting.FunctionComment.TypeHintMissing"/>
  </rule>

For each of these excluded sniffs we have an open ticket that can be fixed by the community.

vv12131415 commented 5 years ago

This is my problem with the concept, and why I'm interested to see how other tools handle it (assuming they do). Is it okay that a developer manually fixes 10 errors in one file and commits 10 brand new errors in another? Your total error count is the same (no alerts) but is the code base improving?

What can be also done is not only count errors but also (as @pento mention) count them on per file basis and also track error names so this way if you fix 10 errors and introduce 10 new errors and even if it was the same file, the erro name will be different and PHPCS should fail. With those 2 additions your probability of not getting alerts from PHPCS decreases.

alfredbez commented 4 years ago

Would like to see something like this as well.

We're doing it currently like this:

We have an error threshold in our .travis.yml
We created a wrapper for phpcs which checks if the number of errors is below the threshold

gsherwood commented 4 years ago

Related: #2543

I've left a comment over there with some thoughts: https://github.com/squizlabs/PHP_CodeSniffer/issues/2543#issuecomment-550021421