htacg / tidy-html5

The granddaddy of HTML tools, with support for modern standards
http://www.html-tidy.org
2.72k stars 419 forks source link

Feature: zero return status code when all errors & warnings have been muted #933

Open Lucas-C opened 3 years ago

Lucas-C commented 3 years ago

Hi!

Thank you for maintaining this robust HTML linter!

I'm using bootstrap-progressbar on an old project, that uses custom HTML attributes like aria-valuetransitiongoal.

There is a minimal HTML usage example (without the JS libs):

<!DOCTYPE html>
<html lang="fr">
<head>
  <meta name="generator" content="HTML Tidy for HTML5 for Windows version 5.6.0">
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Issue reproduction for HTML Tidy</title>
</head>
<body>
  <div class="progress">
    <div class="progress-bar progress-bar-primary six-sec-ease-in-out" role="progressbar" aria-valuetransitiongoal="80">
      80%
    </div>
  </div>
</body>
</html>

Calling HTML Tidy on this file:

$ tidy -quiet -lang en -indent -wrap 0 -modify index.html
line 13 column 5 - Warning: <div> proprietary attribute "aria-valuetransitiongoal"

Could it be possible to add an option in htmltidy.conf to ignore this kind of situation please? I' would like the tidy CLI to exit with status 0 while validating this file.

ler762 commented 3 years ago

On 3/31/21, Lucas Cimon wrote:

Calling HTML Tidy on this file:


$ tidy -quiet -lang en -indent -wrap 0 -modify index.html
line 13 column 5 - Warning: <div> proprietary attribute
"aria-valuetransitiongoal"

Could it be possible to add an option in
[htmltidy.conf](http://api.html-tidy.org/tidy/tidylib_api_5.4.0/tidy_quickref.html)
to ignore this kind of situation please?

The ability to suppress messages already exists. --mute-id yes to see the id and then --mute "id" to suppress the message

I' would like the tidy CLI to exit with status 0 while validating this file.

I think that was an earlier feature request - if a message is suppressed don't set the error status if that error/warning is found.

I don't remember it going anywhere tho :(

$ tidy -q -indent -w 120 --mute "PROPRIETARY_ATTRIBUTE" x.html <!DOCTYPE html>

Issue reproduction for HTML Tidy
80%

$ echo $? 1 Even though no warnings were displayed you still get a non-zero exit code :(

Regards, Lee

Lucas-C commented 3 years ago

Thank you @ler762 ! Are you refering to https://github.com/htacg/tidy-html5/issues/752 ?

ler762 commented 3 years ago

On 3/31/21, Lucas Cimon wrote:

Thank you @ler762 ! Are you refering to https://github.com/htacg/tidy-html5/issues/752 ?

https://github.com/htacg/tidy-html5/issues/794

where the resolution was

In the circumstances where tidy is being used in a toolchain, I would suggest that it is not as simple as testing if the exit code is non-zero... If you have no interest in warnings then you need to test if the exit is not 2, or negative... ie 0 or 1 is ok... continue processing...

I don't use tidy in a toolchain, so it's an N/A for me.. but I can see where it would be nice if the exit code was zero for no warnings (including the case where all warning msgs were suppressed) and set if there were warnings displayed.

Regards, Lee

Lucas-C commented 3 years ago

I renamed this issue and opened PR https://github.com/htacg/tidy-html5/pull/934 to implement this.

Also, I'd be willing to add a GitHub Actions pipeline to this project:

geoffmcl commented 3 years ago

@Lucas-C, thank you for the issue, and the PR #934 to implement it... and @ler762, thanks for the comments, and links...

At first glance it seems a reasonable Feature Request... but reading back in the previous posts, I still have trouble with the idea that just muting the displayed warning message, and especially error msgs, should also alter the final exit code... still about 50:50 on this ;=/

Although there may be a precedent, of sorts - running tidy with --warn-proprietary-attributes no on this sample, exits 0, I think... and there may be others... still to explore... especially with drop-proprietary-attributes, strict-tags-attributes, and others... feedback welcome...

Note you have already done the regression tests... thanks... I too, cloned your fork, switched to issue-933 branch, built, and tested my tidy-5.7.45.I933.exe, in Windows, and nothing popped... but there is only one test using config mute: FAKE_TAG, case 629, but this is more about an error in the config... I guess mute could be added to some existing tests, or maybe consider an additional test case, where the exit value is indeed changed... or not...

I also wonder about issue #921, PR #931... some changes in the exit code, in certain conditions... but maybe this does not change anything here... appreciate feedback, checking, comments,... thanks...

As commented in #794, the present docs uses displaying... so as part of this PR, it would also be better to enhance the mute , and the mute-id docs, to reflect this change in exit code... if eventually agreed...

Likewise, maybe the warn-proprietary-attributes doc should be considered... and I am sure there are other documentation issues... especially with the simple concept - A 0 exit means good to go, 1 if there are warnings, 2 if there are errors, and -1 for fatal errors... this gets very fragmented, the more exceptions there are... just thoughts...

Not sure what Github Actions you are proposing... but we have resisted some CI in the past... although we do use it in some other htacg repos... but for sure this should be separate issue... certainly NOT a PR at this stage - needs to be discussed, pros, cons, etc... thanks...

So there seems a few points to clear up. Look forward to further feedback, comments, ideas, etc... thanks...

Lucas-C commented 3 years ago

Thank you for the detailed feedback @geoffmcl. My answers point by point below:

I still have trouble with the idea that just muting the displayed warning message, and especially error msgs, should also alter the final exit code... still about 50:50 on this ;=/

In my experiences with linters, this is a very common behaviour: csslint, eslint, hadolint pylint... They all provide an "ignore rule" mechanism, that silence the rule(s) specified, and set the final error code to zero if there are no errors detected, or only ones matching the silenced rules.

Although there may be a precedent, of sorts - running tidy with --warn-proprietary-attributes no on this sample, exits 0, I think... and there may be others... still to explore... especially with drop-proprietary-attributes, strict-tags-attributes, and others... feedback welcome...

Thank you for bringing warn-proprietary-attributes up! I missed that option, and it definitively solves the issue I initially had. My feedback on this : tidy-html would really benefit from a generic, uniform, mute: $RULE_ID mechanism. IMHO it is a very common system, and it makes it conceptually easier for users to silence some rules, knowing there is a generic system to do so instead of having to search for a dedicated CLI argument / config option that may or may not exist.

Outside the scope of this PR, my take on this would be to deprecate dedicated options like warn-proprietary-attributes in favour of a the mute $RULEID system. Just my humble opinion, with a focus on usability for users.

Note you have already done the regression tests... thanks... I too, cloned your fork, switched to issue-933 branch, built, and tested my tidy-5.7.45.I933.exe, in Windows, and nothing popped... but there is only one test using config mute: FAKE_TAG, case 629, but this is more about an error in the config... I guess mute could be added to some existing tests, or maybe consider an additional test case, where the exit value is indeed changed... or not...

I'm totally willing to add tests using mute to tidy-html5-tests, including tests covering this feature. I guess they should be added after merging this though, due to them living in a separate repository?

I also wonder about issue #921, PR #931... some changes in the exit code, in certain conditions... but maybe this does not change anything here... appreciate feedback, checking, comments,... thanks...

Similarly, after reading through them, IMHO they do not seem directly related to this matter.

as part of this PR, it would also be better to enhance the mute , and the mute-id docs, to reflect this change in exit code... if eventually agreed...

I totally agree. I'll add a commit to this PR to update the docs once this change is agreed on.

Likewise, maybe the warn-proprietary-attributes doc should be considered... and I am sure there are other documentation issues... especially with the simple concept - A 0 exit means good to go, 1 if there are warnings, 2 if there are errors, and -1 for fatal errors... this gets very fragmented, the more exceptions there are... just thoughts...

Agreed. Maybe the best thing to do would be to expand on the existing doc at Running Tidy > Running Tidy in Scripts detailing exit codes with a mention of the "mute" logic.

Regarding warn-proprietary-attributes, I'm going to wait for your answer on the idea to deprecate it in favour of mute: PROPRIETARY_ATTRIBUTE before considering how to update its doc section.

Not sure what Github Actions you are proposing... but we have resisted some CI in the past...

OK. I'll leave that outside this PR. I'm curious to know why resisting automated tests pipelines though: is there an issue somewhere related to the subject, where I could find some background context?

As a side note: I definitively had issues accessing the documentation at https://api.html-tidy.org/tidy/tidylib_api_next/ Either the website was really slow to answer or it even sometimes failed to display the page entirely.

Lucas-C commented 3 years ago

One more question : should I update a CHANGELOG.md or some other kind of "release notes" file as part of this PR?

geoffmcl commented 3 years ago

@Lucas-C, thank you for the further feedback...

I will try to answer each of your perceived points, but may miss some... such in-line commenting is difficult... but...

Re: What others do!

While it may be a valid point, listing what others do, it does not carry that much weight. If I searched around I could probably find others that don't offer a generic turn-it-off switch... and do they have the same purpose as tidy? so this is probably 50:50...

And on checking some you mentioned, there seem some difference between the ideas you can pick and choose - csslint, customize - eslint, best practices - pylint, etc, VERSUS W3C approved for tidy, where it is not up to the user to choose!

Would you want to added ignore switches to the W3C validator? - it does have a filter - maybe 49:51 against...

How can you indicate valid - i.e. exit with 0 - when it is clearly not valid? - sample will also not pass the W3C validator!

And maybe it comes down to examining the specific cases. While it might be ok if the warning is just a minor case - of course difficult to categorize into minor versus major - and so many cases in-between... but to request tidy ignore some clear errors, exit 0, seems a step in the wrong direction...

Of course, if it is a case that the particular warning/error has been subsequently approved, and tidy has to catch up, then this is a bug that should be fixed, not suppressed. And maybe your aria-valuetransitiongoal attribute is such a case? Tidy already supports some 35 aria-xxx cases... has this one been missed? But also presently fails W3C validation... W3C docs, recs, refs needed...

I thought I remember some general check for hyphenated attributes, but can't find it just now... but it does seem there could be a case for a generic IsAriaLike(attr) test, like perl name =~ /^aria-/;, and tidy accepts it... or something... but can be a separate, new issue... not here...

Re: generic vs dedicated, and depreciation

Only since tidy 5.6 have we had the generic mute, so I guess warn-proprietary-attributes could be marked as depreciated... or something...

As a start maybe the mute alternative could be mentioned in its docs - but that is not the point here - If you feel strongly that is so, then this could be a separate new issue, to be discussed. Seems minor to me, at present... but...

Re: regression test - as stated, add test, or not... not the point...

Re: improve docs - that is a forever, ongoing task - will always be true! ;=))

And the Running Tidy - script is a good case in point, in addition to mute-id/mute docs, which should be changed, as part of this - it is not a case of if approved, but more maybe not, without it...

Exactly how we are going to document this drastic change in the exit code, should be part of the approval case... simple, it can not be merged without docs... suggestions, samples, ideas, PR, welcome...

Re: CI - Not against it - just present a case of why/what/when/..., and how this helps tidy, you, me, users, et al... pros and cons... and it could be considered. Do not have time to research all earlier comments, but one of the first seems bottom of 69, and #269, with many other refs... and others... - but not the point here...

Re: side note - CI in others - I think I have heard that failed to load api site problem at least once before... but it has never failed for me, and I access it regularly, almost daily... would like to get to the bottom of this, if there is a site issue - please open an issue, maybe in the api repo, if it continues... not here...

Re: CHANGELOG.md - Tidy does not have a change log, per se - the release notes 5.6 are generated from the git log, massaged into HTML, by perl tools... so nothing to be done here, except add appropriate messages to your commits... Here is 5.7.45 that I gen'ed recently, just to check progress...

What about the --show-warnings no... what does that do? On the sample, seems the warning goes away, but the exit code remains 1... should this change? ... like maybe --warn-proprietary-attributes no... just exploring... questioning... understanding...

Agree 100% - focus on usability for users - that's what Grand-Daddy - Tidy is all about...

Hmmm, did I cover everything? Probably not, but ask again if I missed...

Be aware, this discussion, so far, has pushed me more towards the negative ;=))

IMHO: tidy should exit: 1 for all warnings, 2 for all errors, etc, whether they are shown/displayed/output, or squelched, to use the original title of the mute option... which goes to its aims...

Look forward to further feedback, comments, doc change suggestions, patches, PR, etc, etc, ... thanks...

geoffmcl commented 3 years ago

@Lucas-C, my continued exploration...

Re: depreciation of an option

Thought I found that we do have a depreciated list, to be hardcoded into config.h table... so trying it out, with -

diff --git a/src/config.c b/src/config.c
index bae56b8..bb0da4e 100644
--- a/src/config.c
+++ b/src/config.c
@@ -293,6 +293,7 @@ static const struct {
     ctmbstr name;                /**< name of the deprecated option */
     TidyOptionId replacementId;  /**< Id of the replacement option, or 0 if none. */
 } deprecatedOptions[] = {
+    { "warn-proprietary-attributes", TidyMuteReports },
     { NULL }
 };

BUT this is not really a depreciation table, but more for a removed option! It outputs Config: unknown option: warn-proprietary-attributes, skips the option, which will cause a cascading error on the following yes/no/.., even though it remains in the code, and does not mention the suggested alternative... UGH!

See the code... particularly note Bool status = ( option != NULL ) && !isDeprecated; if ( !status ), and what follows...

This seems clearly a case of a mis-leading table name... should be, say removedOptions[], abandonedOptions[], or something... but we already do have the unknown option message... what does this table add???

For a depreciated message, I would have expected a nag output something like -

Config: depreciated option: 'warn-proprietary-attributes'!
  Be warned, at some time this option will be removed entirely.
  Replace it with `mute: PROPRIETARY_ATTRIBUTE` instead.

Maybe could be added, changed... Oh, well...

Re: Change exit code

But for the moment, on this issue of mute changing the exit value, still remain on the negative side... sorry...

As indicated, the simple documentation that - 0 = good, 1 = warning(s), 2 = error(s), -1 = fatal - becomes almost impossible to qualify, if exceptions are added...

Like, yeah, but it is not 1, IIF you mute the warning, and seems to be very ugly if you mute an error! And in that case should tidy write an output?

And maybe even --warn-proprietary-attributes no should not be such an exception to this rule... will try to look at this...

One problem I have with mute, is that there is no way to un-mute something, once it has been added to the config, and this has problems when using multiple tidy <config> files <config> files ... on the command line... but is not serious enough for me to persue...

And there has been no documentation change suggested, for consideration...

Continuing investigation... look forward to further feedback, ideas, patches, PR, etc, etc... thanks...

Lucas-C commented 3 years ago

Hi @geoffmcl

Thank you for your detailed answers.

But for the moment, on this issue of mute changing the exit value, still remain on the negative side... sorry...

Alright. I am deeply convinced that this behaviour (returning a non-zero exit code when the user explicitely stated in htmltidy.conf that it wants to mute some rules) really hinders tidy to be used as a HTML linter in CI pipelines.

All CI pipelines engines that I have used (GitHub Actions, Gitlab CI, Travis CI, Circle CI...) use a fail-fast approach when executing scripts at each stage, similarly to bash -o errexit / -e, which makes CLI exit codes really important in those environments.

Currently, a user of tidy can silence error/warning messages but not alter the exit code of the CLI using the generic mute mechanism in htmltidy.conf. Hence this makes tidy usage in a CI pipeline problematic.

Agree 100% - focus on usability for users - that's what Grand-Daddy - Tidy is all about...

If you say so. I am not going to try to convince you further. This was an interesting exchange for me. Thank you for the time you took to answer my feature proposal.

As far as I'm concerned, this isssue and PR #934 can be closed.

balthisar commented 3 years ago

@Lucas-C , don't be discouraged. Sorry for the delay piping in.

My suggestion would simply be to add a new option that works exactly like mute, but called mute-and-ignore; mute is all about quieting output, and not specifically meant to control whether or not Tidy is giving the GO-NOGO to the document. Implementing mute-and-ignore, though, would very specifically mean ignore. (While it might be easier just to call the new option ignore, the UI becomes inconsistent because of mute-id.)

This also has the potential to allow us to deprecate some of the existing configuration switches. Personally, I prefer to have switches be specific configuration options, because this makes it simple to automate in Tidy GUI applications, and you don't have to hunt down the awkward configuration option id with mute-id.

As for the deprecation methods @geoffmcl was looking at, the header indicates:

/*****************************************************************************
 ** Deleted Options Configuration
 **
 ** Keep track of options that have been removed from Tidy, so that we can
 ** suggests a replacement. When a deleted option is used, client programs
 ** will have the opportunity to consume the option first via the callback,
 ** and if not handled by the callback, will be handled by Tidy, generally
 ** by setting an alternate or new option, in `subDeprecatedOption()`.
 ******************************************************************************/

The subDeprecatedOption() function isn't automatic, though. We'd have to go in there and make changes to accommodate the deprecated option. This very much in fact is a deprecated option mechanism, and not a "deleted option" mechanism.

In general, I would prefer to have a callback handle this deprecation, though. This would move the logic from the library to the consumer, e.g., the console application. In principle, the library should have a discrete set of options that affect behavior of the Tidy'ing process, with no duplicates and no redundancies, and output filtering should be handled by the client (e.g., console application). As a library, it still offers the convenience of parsing options and files, of course. This represents a fairly extensive re-write of the console application, however, and so we usually just keep piling stuff into the library.

As for CI, I've added automated testing and brought the regression tests back into this repo.

mcandre commented 1 year ago

Related problem:

Any warnings contribute to a non-zero exit code, even when --show-warnings false is supplied.

That flag results in a program console trace with an exit code that can prevent the rest of a well formulated set -e script from running. And the warnings that contribute to the bad exit code are elided from the console output.

Please ensure that --show-warnings false will disable warnings from contributing to the final exit code of tidy-html5.

placoderm commented 11 months ago

Just to add another example… I am preparing html files used within a web app. The requirement is that the files have empty title tags. They are filled in later with js I guess.

I'd love to be able to get a 0 exit code when I have muted the empty title tag warning.