htacg / tidy-html5

The granddaddy of HTML tools, with support for modern standards
http://www.html-tidy.org
2.71k stars 418 forks source link

--mute should suppress non-zero exit code #794

Closed jan-matejka closed 5 years ago

jan-matejka commented 5 years ago

I am running tidy as part of static web site build pipeline and as a workaround for an issue with syntax highlighter I need to ignore TRIM_EMPTY_ELEMENT warning.

Note this is the only warning I get. When I mute it, it does not show in the output anymore, but it still causes non-zero exit code which breaks my build.

I find this behavior quite unintuitive and think --mute should also mute changes to exit code.

geoffmcl commented 5 years ago

@yaccz thanks for the issue...

It is also good to give a minimum html example input, tidy version, config used, output, and expected output, just to make quick testing easier...

Here I built some of my own, 4 in_794(0-3).html files, and used current next, 5.7.20, but also the following I752-1* build...

My first sample -

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Is. #754</title>
</head>
<body>
<p>Drop empty element</p>
<p></p>
<p>--mute TRIM_EMPTY_ELEMENT should drop warning</p>
</body>
</html>

At first I thought this might be related to issue #752, PR #764, branch issue-752, build I752-1, but it seems not directly...

There are several precedents where an option avoids a warning message, and thus also does not bump the doc warning count...that is does not contribute to a non-zero exit...

Off the top of my head is say, a missing DOCTYPE! warning is avoided if you configure --show-body-only yes... and not only is the warning message dropped, but the tidyWarningCount( tdoc ), ie the internal doc->warnings, is not incremented...

But there are other examples of where the users input changes the doc->warnings count... still checking... examples appreciated... thanks...

So if the user configures --mute TRIM_EMPTY_ELEMENT, which says be quiet about dropping empty elements, don't want to hear about it... ok... maybe tidy should not bump the doc->warnings count?

At present, on sample 1, with I762-1, you get output like -

Info: messages of type "TRIM_EMPTY_ELEMENT" will not be output
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!

It says found 1, but showed none... and Tidy exit level is 1...

Although, in this case, there is a --drop-empty-element no option also... but maybe not important...

Seems mute/drop the warning is the users request...for all --mute XXXX options, ... equals no msg, no bump...

Look forward to comments, patches, PR to change this behavior, if all agreed this is a valid Feature Request...

Thinking, testing, trying... thanks...

jan-matejka commented 5 years ago

It is also good to give a minimum html example input, tidy version, config used, output, and expected output, just to make quick testing easier...

yeah, sorry about that. I felt like this case is simple enough to skip those.

It says found 1, but showed none... and Tidy exit level is 1... Seems mute/drop the warning is the users request...for all --mute XXXX options, ... equals no msg, no bump...

I am not familiar with the tidy internals but your analysis seems correct to me.

Though I'd classify this a Bug rather than Feature Request given the inconsistency of reported warning count vs printed warnings.

geoffmcl commented 5 years ago

@yaccz have started to look a little deeper... thanks for the feedback...

The classification is not important... if the maintainer, who introduced this mute option, intended this behavior, then is not yet a bug... and while I can not read minds, studying the internals is the only clue available... but let's not get hung up about semantics... so on with the issue...

Which is --mute should suppress non-zero exit code, at least not contribute to a non-zero exit...

Maybe my too quick choice of missing DOCTYPE! and --show-body-only yes as an example is not a good one... I did ask for others...

In further testing, find there are others cases, where the message would be suppressed, but the doc->warnings would remain incremented, giving a non-zero exit... so maybe this is the normal behavior... could say intuitive even...

A quick example -

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Is. #754-4</title>
</head>
<body>
<p>Open div warning, suppressed</p>
<div id="open">
<p>'--show-warnings no' drops msg but not exit-level</p>
</body>
</html>

The test output -

2019-01-21  19:19           767,488 tidy.exe
F:\Projects\tidy-test\test>F:\Projects\tidy-html5\build\temp-794\Release\tidy.exe -v
HTML Tidy for Windows version 5.7.20.I794
F:\Projects\tidy-test\test>F:\Projects\tidy-html5\build\temp-794\Release\tidy.exe  --show-warnings no input5\in_794-4.html
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors! Not all warnings/errors were shown.
...
t-794: Tidy exit level 1

Again tidy says found 1, but showed none... and Tidy exit level is 1... like the -mute XXXX case... but ...

But does have the addition Not all warnings/errors were shown.... hmmm, how was that triggered? ...

Maybe that could also be triggered by a --mute XXXX option... More research, help, examples, etc needed...

So we can see examples are always very important, to at least show, expose, the current behavior... and sometimes version counts... then decide bug, design, change, right, wrong, better, worse, whatever...

In this case tidy altered the document - TRUE... while there are options which omit the msg output, they should not alter the exit value! Maybe exceptions...

Still digging... comments, feedback welcome... thanks...

ler762 commented 5 years ago

does have the addition Not all warnings/errors were shown. ... hmmm, how was that triggered?

$ tidy  --show-warnings no foo.html
Info: Doctype given is "-//W3C//DTD HTML 4.01 Transitional//EN"
Info: Document content looks like HTML 4.01 Transitional
Tidy found 37 warnings and 0 errors! Not all warnings/errors were shown.
   ...
$ echo $?
1

$ tidy  --mute TRIM_EMPTY_ELEMENT foo.html
Info: messages of type "TRIM_EMPTY_ELEMENT" will not be output
Info: Doctype given is "-//W3C//DTD HTML 4.01 Transitional//EN"
Info: Document content looks like HTML 4.01 Transitional
Tidy found 37 warnings and 0 errors!
   ...
$ echo $?
1

while there are options which omit the msg output, they should not alter the exit value!

This might be a good use case for suppressed --mute XXX messages to not increment the warning count since the user is telling tidy that they don't care/don't want to know about that particular issue.

geoffmcl commented 5 years ago

@ler762 thanks for the additional testing... but not sure what has been added...

does have the addition Not all warnings/errors were shown. ... hmmm, how was that triggered?

The answer to this is current simple code choice - some spaces removed for compactness -

void TY_(ReportNumWarnings)( TidyDocImpl* doc ) {
    if ( doc->warnings > 0 || doc->errors > 0 )     {
        if ( doc->errors > cfg(doc, TidyShowErrors) || !cfgBool(doc, TidyShowWarnings) ) {
            TY_(Dialogue)( doc, STRING_NOT_ALL_SHOWN );
        } else {
            TY_(Dialogue)( doc, STRING_ERROR_COUNT );
        }
    } else {
        TY_(Dialogue)( doc, STRING_NO_ERRORS );
    }
}

Not very exact, but sort of does the job... and --mute XXXX could be included in the 3rd line, if condition, probably by, adding say || doc->muted.list, or something... not tested yet... making mute conform... that is show the extra info...

Reading the API docs on mute... it seems clear... it has the words prevent Tidy from **displaying** certain types of report output... my emphasis on show...

It says nothing about also preventing Tidy bumping doc warnings... that is the exit code... and thinking about it more, why should it... the exit code is the true state, and must be kept/maintained, IMO...

While it seems/is ok to suppress console message output... we all want that!...

I implemented --show-info no, just because, in repeated testing, I wanted less console output... at least of a particular type/category... mute was a great, and extension of that, offering fine control over output... i.e. what is shown...

But never changing the exit code policy - see implement libTidy - paraphrased, 0, all is well, no doc change, good doc, 1, oops, found warnings, some of which hope are fixed in the output, 2, errors, that can not be fixed... and of course some negative exits, like could not find the file...

Also libTidy supports an API to be called back before each output, see say tidySetReportCallback, and you can filter the output, but that too does not influence the exit code...

See the code order of messageOut... note the switch ( message->level ), which sets these doc values, before any of the many go filters... which can then inhibit output...

All in all, it seems it is not ok to change the exit code... tidy made a doc change... it must reflect that... at least in exit, if not visually to the console, if requested not to... by many things...

The question becomes, should tidy break that existing policy?

If yes, give the use case... and for all, or some... be specific, as best you can...

This does not seem like one of them... at least at the moment... and maybe should be a new separate issue anyway...

@yaccz, as suggested earlier, you could add --drop-empty-element no to your config, and this whole issue becomes mute... ;=))

The default tidy behavior does not suit everybody... so we have configs, to suit the current use...

We should support the extra message, ie STRING_NOT_ALL_SHOWN, for mute, like the others, and will work towards this, conformity...

Appreciate further comments, feedback, patches, even tested PR, to implement, at least this step... thanks...

geoffmcl commented 5 years ago

@yaccz - Testing patch to add not all shown, making --mute XXXX conform to say the --show-warning no option...

diff --git a/src/message.c b/src/message.c
index 0900407..198ca2e 100644
--- a/src/message.c
+++ b/src/message.c
@@ -1291,12 +1291,19 @@ void TY_(ReportMarkupVersion)( TidyDocImpl* doc )

 /* Reports the number of warnings and errors found in the document. 
 ** Called by tidyRunDiagnostics(), from console.
+** 
+** If there are 'warnings' or 'errors', then add 'not all shown'
+** msg, if the are any options that 'limit' message output. Not 
+** intended to be accurate or exact, just an indication that
+** some message(s) **may** have been suppressed, for some reason.
 */
 void TY_(ReportNumWarnings)( TidyDocImpl* doc )
 {
     if ( doc->warnings > 0 || doc->errors > 0 )
     {
-        if ( doc->errors > cfg(doc, TidyShowErrors) || !cfgBool(doc, TidyShowWarnings) )
+        if ( doc->errors > cfg(doc, TidyShowErrors) || !cfgBool(doc, TidyShowWarnings) ||
+             doc->muted.list) /* Is. #794 - add '--mute XXXX' option 
+                                 to other options which limit messages */
         {
             TY_(Dialogue)( doc, STRING_NOT_ALL_SHOWN );
         }

Seems to work...

Appreciate if others could test, confirm this fixes the conformity bug... thanks...

geoffmcl commented 5 years ago

To make it easy for testing, have pushed this patch to the issue-794 branch... and the test cases to my tidy-test repo ...

$ cd tidy-html5 # get to tidy cloned source
$ git pull # update src
$ git checkout issue-794
$ cd build/cmake
$ cmake ../.. [other options] [-DTIDY_RC_NUMBER=I794] # add identifying version
$ make # build tidy...
$ ./tidy [-v] or [test_input] like [--mute TRIM_EMPTY_ELEMENT ../../../tidy-test/test/input5/in_794.html]

This change may effect the tidy-html4-tests regression tests... still to test this... any help appreciated...

As stated, be great if others could test, and confirm this patch/fix... and will get to a PR... thanks...

geoffmcl commented 5 years ago

In a quick regression test, yes tidy v.5.7.20.I794, as expected, fails on case-629.txt... will work towards a PR for this also, unless someone beats me to it... thanks...

jan-matejka commented 5 years ago

I am sorry, I am quite busy at the moment and this issue came up on a side project of mine. I should be able to test in a week or two.

jan-matejka commented 5 years ago

Not working as expected.

Reproducer:

~/git/3rd/tidy-html5/build/cmake git:issue-794 [1]
yac@remy % cat ~/tidy-in.html
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">                                                   
<head>
<meta charset="utf-8"/>
<meta name="generator" content="Docutils 0.14: http://docutils.sourceforge.net/" />                                   
<title>Shell Scripting Survival Guide</title>
<link rel="stylesheet" href="static/main.css" type="text/css" />                                                      
</head>
<body>
<div class="document" id="shell-scripting-survival-guide">
<pre class="code make literal-block"><code><span class="name variable"></span><span class="comment">## installation targets
</span></code></pre></div>
</body>
</html>
--------------------------------------------------------------------------------                                      
~/git/3rd/tidy-html5/build/cmake git:issue-794
yac@remy % cat ~/tidy-in.html | tidy -i -quiet --mute-id yes -o ~/tidy-out.html                                       
line 11 column 44 - Warning: trimming empty <span> (TRIM_EMPTY_ELEMENT)                                               
--------------------------------------------------------------------------------                                      
~/git/3rd/tidy-html5/build/cmake git:issue-794 [1]
yac@remy % cat ~/tidy-in.html | tidy -i -quiet --mute TRIM_EMPTY_ELEMENT -o ~/tidy-out.html                           
--------------------------------------------------------------------------------                                      
~/git/3rd/tidy-html5/build/cmake git:issue-794 [1]
yac@remy % cat ~/tidy-in.html | ./tidy -i -quiet --mute TRIM_EMPTY_ELEMENT -o ~/tidy-out.html                         
--------------------------------------------------------------------------------                                      
~/git/3rd/tidy-html5/build/cmake git:issue-794 [1]

Lines like

~/git/3rd/tidy-html5/build/cmake git:issue-794 [1]

Indicate that the previous command ended with exit code 1 by the ending on [<exit-code>]

geoffmcl commented 5 years ago

@yaccz thank you for testing issue-794, but do not understand Not working as expected.?

I assume you expected the option --mute TRIM_EMPTY_ELEMENT to not bump the exit code. Not true... it does...

Please re-read all the comments, except my first! ;=))

At first I thought it a reasonable request, but on exploring, testing more, I rejected this... reversed my initial thoughts... sorry...

Citing that there were already some options to suppress message output, including -quiet, but none that changed the exitcode... and --mute XXXX should conform to these... ie not be a special no exit code case...

Thus the only difference between current tidy, and tidy-I794 is the message output, if allowed by removing -q from the config will be -

Tidy 5.7.22
...
Tidy found 1 warning and 0 errors!
...

to

Tidy 5.7.20.I794
...
Tidy found 1 warning and 0 errors! Not all warnings/errors were shown.
...

But both will exit with 1... this has not changed...

The document has been modified by tidy... accordingly tidy can not exit 0... which indicates Success, good to go....

It is not good to go... it has an error, which you should fix, if you want the default config --drop-empty-element yes... you can change that config, and get 0, if that is what you want...

Note to self, issue-794, version 5.7.20, needs to be rebased to next, version 5.7.22, which has another --mute XXXX option change... namely Config: to Info:...

Look forward to further feedback... thanks...

jan-matejka commented 5 years ago

Ok, then it looks like it works as you intended. --drop-empty-elements no is what I need.

silky commented 4 years ago

re-opening this

it would be really good if tidy would not adjust the error code for warnings

consider

09:45 PM noon ∈ build (next*) cat ex.html                                                                  1 ↵
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Is. #754</title>
</head>
<body>
<a href="what ever.JPG" />
</body>
</html>
09:46 PM noon ∈ build (next*) tidy --mute ESCAPED_ILLEGAL_URI,ILLEGAL_URI_CODEPOINT ex.html                1 ↵
Info: messages of type "ESCAPED_ILLEGAL_URI" will not be output
Info: messages of type "ILLEGAL_URI_CODEPOINT" will not be output
Info: Document content looks like HTML5
Tidy found 2 warnings and 0 errors!

<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.7.28">
<meta charset="utf-8">
<title>Is. #754</title>
</head>
<body>
<a href="what%20ever.JPG"></a>
</body>

error code is 1.

while it's great that it's not kind of "writing" the warning to console; i'm actually using tidy as part of a toolchain, and when the exit code isn't 0, the chain fails.

so, how can i just ask tidy to suppress this warning?

geoffmcl commented 3 years ago

@silky, thank you for the comment, and sorry for the LONG delay in replying on this...

There is simply NO option to modify the exit code... The mute and quiet options only suppress the message output...

Please re-read my comments, particularly -

The document has been modified by tidy... accordingly tidy can not exit 0... which indicates Success, good to go....

For the particular case given in this issue, there exists an option, --drop-empty-elements no, which means tidy does not modify the document... hence no warning... no bump in exit code...

In the circumstances where tidy is being used in a toolchain, I would suggest that it is not as simple as testing if the exit code is non-zero... If you have no interest in warnings then you need to test if the exit is not 2, or negative... ie 0 or 1 is ok... continue processing...

So, at this point, can see no reason to re-open this... sorry...