htacg / tidy-html5

The granddaddy of HTML tools, with support for modern standards
http://www.html-tidy.org
2.71k stars 417 forks source link

`--drop-proprietary-attributes yes --strict-tags-attributes yes` causes tidy to report error instead of dropping non-applicable HTML attributes #688

Open i-give-up opened 6 years ago

i-give-up commented 6 years ago

[Bug] According to the quickref, when both drop-proprietary-attributes and strict-tags-attributes are enabled, tidy should drop attributes that are not applicable to the HTML version. In reality, tidy reports errors instead.

Environment

Command

tidy --strict-tags-attributes yes --drop-proprietary-attributes yes test.html

I also tried switching the order of the options but the issue persists.

Input

test.html

<!DOCTYPE html>
<html>
  <title>Testing</title>
<body>
<table>
  <tr>
    <td align="left" valign="top">Hello whirl!</td>
  </tr>
</table>
</body>
</html>

Expected Output

Non-applicable attributes align and valign should be removed from <td>

<!DOCTYPE html>
<html>
  <title>Testing</title>
<body>
<table>
  <tr>
    <td>Hello whirl!</td>
  </tr>
</table>
</body>
</html>

Actual Output

line 7 column 5 - Error: <td> attribute "align" not allowed for HTML5
line 7 column 5 - Error: <td> attribute "valign" not allowed for HTML5

Update

Apparently it works as expected if you include the option --force-output yes. This is non-ideal though because I want it to generate output if it's just non-applicable attributes and to fail if it's some other error.

geoffmcl commented 6 years ago

@i-give-up thank you for the issue...

Researching this and it does seem a strange combination...

As the documentation states, setting strict-tags-attributes will report bad attributes of an element as an error... Well maybe that should be a warning?

And then it goes on to say "Additionally if drop-proprietary-attributes is enabled, then not applicable attributes will be dropped, too.", but it does not make it clear that since they will be marked as an error you will get no output without --force-output yes...

And likewise the documentation of drop-proprietary-attributes, where it says "Additionally attributes that aren't permitted in the output version of HTML will be dropped if used with strict-tags-attributes."! And yes that's sort of true, but if it then reports an error then all output is dropped, so again you need --force-output yes...

And then there is the default config, which will only warn about the align attribute, and not the valign... Why? But it is using a warning... perhaps a bit harshly worded Warning: <td> attribute "align" not allowed for HTML5... hmmm...

I would prefer a wording like the nu validator - Error: The align attribute on the td element is obsolete. Use CSS instead., but that is just a semantic preference...

And in this case drop-proprietary-attributes will do nothing. I guess because they are not strictly proprietary! Maybe this should be stretched to include such bad attributes, and/or maybe there is a need for a new option like say drop-invalid-attributes or something...

So I can not yet see a solution for this, except perhaps downgrading strict-tags-attributes to a warning...

Look forward to further feedback, comments, patches, PR on this... thanks...

i-give-up commented 6 years ago

I see 3 possible solutions

  1. update the documentation to say you have to use --force-output yes in combination with --strict-tags-attributes yes --drop-proprietary-attributes yes to get output. This is non-ideal because it generates output regardless of all errors and not just invalid attributes.
  2. if both --strict-tags-attributes and --drop-proprietary-attributes are used, then generate output without needing --force-output yes. In other words, downgrade --strict-tags-attributes to warning when used in this combination. No changes needed in documentation.
  3. a new option (e.g drop-invalid-attributes as you suggested) which would remove both proprietary and obsolete/invalid (for the HTML version) attributes and output a message when such attributes are removed. The documentation would have to be updated to include this new option and to correct the part about using both --strict-tags-attributes and --drop-proprietary-attributes. I like this solution best because unlike solution 2 it doesn't involve a reinterpretation of the word "proprietary" to include obsolete/invalid attributes.

I'm not familiar with HTML validation though so there are probably certain things I failed to consider.