htacg / tidy-html5

The granddaddy of HTML tools, with support for modern standards
http://www.html-tidy.org
2.71k stars 418 forks source link

Support custom elements #119

Closed Moult closed 7 years ago

Moult commented 10 years ago

Spec is here: http://www.w3.org/TR/custom-elements/

Currently, custom elements throw errors.

geoffmcl commented 9 years ago

Please note the current development version is in develop-500 $ git checkout develop-500 At this time issue must only be against this version...

But tidy has config items - new-blocklevel-tags Tag names tagX, tagY, ...
new-empty-tags Tag names tagX, tagY, ...
new-inline-tags Tag names tagX, tagY, ...
new-pre-tags Tag names tagX, tagY, ... where you can add custom elements. Is this what you mean?

If this does not work as expected in the develop-500 branch, then look forward to further comment, especially with a sample html input, the configuration used, the tidy output, html and error/warning text, and the output expected...

Maybe I misunderstand the feature requested, in which case again explain, preferably with sample html usage...

balthisar commented 9 years ago

Abstract This is a work in progress! This specification is for review and not for implementation! For the latest updates, including important bug fixes, please look at the draft on GitHub instead. This specification describes the method for enabling the author to define and use new types of DOM elements in a document.

As such, this should on the back burner for the time being.

yoshuawuyts commented 9 years ago

I'm using custom-element now (easily enabled through webcomponents.js, and it throws a proprietary attribute error when I'm using custom elements. E.g. I'm trying to do this:

<button is="custom-button"></button>

And the error is:

<button> proprietary attribute "is"

I understand that having full custom-element support might take quite some work, but maybe a good first step would be allowing is="" properties to be set on all elements? Thanks!

bendavis78 commented 9 years ago

Tidy still needs an option to disable checking for invalid tags and attributes, instead of having to pass every possible custom tag as an argument. When working on projects that involve a lot of custom elements (eg, Polymer), it becomes very difficult to construct the command line to check your file.

geoffmcl commented 9 years ago

@bendavis78 feel free to code a new feature to be added to libTidy to fully disable invalid tag and attributes checking...

However, it does feels a little contrary to what tidy is about, but such a feature addition would be considered...

And remember such custom element additions can be added to a config file, thus removing the difficulty in constructing a command line...

Will close this for now, since there have been no further comments for a few months...

But as always, feel free to re-open, or open new issues...

geoffmcl commented 9 years ago

Also see #225 where unknown tags is used instead of custom elements...

bendavis78 commented 8 years ago

Sorry, I missed notifications on this and am just now back on a project where I needed this. Adding elements to a config file isn't really a viable solution. That means I'd have to update my tidy config every time I create a new custom element, which in any webcomponents library can be in the hundreds.

Regarding "contrary to what tidy is all about", can you expand on that? Is tidy only for validation, or is it also for formatting? I see both use cases as separate (but not exclusive). Sometimes I just want to format some HTML and don't care whether or not the tags are part of the W3 spec. Tidy is immensly useful for formatting, and it should be up to the user whether or not they are concerned about validation.

geoffmcl commented 8 years ago

@bendavis78 ok, I can understand in your use case is not very convenient having to add them to the config... although once this file was built up, I am not sure additions would be that frequent...

Regarding "contrary to what tidy is all about", can you expand on that?

Hmmm, not easily. To many, Tidy is not a validator, although I sometimes use it like that. I have seldom seen a tidied page that does not pass W3C validation. So I pass all my html pages through tidy, choose an indent formatting, add its cute logo, and the W3C validation logo, even before I have actually really checked... and would probably post an issue if it did fail ;=))

This is my simple use case, and I suspect this would be the same for many tidy users.

So the executable console app tidy is a collector of up to 100 options, reads the HTML, tidyParse..., checks for general W3C compliance, but not all, tries to fix where possible, tidyCleanAndRepair and tidyRunDiagnostics, and formats the ouput - pretty print - tidySave...

That console app uses the libtidy lirary. Any project that uses that library can ignore libtidy warnings, or buffer them, and check what is important to it... and from access to the DOM like set of tidy nodes that make up a document, can choose what it outputs... there are many use cases of this...

So while an option to ignore all invalid tags and attributes is certainly very possible, as indicated, it seems contrary to the library, that's all... no strong thing...

And I, just as one person, do not yet see a wide use case, so would not be likely to do the coding at this time, but that could change... get more support...

I am always here to help with tidy internal code where I can... and could help others add such an option... would not be difficult...

As always, discussion, patches, PR all welcome.

jrunning commented 8 years ago

Just to add some data points to this discussion, the project I'm currently working on has 215 custom elements. A new feature can add between 0 and 10 custom elements (and sometimes remove them). The team generates between 3 and 10 new features every week. In order to use Tidy we would have to update its configuration every day, which is untenable.

Tidy is not a validator, but the nevertheless it refuses to run on documents it considers invalid. Very soon the Custom Elements spec will pass from Working Draft to Recommendation and it will be the unfortunate case that Tidy will refuse to run on documents that are completely valid per the W3C.

maxnordlund commented 8 years ago

It should also be noted that the big JS frameworks out there, Angular, Emeber, React and Polymer, all add custom elements. This is also the preferred way of organizing your code, so it easily becomes hundreds of elements which you can't be expected to keep in an config file.

On my work we have, as of writing this, 74 with more added every day.

tobie commented 8 years ago

Custom elements are required to have an "-" in their tagName.

That might be an easy way to single them out and have an opt-in mechanism to allow-custom elements in Tidy.

geoffmcl commented 8 years ago

@Moult, @bendavis78, @qualiware-global, @jrunning, @maxnordlund, @tobie, and others, given the ongoing interest here, I am re-opening this, and giving it a milestone of the next release... or the next...

To be clear, I am not really against such an option, and would certainly consider and review a PR... and as @tobie mentions, it seems they would not be too difficult to identify... and as discussed elsewhere, it seems such W3C specs do and can get broad support long before they reach REC status, if ever...

Again I would mention this OPTIONS.md document... just 4 little steps to set down, decide and discuss... then code... then a PR... as is currently happening in #456 / #458 ...

So who wants to add the Feature Request to Tidy? As always I would help where I can... thanks...

geoffmcl commented 7 years ago

Due to no comments, and no action like a PR, moving this out to the next release...

Maybe in fact it should be closed until someone has the time to do something specific about this issue...

maxnordlund commented 7 years ago

Well, I would have made a PR if I could, but I don't code in C and even if I did this is not exactly an trivial project which makes it that much harder. I know it's a lot to ask for, but please do consider implementing this as I'm sure you can do it without too much trouble.

morewry commented 7 years ago

The relevant Custom Element standard (which, although still a working draft at the W3C, should soon be eligible for recommendation status) has, in the 2 years since this issue was opened, been revised to v1 with broad agreement, been integrated into the WHATWG living standards, become available in stable releases of both WebKit and Blink, and should soon be available in a stable release of Gecko. Polyfills enable use in IE and Edge, MS being the remaining major vendor where implementation is still only "under consideration," although the status page notes, "development is likely for a future release."

Necessary changes would be treating the is attribute with a value that contains a hyphen - as globally valid and treating tag names with hyphens - as valid, such that tidy is able to run on documents built with custom elements without requiring a user to maintain a configuration that includes each utilized element, which is not tenable (as previously noted). Ignoring all invalid tags, while it would address tidy's refusal to run, is a less desirable solution from my perspective, although separately a valuable option for other use cases.

Otherwise, tidy will, within the next year or two--some might say sooner, but that's my estimate--be outdated to the point that it's unusable on modern and entirety valid HTML.

I respect that maintainers of open source projects prioritize requests as appropriate and may strongly encourage PRs, but I suspect most people impacted by this issue won't be C programmers. While presence on the roadmap is reassuring so far as it goes, many users will need something that works "now" (ish) and if they don't find it here, they'll go elsewhere. That would be sad given tidy's history and preeminence. Closing this issue would send perhaps a stronger and more negative message than intended.

geoffmcl commented 7 years ago

@maxnordlund, @morewry thank you for the feedback... this is much appreciated, and helps with my ongoing education and understandings...

Since we are considering a 5.4 tidy release, I only moved this to beyond that, since no action had been started... and only babbled about closing ;=))

As agreed elsewhere, there are times when tidy should support conditions, specification, in a relatively mature W3C working draft (WD), especially if widespread acceptance, since the recommendation (REC) status can be very slow in arriving, perhaps if ever, in some cases, as suggested somewhere...

Now it seems this issue boils down to two things -

  1. support is attribute, with a value containing a -. Is this allowed on all elements? Except custom-elements? Refs?
  2. support custom elements, again with a name containing a -. See valid-custom-element-name? With some exceptions...

Please correct me if I am getting this wrong...

I have just pushed two README files, ATTRIBUTES.md and ELEMENTS.md detailing the steps to achieve 1. and 2. resp.

Although the 2., in this case, would be a little different, in that it would have to be discovered by an algorithm, a function, rather than exactly matching a string, which presents an small extra challenge... maybe the table string could be like *-*, or something...

And having said that, I can see why having to add all new custom elements to a config file would be untenable, in that it seems html generators can add just about anything they fancy, with few exceptions! So maybe this support should be via an option like --support-custom-tags, or something... see OPTIONS.md.

Yes, we do strongly encourage PRs... just can not do all this alone... but as seen will help, at least by adding a HOW-TO-README... and there is some discussion that this README information could also be added to a Tidy Wiki, maybe with more actual code example...

Again, look forward to further comments... thanks...

balthisar commented 7 years ago

Some of the questions, referring to the working draft:

geoffmcl commented 7 years ago

@balthisar hi, well I read it as two separate cases...

1: the is attribute, extending existing elements 2: custom elements, that can not use is

Case 1: "is" attribute extension

Input: input5\in_119-2.html

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Issue #119-2</title>
</head>
<body>
<button is="custom-button"></button>
</body>
</html>

Config: input5\cfg_119-2.txt

show-info: no
tidy-mark: no
drop-empty-elements: no
drop-proprietary-attributes: no

The only problem with Tidy is the is attribute - line 8 column 1 - Warning: <button> proprietary attribute "is", yet that file passes W3C validation - "Document checking completed. No errors or warnings to show.", for uploaded file in_119-2.html.

So it seems tidy should support the is attribute extension, and the only verification of the attribute value is that it contains a hyphen.

Yes, Tidy would not be able to verify, check the attribute value, so would accept anything with a hythen... not the best, but seems what the validator does... so the following seems ok -

<button is="absolute-rubbish"></button>

And assume tidy would warn/error on say is="nohyphen", or is="too-many-hyphens", etc... full extent of error/warn conditions to be explored, enumerated, decided, ... some work here...

Case 2: a custom element

Input: input5\in_119-4.html

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Issue #119-4</title>
</head>
<body>
<flag-icon country="nl"></flag-icon>
</body>
</html>

Config: input5\cfg_119-4.txt

show-info: no
tidy-mark: no
drop-empty-elements: no
drop-proprietary-attributes: no
new-blocklevel-tags: flag-icon

Again, this document in_119-4.html passes W3C validation, and since I added the new element to the tidy config it nearly passes Tidy with an output of -

line 8 column 1 - Warning: <flag-icon> is not approved by W3C
Tidy found 1 warning and 0 errors!

Info: Document content looks like HTML5
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Issue #119-4</title>
</head>
<body>
<flag-icon country="nl"></flag-icon>
</body>
</html>

A little more complicated, but is really a question of an option like --support-custom-tags yes. So yes tidy would then accept even <flga-icon country="b"></flgs-icon>, but should reports a warning if <flag-icon country="b"></flgs-icon>, and maybe fix it...

Yes, I agree such an option seems to reduce some of tidy's check every element, every attribute and value, but as we know there are quite a number of other cases where tidy presently does not, can not, do full checking and verification, so long as it is generically right... and in other issues has trouble verifying urls in an a href, and recently an attribute tabindex={{some code}} since it expect only a numeric value... etc...

So I have shown two case where the W3C validator says ok, and tidy blurts out a warning... should tidy try to catch up?

Or have I missed madly the point of your questions?

Regards, geoff.

balthisar commented 7 years ago

@geoffmcl, you've answered questions, specifically that the validator accepts them. That's good enough for me. The questions were meant to be a bit broader and not specifically at you. I think we both agree that if it's good enough for the validator, it's good enough for Tidy. I don't want to look at an issue in a year where someone opens a ticket because flga-icon wasn't properly defined somewhere, due to improper expectations. I would be good to hear others' feedback, too.

maxnordlund commented 7 years ago

I also think that would be good enough, if you really want that level of warnings you can always maintain a list of allowed elements. However, if it could detect matching tag tyops, that would be great.

gautaz commented 7 years ago

For the record, Nu HTML Checker (which is a HTML validator available from W3C developer tools) has recently added support for HTML custom elements.

Perhaps the way its maintainer has used to add custom elements support might be a source of inspiration.

tobie commented 7 years ago

Pinging both @sideshowbarker (which fixed the Nu HTML Checker), and @dontcallmedom (who maintains the W3C htmldiff service which relies on tidy).

balthisar commented 7 years ago

It's not so much a matter of how to code it, but how it should be handled.

Another wrinkle is that without understanding the document (meaning we'd have to load an parse the javascript), do we treat custom elements as block, inline, or pre? This is a critical distinction because Tidy doesn't merely say yes or no to an element; it pretty prints HTML, and if we don't know whether or not an element is inline or not can affect whitespace in the rendered document.

maxnordlund commented 7 years ago

After reading through https://github.com/w3c/webcomponents/issues/224, https://github.com/w3c/webcomponents/issues/426 and https://github.com/w3c/webcomponents/issues/468, the consensus is that the default style for custom elements is inline.

However there were a fair amount of developers that wanted it changed to block, but that would probably break existing sites and because of that it was decided against. I think defaulting to inline with an option for block would be pretty good.

balthisar commented 7 years ago

I think defaulting to inline with an option for block would be pretty good.

It would then be all or nothing.

I suppose a strategy to get this to work would be to dynamically assign a new tag (containing the right naming requirements) to new-inline-tags, and possibly issue a warning that it's being treated as inline? Users would then have to review the output messages, and manually add things to new-blocklevel-tags that they don't want considered inline?

The only other way is to turn Tidy into a Javascript engine, I think.

maxnordlund commented 7 years ago

I suppose a strategy to get this to work would be to dynamically assign a new tag (containing the right naming requirements) to new-inline-tags, and possibly issue a warning that it's being treated as inline? Users would then have to review the output messages, and manually add things to new-blocklevel-tags that they don't want considered inline?

This type of manual work is what we are trying to avoid, but I think it's possible today to add arbitrary inline/block level elements. It's just too tedious when you have ~100 custom elements.

While the all or nothing is not perfect, I would argue it's more then good enough. Especially since this is currently blocking, at least my, usage of Tidy.

The only other way is to turn Tidy into a Javascript engine, I think.

Yeah, that and CSS, because you need that to deduce the display property. I think we can agree that is way to much work.

If there's a possibility to have a JSON-based format for specifying tag metadata, the further down the road you could write a tool for your favorite framework to generate this from your existing definition. Using something like https://github.com/electron/electron to load your app and the dump the custom elements registry should be an decent cross-framework way of doing this.

morewry commented 7 years ago

As a user, I agree that I wouldn't expect Tidy to execute or parse JavaScript or compute CSS styles, but rather to make a "good faith effort" to warn about possible issues and to format the markup.

The reason I'd prefer Tidy be able to generically recognize a custom element is so that Tidy can continue to make that good faith effort to warn about the kinds of potential issues it warns about today. For example, one of the major reasons I utilize a tool like Tidy is to help find mistakes that can negatively impact the user's experience, such as unclosed tags.

So, the point made about a document where there is an unmatched opening and closing tag due to a typo such as <flag-icon></flga-icon>--it would be great if Tidy can feasibly warn about that kind of case. If the typo occurs in both the opening and closing tag, e.g. <fgla-icon></flga-icon>, I think it's reasonable for Tidy to assume it's as the author intended. Although it suggests to me that it'd be a nice-to-have enhancement at some point if Tidy could log everything it assumed was an author defined custom element.

I think of the is attribute as global, like class, although it's not quite as it has a caveat that it "must not" be used on autonomous custom elements (those with a custom tag name). But if there is a whitelist or blacklist of standardized tags specified, I can't find it. So general seems the closest existing fit.

I'd note here in respect to too-many-hyphens that while the custom element spec requires a custom element name to have at least one hyphen -, it isn't limited to one hyphen (you can have two or three or four hyphens). My understanding of the requirement for a hyphen is that it's intended to strongly suggest that developers utilize a name-spacing-esque prefix to reduce the risk of name collisions.

I concur with treating custom elements as inline by default. I'm definitely in the camp of wishing the standard had been block--at least for markup formatting purposes. (For styling purposes, I've felt inline as default has decided benefits.) I would like having an option to format all elements identified as autonomous custom elements as block, paired with the existing new-inline-tags option to customize. However, for a first pass, using inline, paired with the existing new-blocklevel-tags option, is sufficient.

@geoffmcl The new docs look great, thanks for them!

balthisar commented 7 years ago

If any of you compile your own, then try checking out the branch in PR #513 to give me your thoughts. In particular, try the custom-tags option -- see documentation for options.

513 is dependent upon an earlier PR that needs careful review in its own right, so this one is probably at least a week away (maybe more) from hitting. Don't use it for production, but I'd love to hear feedback on how it handles custom tags and how it handles the is element.

Some notes:

I'd like to further invite everyone to extend the conversation by arguing about what the future default should be. Right now it's currently no, but perhaps inline makes some sense so that Tidy by default does something with custom elements. Maybe something else makes sense. Please discuss.

maxnordlund commented 7 years ago

Inline seems most inline with the spec (pun intended), and perhaps a message telling the user about the option. "Hey, I saw you are using custom elements, interpreting them as inline elements. If that's not the case see xxx option or yyy config."

balthisar commented 7 years ago

@maxnordlund, yup, when set to inline, a sample output is

line 15 column 1 - Info: detected autonomous custom tag <m-custom>; will treat as inline

That's another good point. Right now, this particular message is emitted as TidyInfo, which is probably what we want. In the case of is, though, some sample output:

line 8 column 1 - Warning: <h3> attribute "is" lacks value
line 9 column 1 - Warning: <h4> attribute "is" has invalid value "invalid-because space"
line 15 column 1 - Warning: <m-custom> attribute "is" not allowed for autonomous custom tags.
line 15 column 1 - Warning: <m-custom> attribute "is" has invalid value "not valid"
line 16 column 1 - Warning: <z-custom> attribute "is" not allowed for autonomous custom tags.
line 18 column 1 - Warning: <hello-world> attribute "is" not allowed for autonomous custom tags.
line 19 column 1 - Warning: <hello-world> attribute "is" not allowed for autonomous custom tags.
line 19 column 1 - Warning: <hello-world> attribute "is" has invalid value "not valid"
line 10 column 1 - Warning: <pizzeria> is not approved by W3C

Right now these are all configured as TidyWarning, meaning that Tidy will still render the document without using force-output; If we make them TidyError instead, then the document will not render (unless using force-output). @geoffmcl, any thoughts on correct error level here?

Also there's some question about how noisy the output is, too. It's easy to control, but I've left it noisy so far. For example, in the m-custom example, the higher priority warning means hey, you can't use is for this tag, but then the next warning says that the value is invalid for is (because of the space; a properly formatted value would have squelched this warning).

Finally, the pizzeria tag was defined with new-inline-tags (so, missing the hyphen), but the warning would be emitted for an autonomous custom tag, too. This warning message now brings up another good point: the pizzeria tag is not a valid custom tag, so probably it's okay for this warning to be applied, but what about for valid custom tags? A specific custom tag is not approved, but is this warning output really necessary? LibTidy users can easily mute them, but as in the console application will always show them.

Let me ping everyone who's been in this conversation: @geoffmcl, @maxnordlund, @morewry, @tobie, @gautaz, @jrunning, @bendavis78 @yoshuawuyts.

balthisar commented 7 years ago

Hmmm... I should also point out that this feature works regardless of the the doctype. Maybe this is an argument to leave the default to no, otherwise Tidy would be generating bad HTML4 code. If the config turned the feature on, then it would be up to the user to know that, given the warnings, his code isn't HTML4 compliant. On the other hand, there are probably polyfills/shims for HTML4.

maxnordlund commented 7 years ago

Yeah, that's a valid concern. I do think it should be on by default, but a safety check for a missing doctype seems prudent. Though, I have a bunch of partials, which don't have a doctype at all, so the option for ignoring doctype/html/head/body should also turn on custom elements.

balthisar commented 7 years ago

Updated to PR #514.

geoffmcl commented 7 years ago

@balthisar so far I have only checkout the branch custom_tags_wip2, and ran a few tests, with --custom-tags yes...

As usual, would prefer tidy keeps html4 compatibility, and certainly not generate bad html4, even with warnings, or errors. But since we do maintain an internal mode would it not be possible to check this?

Concerning the W3C validator, from what I see, and perhaps @sideshowbarker can correct me if wrong, but if you present a document with a legacy doctype, it uses one set of code, but otherwise uses the nu code... although I do remember him saying that may change... and of course that is not true is you directly present it to nu, which will not handle legacy documents at all well...

So this is how I see tidy - legacy vs nu. If it detects a legacy doctype, it will perform one set of checks, otherwise with no doctype, a code snippet, or just <!DOCTYPE html> will stay in the default html5 mode...

So if we choose to default this option to on, like nu, then this could be another auto table change if legacy...

So I ran a test on the following, which passes the W3C validator. Used config --custom-tags yes -i, -

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Issue #119-5</title>
</head>
<body>
<div class="img"><img-viewer filter="Kelvin">
  <img src="images/tree.jpg" alt="A beautiful tree towering over an empty savannah">
</img-viewer></div>
<script src="js/elements/img-viewer.js" async></script>
</body>
</html>

And I get the message output -

F:\Projects\tidy-test\test\input5>tidy5-5.5.5I119 --custom-tags yes -i input5\in_119-5.html
line 8 column 1 - Info: detected autonomous custom tag <img-viewer>; will treat as block level
line 8 column 1 - Warning: <img-viewer> is not approved by W3C
Info: Document content looks like HTML5
Tidy found 1 warning and 0 errors!

Well, if the document passes W3C, and we want tidy to agree, why that warning?. Either it is approved, per the validator, or, if not then why is tidy supporting it? Sort of a mixed message...

Likewise, why the Info: message? The user has said support custom tags in the config! Are we going to tell him/her about every custom tag in the document? Why?

I could see the opposite. If --custom-tags no, I could see a helpful warning like Warning: detected autonomous custom tag <img-viewer>; use --custom-tags yes to remove warning, or something like that...

Concerning, will treat as block level. But did we? The output was inline type...

<!DOCTYPE html>
<html lang="en">
<head>
  <meta name="generator" content=
  "HTML Tidy for HTML5 for Windows version 5.5.5I119">
  <meta charset="utf-8">
  <title>Issue #119-5</title>
</head>
<body>
  <div class="img">
    <img-viewer filter="Kelvin"><img src="images/tree.jpg" alt=
    "A beautiful tree towering over an empty savannah"></img-viewer>
  </div>
  <script src="js/elements/img-viewer.js" async></script>
</body>
</html>

In general block elements, like <div...> will take a new line... certainly in output it has been treated as inline... or am I missing something here...

Will probably know, and understand more after more testing... I do understand this is WIP status...

PS: Until we find a better home push all my tests to https://github.com/geoffmcl/tidy-test/tree/master/test/input5 ... Would certainly appreciate some more samples to try...

geoffmcl commented 7 years ago

PPS: This was before #514...

balthisar commented 7 years ago

@geoffmcl, #514 is the same; I simply squashed it as it had all of the same commit history as the messages branch.

The discussion you raise above is exactly what I hope to see in this issue thread!

balthisar commented 7 years ago

@geoffmcl,

As usual, would prefer tidy keeps html4 compatibility, and certainly not generate bad html4, even with warnings, or errors. But since we do maintain an internal mode would it not be possible to check this?

Agreed. From strict-tags-attributes I seem to recall that the internal mode isn't really known until the end of the process, after we've built whatever backs apparentVersion, unless we're given a doctype. If given <HTML5, then I agree, these should be errors. If we don't know the doctype, we're assuming HTML5, despite what we accumulate in the doctype bitfield. Correct me if I'm wrong about this!

So I ran a test on the following, which passes the W3C validator. Used config --custom-tags yes

That reminds me; maybe we shouldn't support yes. I wrote the parse so that yes would be accepted (meaning blocklevel), but it's actually not a valid value. I wonder if we should eliminate it, or if we keep it, default it to inline per everyone else's suggestion. I would support dropping yes, but a lot of other parses support it, too.

As for why it's not rendering properly, I'll have to look into that. Once it's added to the list, it's exactly like adding it to new-blocklevel-tags (etc.), so unless that's broken, too, it should just work!

Well, if the document passes W3C, and we want tidy to agree, why that warning?. Either it is approved, per the validator, or, if not then why is tidy supporting it? Sort of a mixed message...

I could suppress this warning, as your logic is correct. I left it in because it's the existing behavior for new-xxx-tags; I wonder if we should suppress this warning for those options, too, since presumably if the user is adding them to options, he shouldn't be bothered with a warning about them.

Likewise, why the Info: message? The user has said support custom tags in the config! Are we going to tell him/her about every custom tag in the document? Why?

This is why I made them info level, so that hide info would suppress them. It lets user (or LibTidy user) get a list of all custom tags actually used in an auto-generated document, and they're easy to hide. This message only appears once for each tag, the first time it's discovered.

I could see the opposite. If --custom-tags no, I could see a helpful warning like Warning: detected autonomous custom tag ; use --custom-tags yes to remove warning, or something like that...

Ah, yes, excellent idea!

balthisar commented 7 years ago

Let me try to formalize the above, just a little bit so that I can introduce any changes logically rather than putting little bandages all over the place:

In this brave, new world of autonomous custom elements, we need to have Tidy distinguish between unknown tags and two types of custom tags: new tags, and autonomous custom tags (which I'll type as ACTs from now on).

We can make the following assumptions:

Unknown tags will always cause an error "xxx not recognized" and if force-output, additionally generate the message "discarding unexpected xxx." This is current behavior. HTML version and custom-tags are both not applicable in this situation. Things that look like ACT are not.

New tags (not ACT) will always generate a warning. HTML version is N/A. The message will be "xxx is not approved by the W3C." This is current behavior. However if a tag looks like a custom tag (but isn't, because custom-tags is no), the message changes to "xxx is not approved by the W3C; did you mean to set custom-tags to a different value?"

ACTs (whether detected inline or defined with new_xxx_tags, custom-tags is not no) will change based on HTML level:

Thoughts?

geoffmcl commented 7 years ago

Correct me if I'm wrong about this!

This is wrong. The mode is set after finding a doctype, or indeed not finding a doctype, at the head of a doc... see AdjustTags

This has nothing to do with apparentVersion, which as you state, is only valid after the parsing, and all bit eliminated...

Would even ague that in tidy5, this info should be removed... I really only added it for debug, and deliberately excluded it from --show-info no... as of now it should also be excluded...

See IsHTML5Mode... reset in ResetTags

Will address other comments later... but it seems important to understand this mode is set almost immediately, and can be used throughout...

Regards, Geoff.

Edit: fix markdown, but still wrong

geoffmcl commented 7 years ago

Agree with all that I read - good summary of the VM of html4, html5, custom tags, - thanks @balthisar, and now into testing the custom_tags branch...

Pushed a cmake fix to allow quick generation of a tidy version 5.5.6.I119 just by passing the command to cmake, -DTIDY_RC_NUMBER=I119... If you have anywhere near the number of version I keep, you should appreciate being able to set it to exactly what you want... of course, for this we default to the static library link, to avoid DLL hell, or shared library dependence for each version...

Need to construct, maybe two, or more test html, in_119-4.html, and in_119-5.html, to test legacy and html5, for each of the scenarios enumerated here, etc... any help appreciated...

These should be added to https://github.com/htacg/tidy-html5-tests/tree/master/cases/testbase as case-119-4a.html, case-119-4b.html, case-119-5.html, ... so we could run a full regression test suite...

And test this branch, address any bugs, like the possibility that the inline, block do not yet give the expected html output, before merging into stable next... At that time you now drop the RC_NUMBER, and it is ready for release...

This feels like quite a step forward, support custom-tags... thanks @Moult for opening this on Jul 19, 2014 at 01:31 GMT, and to all who participated with action and feedback... it is exciting... thanks...

geoffmcl commented 7 years ago

Certainly agree with an action like the following, or perhaps as a separate Feature Request -

the ability to catalogue the ACTs in their documents. No other output.

  1. could be added to tidy.c, our premier example of libTidy usage... maybe -help-custom would do, or
  2. we could have a separate tidy-ACT <file> which did exactly that...

I would undertake to write the latter, unless someone beats me to it... it's easy with the libTidy API... and we would help if it is missed...

And love that --show-info no will block all Info: type messages, even the one I exempted so long ago... just for debug... as well as the trailing big blurb, ... that sounds good to me... any will try to test and fix this...

Great stuff! Thanks...

balthisar commented 7 years ago

@geoffmcl, thanks. I'll add the tests immediately. I've updated a 5.5.6 branch in -tests, and the good news is that it's identical to the 5.5.0. I know I tested 5.5.6 with all of the messages changes, but I'm impressed that my major tinkering didn't screw this up.

I'll add the news tests and push to custom_tags in -tests, too, using the recommended case numbers as above.

geoffmcl commented 7 years ago

@balthisar, well to me, rather than screwing things up you have massively enhanced the test site! It becomes very usable...

Great that 5.5.0 equals 5.5.6! That is good, and proof of the testing system we use... thank you...

I would not have bumped cases/_version.txt. Something about stable since blah, blah... maybe not important...

As of my last pull, of the customs_tags branch, I did not find any new 119 tests, but understand they can take time... thanks...

balthisar commented 7 years ago

@geoff, no recovery from git clean -Xdf before commit! Will repeat tomorrow. :-(

geoffmcl commented 7 years ago

Read this git clean [opts] earlier, with much interest... my git status can show a lot of things, sometimes...

But have my own cmake-clean scripts, that will only clean the build directory... Sorry that these are not yet shared globally... but will try...

Cleaning the whole build environment is really needed, now and then... although I must say cmake seems to be able to keep track of source change, but also new/changed defines seem to trigger a full build in MSVC...

But a clean environment is always better...

balthisar commented 7 years ago

I pushed some tests into custom_tags today. I added six because I tend to be overly verbose, but they also explain all of the current output.

I've not replicated the issue where inline and blocklevel aren't working, though. They seem to work fine, and I've not made any changes to that part of the code.

geoffmcl commented 7 years ago

Really thanks for these... it really make testing fast and furious ;=))

Using my Tidy 5.5.6.I119.2 build... as of da55a6e commit...

Get a diff on 431716... it is just about the unknown option: split... I thought we had removed, changed, fixed that, but maybe just in my imagination...

While I do not mind too much about using diff options -uwB, but at the same time can not see why the expects are not set exactly as they should be... is there some other problem I am not seeing here?

Yeah, still to re-create the inline vs block problem I thought I saw... maybe nothing there... and I see you have added it to 119e/f and maybe others... forget it until I find it again...

Since you providing a conf for each, why do we not add show-info: no, at lease to some of them? This repetitive Info: output really adds nothing to the story... and I am still to remove the one I added...

But this is all looking great... Hope others get the chance to check and test... thanks...

balthisar commented 7 years ago

Get a diff on 431716

I thought I corrected that, because I removed all of the "deprecated" stuff a few commits ago in next... I'll have to take a look if you don't find it first.

show-info should be covered in some of the existing test cases, and were already covered in the messaging system refactoring. I didn't do anything new with show-info, except categorize some of the new output as TidyInfo. show-info has been there as long as I can remember... If there are existing cases, then anything categorized as TidyInfo will be hidden. In fact, if you're adding exceptions, then that will be tricky to do!

geoffmcl commented 7 years ago

@balthisar yeah, do not yet know how this split problem still exists... maybe a git rebase master, or something, ... might add some changes to the expected...

I do not think show-info is important... just if you are including a config file in a test, then why not eliminate all tidy blah, blah, blah, at the end... what does it add? Anything else it does would be a nice addition...

I certainly do not see this as an exception, creating a TidyInfo hidden output... FUD... rubbish...

No! When I introduced the option/flag tidy-info, the single first step was to not output the trailing verbiage... no more, or less... it was somewhere between the default and quiet...

Worked well, in MSVC Debug mode, and helped, eliminating trailing tidy verbage... a quicker review... etc... Absolutely nothing intended to be classified as HIDDEN!

And, as repeated, ad infinitum... I took the step of excepting the final Info:, that I regret, and would like to remove this blip... nothing more...

The six(6) testbase files, in the custom_tags branch are great, with lots of verbose information about the specific test they contain... That is very good, and thanks...

But you did not address the diff question... why not set/update the expected to the facts? Seems no problem...

But maybe I still miss something... thanks...

balthisar commented 7 years ago

But you did not address the diff question... why not set/update the expected to the facts? Seems no problem...

When I run the tests in the custom-tags branch, there are no diffs. All of the expected is there!

Re: show-info, there's also a message generated most of the time bearing TidyInfo: something like "Document content looks like HTML5". I'm not sure when it started getting reported, but it's always been reported in the error message table. Usually that plus the extra info at the end is what show-info hid. And, yes, in many cases the document content type is also reported in the verbose output, not just the message table.

It would be trivial to add show info: no to any of the tests. Let me take a look, though, because we want that info present in some of them in order to prove that Tidy is detecting the tags.

Maybe this is the wrong thread to discuss it, but with the new callback, a longer-term thing I had in mind was filtering in the console application. Not via configuration options, but in the console application proper. Using the new callback, it would be simple to sort messages based on line number, or filter against things users aren't interested in, such as message level.

I won't touch them, but conceptually, these kinds of controls really are CLI switches and not Tidy config options, i.e., right now the library supports --show-info, whereas it should be CLI's responsibility to act on -show-info.

I bridged this divide by having the new message filter report every message and ignoring these "CLI-type" options. This avoids breaking too many existing applications, while ensuring that CLI-type options don't get in the way of library users.

Oh, I went too far off on a tangent, but yes, I'll look at the test cases. Looking the new ones, you have a good idea that I tend to get overly verbose at times, so I guess I'll stop typing now ;-)