Open ghost opened 6 years ago
@sampablokuper thanks for the issue, I think...
Now maybe this is a good Feature Request, but the first thing is to supply some small sample html... using name and id, to show what tidy presently does, with this or that config... that is --anchor-as-name yes|no...
Then you would need to show what you expect from --anchor-as-name auto... i.e. how that should change the above sample outputs...
That is, move this from theory, based on the current documentation, which may or may not be the whole story, to simple practical examples... tidy currently does this... with config ????, it should do that...
I have tried to read and understand the current FixAnchors code, which seems to be the only place in tidyDocCleanAndRepair where TidyAnchorAsName is used, but without samples, and then what difference is expected, it is quite difficult, and time consuming...
In essence the samples only need to be one line, and we can use --show-body-only yes to test... although later may need some legacy doctypes to ensure everything continues to work as expected there...
At this time marking this as Technical Support, until it becomes clearer what actual Feature is being requested... and that can only be determined by having some sample html to test... thanks...
@geoffmcl, thanks for your reply.
With the input <a id="foo" name="bar">baz</a>, the only way to obtain a no-op is to use --anchor-as-name yes:
$ echo '<a id="foo" name="bar">baz</a>' | tidy --anchor-as-name yes --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo" name="bar">baz</a>
$ echo '<a id="foo" name="bar">baz</a>' | tidy --anchor-as-name no --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo">baz</a>
By contrast, with the input <a id="foo">baz</a>, the only way to obtain a no-op is to use --anchor-as-name no:
$ echo '<a id="foo">baz</a>' | tidy --anchor-as-name yes --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo" name="foo">baz</a>
$ echo '<a id="foo">baz</a>' | tidy --anchor-as-name no --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo">baz</a>
Therefore, neither of those two options can be relied upon, in the general case, as a no-op.
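A check of this kind can be scripted. The following is a minimal Python sketch, not part of Tidy: the helper names `is_noop` and `tidy_command` are hypothetical, the flags are copied from the transcripts above, and a `tidy` binary on PATH is assumed. The comparison ignores leading and trailing whitespace.

```python
import subprocess
import sys
from typing import List

# Flags copied from the shell transcripts above.
TIDY_FLAGS = [
    "--quiet", "yes",
    "--show-warnings", "no",
    "--show-body-only", "yes",
    "--doctype", "strict",
]


def tidy_command(anchor_as_name: str) -> List[str]:
    """Build the tidy command line used in the transcripts (assumes tidy is on PATH)."""
    return ["tidy", "--anchor-as-name", anchor_as_name, *TIDY_FLAGS]


def is_noop(fragment: str, command: List[str]) -> bool:
    """Return True if piping `fragment` through `command` leaves it unchanged.

    The exit status is deliberately not checked: the filter may signal
    warnings through it while still producing output.
    """
    result = subprocess.run(command, input=fragment, capture_output=True, text=True)
    return result.stdout.strip() == fragment.strip()
```

Going by the transcripts, `is_noop('<a id="foo">baz</a>', tidy_command("no"))` would be expected to hold, while the same fragment with `tidy_command("yes")` would not, and the reverse for the fragment that carries both attributes.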
The proposed --anchor-as-name preserve option would yield a no-op with any relevant input, i.e.:
$ echo '<a id="foo" name="bar">baz</a>' | tidy --anchor-as-name preserve --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo" name="bar">baz</a>
$ echo '<a id="foo">baz</a>' | tidy --anchor-as-name preserve --quiet yes --show-warnings no --show-body-only yes --doctype strict
<a id="foo">baz</a>
I hope this demonstrates the validity of the feature request, and that this is not a technical support query. Thanks again :-)
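The transcripts in this comment can be summarised as a small model. This is a sketch of the observed and proposed behaviour only, not Tidy's implementation: the function name `apply_anchor_as_name` is hypothetical, `preserve` is the proposed value rather than an existing one, and elements without an id are left untouched because the transcripts do not cover them.

```python
from typing import Dict

MODES = ("yes", "no", "preserve")  # "preserve" is proposed, not an existing value


def apply_anchor_as_name(attrs: Dict[str, str], mode: str) -> Dict[str, str]:
    """Model what the transcripts show for an <a> element's attributes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    result = dict(attrs)
    # Assumption: only elements that carry an id are modelled, since those
    # are the only cases shown in the transcripts.
    if mode == "preserve" or "id" not in result:
        return result
    if mode == "yes":
        # A missing name is added, copying the id; an existing name is kept.
        result.setdefault("name", result["id"])
    else:  # mode == "no"
        # An existing name is dropped when an id is present.
        result.pop("name", None)
    return result
```

In this model, `yes` changes `{"id": "foo"}` and `no` changes `{"id": "foo", "name": "bar"}`, so only `preserve` returns both inputs unchanged.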
@sampablokuper thanks for the samples... quite interesting that you seem to be talking about legacy doctypes, i.e. --doctype strict, but will come back to that...
Sort of OT, but you are another person to comment on Technical Support. I have always seen this more as a discussion label, and not as a query. Or at least a 2-way query. One where I too am trying to learn and understand exactly what the issue addresses, requests... sometimes so I can attach a more accurate label...
Accordingly will try changing this to a clearer, simple idea of a Technical Discussion...
Am doing some testing on your samples, especially html5 versus legacy html4 and earlier doctypes... and how anchor-as-name influences that... then what would, say, a preserve addition do, or aim to do... exactly what is the use case of the preserve, or a no-op as you have termed it...
Also searching and reading W3C docs on this, and running tests on the W3C validator, both legacy and nu, to see its results... any W3C links welcome, more samples, etc...
One of the important considerations is that any type of preserve option does not restrict tidy from doing the right thing, whatever that may be in each specific case, and doctype...
Tidy's general aim should be to produce a valid W3C document. I know this is not always the case at present, but it tries, and can be improved... Such an option should not force tidy to produce invalid html... Not that anything you have suggested so far is an error, but I hope you get the idea...
At present this seems all in the TY_(FixAnchors)(doc, ...) service, run from the phase 2 tidyCleanAndRepair API, in the internal tidyDocCleanAndRepair service... there is already a lot of logic to be studied and understood there...
This may take some time to put together a technical specification on what tidy should try to do... specifically regarding anchors and the id and name attributes... and already your samples help in that, thanks, and am adding the FR label...
Seek further feedback, discussion, examples, even patches, or a PR, etc... would be most appreciated... thanks...
Thanks for your follow-up :-)
Accordingly will try changing this to a clearer, simple [label] of a Technical Discussion...
Good call.
Tidy's general aim should be to produce a valid W3C document. [...] Such an option should not force tidy to produce invalid html...
In the case of an (X)HTML or XML fragment or snippet, the input to Tidy is necessarily not a valid document. Nevertheless, it is reasonable for a user to want Tidy to process that input. To satisfy this reasonable use case, Tidy must necessarily be capable of creating output other than valid documents.
Fortunately, Tidy is already capable of this, as you know :-)
Personally, I think it is most useful to think of Tidy's end goal to be to act as a set of filters, each of which is designed to correct or to report on some category of issues likely to be found in (X)HTML or XML documents or fragments. Ideally, the user should be able to selectively turn those filters on or off. (And if on, then to choose between various available applications of those filters, if appropriate - e.g. to choose between yes or no for --anchor-as-name.) This way, Tidy would be capable of achieving all the functionality that a user might desire, without ever forcibly performing any actions that the user does not desire.
The power to create valid documents must be part of that. But the power to prevent Tidy from silently semantically altering the input should also be available.
Seek further feedback, discussion, examples, even patches, or a PR, etc... would be most appreciated... thanks...
I'm sorry that I can't offer anything further in that vein right now :-(
@sampablokuper looked more at this, but still blocked on what is the purpose of this preserve anchor-as-name option... how does it help...
In the main will leave aside the philosophic discussion on what is, or should be, tidy's goal, but stick with helping produce valid html for the user... I am sure we could just go back and forth on this forever... will try to concentrate on any practical use, and need, of this feature request...
I agree the current documentation is not sufficient, nor very helpful... That certainly needs to be improved... suggestions very welcome...
Next it seems this option means slightly different things in html5 vs legacy html4 documents...
In html4 W3C specs, like links html4, you can find things like name shares the same name space as the id. I am not sure I fully understand what that means, but for sure I can set up an internal link to either a name, or an id... but the preference at the time seemed to be name... and certainly seek more references on this
Hence, I think, this option came about to ensure if the user had added an id, then this option, with a default of yes, ensured a name attribute would be added. And you would set this option to no to avoid this, if that was what you wanted...
So here you need to show a use case where preserve is needed in this html4 mode.
Either you let tidy fix the document, adding name if missing, or you set it to no...
Where then is preserve needed? What would it do differently to no? html4 document samples please...
Then html5 was born, and this sort of flipped this option on its head!
The id was the dominant, and name was deprecated... see say a-element, where name has been omitted... and again seek more references on this...
But tidy has still to catch up with this html5 change... It should warn about name, if used, and if you swing this option as no, it will silently remove it...
Thankfully if only id is given in a html5 document, it seems tidy will not add name, in any circumstances...
So while this indicates some work needed for html5, and some document updates, I can not see the usefulness of adding a preserve choice...
If you disagree, what should it do in this html5 case? Again html5 document samples please...
Now, to begin testing and understanding this, I have added 7 test files to my site, and could add more -
These files can be viewed as html, by adding http://htmlpreview.github.com/? to the url.
I am really trying to find a valid use case for this request, for the addition of a sort of no-op option...
Hope you, or others, can assist... thanks...
@geoffmcl wrote:
HTML4
[...] I think, this option came about to ensure if the user had added an id, then this option, with a default of yes, ensured a name attribute would be added. And you would set this option to no to avoid this, if that was what you wanted...
So here you need to show a use case where preserve is needed in this html4 mode.
Already provided in my comment above.
Either you let tidy fix the document, adding name if missing, or you set it to no... Where then is preserve needed?
If an application consuming the HTML applies different semantics to name than to id, then adding and populating a name attribute where one was not previously present could cause unwanted effects in such an application.
Similarly, removing a name attribute from an element could cause unwanted effects in such an application (even if an id attribute exists and is retained on that element).
So, there needs to be an option besides yes or no, which is where preserve would come in.
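To make the concern concrete, here is a minimal sketch of a hypothetical consumer that treats name as significant. The class `NameCollector` and the function `named_anchors` are illustrative only; they use Python's standard `html.parser` module and stand in for whatever application reads the HTML.

```python
from html.parser import HTMLParser
from typing import List


class NameCollector(HTMLParser):
    """Collect the name values of <a> elements, ignoring id entirely."""

    def __init__(self) -> None:
        super().__init__()
        self.names: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for key, value in attrs:
            if key == "name" and value is not None:
                self.names.append(value)


def named_anchors(html: str) -> List[str]:
    """Return the anchor names a name-sensitive consumer would see."""
    collector = NameCollector()
    collector.feed(html)
    collector.close()
    return collector.names


# Input and output taken from the --anchor-as-name yes transcript earlier in the thread.
print(named_anchors('<a id="foo">baz</a>'))             # []
print(named_anchors('<a id="foo" name="foo">baz</a>'))  # ['foo']
```

For such a consumer the `yes` output introduces a named anchor that the author never wrote, and the `no` output would remove one that the author did write.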
What would it do differently to no? html4 document samples please...
Again, already provided in my comment above.
@geoffmcl wrote:
HTML5
[...] I can not see the usefulness of adding a preserve choice... If you disagree, what should it do in this html5 case?
I do disagree. IMO a preserve option would be useful for HTML5 just as it would be useful for HTML 4 and for XHTML: i.e. for the same reasons and behaving much the same way.
FYI, in HTML5, id is a "global attribute", i.e. it can be applied to any element. The name attribute, however, is defined only for certain elements, currently (according to this & this): <button>, <fieldset>, <form>, <iframe>, <input>, <map>, <meta>, <object>, <output>, <param>, <select>, <textarea>, and possibly <keygen>.
AFAICT, it is perfectly valid for such elements to have both the id and the name attribute set, if desired, and indeed for those attributes to have different values to each other. (As usual, each id attribute's value must be unique per-document.)
name was deprecated... see say a-element, where name has been omitted... and again seek more references on this... But tidy has still to catch up with this html5 change... It should warn about name
This is not quite correct. name was not deprecated entirely in HTML5. Tidy should only warn about the presence of a name attribute in an HTML5 document if it appears on an element for which name is not a valid attribute in HTML5.
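The rule suggested here can be sketched as a simple lookup. This is an illustration only, not Tidy's code: the element list is copied from this comment and may not match the current HTML specification, and the names `NAME_ALLOWED`, `NAME_POSSIBLY_ALLOWED` and `should_warn_about_name` are hypothetical.

```python
from typing import Mapping

# Elements listed in this comment as accepting a name attribute in HTML5.
NAME_ALLOWED = frozenset({
    "button", "fieldset", "form", "iframe", "input", "map",
    "meta", "object", "output", "param", "select", "textarea",
})

# Listed in the comment as "possibly" accepting name.
NAME_POSSIBLY_ALLOWED = frozenset({"keygen"})


def should_warn_about_name(tag: str, attrs: Mapping[str, str],
                           accept_uncertain: bool = True) -> bool:
    """Return True if `tag` carries a name attribute it is not listed as accepting."""
    if "name" not in attrs:
        return False
    allowed = NAME_ALLOWED | NAME_POSSIBLY_ALLOWED if accept_uncertain else NAME_ALLOWED
    return tag.lower() not in allowed
```

Under this list an `<a name="bar">` would draw a warning in an HTML5 document, while an `<input name="q">` would not.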
Again html5 document samples please...
These would be exactly the same as in my comment above, except that instead of the <a> element, they would use one of the elements listed above for which the name attribute is valid in HTML5.
@sampablokuper what application, consumer of html, are you talking about?
Ok, at least you are starting to narrow it down, and that is for HTML4...
And have you tested XHTML? Give an example where tidy is in error. In most cases XHTML is handled differently in tidy...
While I have no problem reading mozilla, and/or w3schools docs, tidy tries to apply W3C recommendations...
This issue is about an anchor, <a ...> tag, not about other tags. If tidy is in error on any of these others, then please open a separate issue, and provide sample html that you think tidy handles incorrectly... thanks...
And just to be clear, adding a preserve would be more difficult. Read would need a new PickListItems table. A simpler change from a Boolean option to an AutoBool, which allows a 3rd option, auto, would be much easier. The auto could signal a sort of no-op in this case, and be more backward compatible...
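The backward-compatibility point can be illustrated with a three-valued option type. This is a Python sketch of the idea only; the names `TriState` and `parse_anchor_as_name` are hypothetical and are not Tidy's C identifiers, and only the three spellings discussed in this thread are accepted.

```python
from enum import Enum


class TriState(Enum):
    NO = "no"      # existing value, meaning unchanged
    YES = "yes"    # existing value, meaning unchanged
    AUTO = "auto"  # suggested third value: leave name and id as found


def parse_anchor_as_name(value: str) -> TriState:
    """Parse an option value, accepting exactly yes, no or auto (case-insensitive)."""
    try:
        return TriState(value.strip().lower())
    except ValueError:
        raise ValueError(f"expected yes, no or auto, got {value!r}") from None
```

Because `yes` and `no` still parse to the same two states, existing configurations would keep their current behaviour; only configurations that opt in to `auto` would see the no-op.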
So really no new information added... and I am not yet convinced that such a change is required... but I am just one voice... and I could be wrong...
Now all that means is that I am not personally interested in coding such a change... so left to me this would presently be a Won't Fix label... but...
If you, or others, want to present a PR, or further feedback, I will try to listen for a stronger use case... thanks...
@geoffmcl wrote:
what application, consumer of html, are you talking about?
No specific one: could be a simple static website, could be a dynamic web application. Could even be a mobile app with a WebView, or whatever.
Ok, at least you are starting to narrow it down, and that is for HTML4...
Not just for HTML4. I have addressed XHTML and HTML5 as well, in my comments above.
And have you tested XHTML? [...] In most cases XHTML is handled differently in tidy...
I already gave a relevant example of Tidy's behaviour in my comment above.
Quoting the man page: "If set to strict, Tidy will set the DOCTYPE to the HTML4 or XHTML1 strict DTD."
Give an example where tidy is in error.
I did not say Tidy is "in error". I explicitly marked this issue as a feature request.
In doing so, I noted a reasonable use case that Tidy currently fails to handle, that it would handle if the requested feature were added.
This issue is about an anchor, <a ...> tag, not about other tags.
That is incorrect.
This issue is a feature request relating to Tidy's --anchor-as-name option.
If tidy is in error on any of these others, then please open a separate issue, and provide sample html that you think tidy handles incorrectly... thanks...
See above.
And just to be clear, adding a preserve would be more difficult. Read would need a new PickListItems table. A simpler change from a Boolean option to an AutoBool, which allows a 3rd option, auto, would be much easier. The auto could signal a sort of no-op in this case, and be more backward compatible...
As I mentioned in my first post above, the use of an Autobool as a way to provide a third option seems OK to me.
I am not yet convinced that such a change is required... but I am just one voice... and I could be wrong...
I think you are, in this case.
Now all that means is that I am not personally interested in coding such a change... so left to me this would presently be a Won't Fix label... but...
If you, or others, want to present a PR, or further feedback, I will try to listen for a stronger use case... thanks...
The use case is already strong. If, despite that, you don't want to address the issue, then that will just perpetuate an inconvenience for Tidy's users :-(
In any case, rather than closing this issue as WontFix, I would ask that it at least be left open for anyone who does have an interest in submitting a fix and closing via PR to do so. Thanks.
Thanks for maintaining HTML Tidy!
The documentation for --anchor-as-name says:
I can see that those two distinct functionalities may each be useful in specific use cases. It is good that tidy offers them.
However, if the user wishes for no modifications to be made to existing name or id attributes, then they would seem to be out of luck: tidy simply does not seem to offer this.
Therefore, I ask that --anchor-as-name be changed from taking a Boolean argument to taking either an Autobool argument or an enum argument, to allow the user to choose from at least three values to pass as an argument: each of the two existing values (e.g. "yes", and "no"), and a new, no-op value (e.g. "auto", or "preserve").