htacg / tidy-html5

The granddaddy of HTML tools, with support for modern standards
http://www.html-tidy.org

Tidy moves HTML comments to previous line #808

Open pdurbin opened 5 years ago

pdurbin commented 5 years ago

Hi! I noticed that Tidy moves HTML comments to the previous line. Is there a way to prevent this? I'm using Tidy 5.2.0, packaged with the latest Ubuntu LTS, 18.04. Here's a "before" and "after" to show how the comment "<!-- for mobile -->" is moved by Tidy:

Before (HTML comment above the line the comment is about)

<!DOCTYPE html>
<html dir="ltr" lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.2.0">
<meta charset="UTF-8">
<!-- for mobile -->
<meta name="viewport" content=
"width=device-width, initial-scale=1">
<title>HTML Tidy</title>

After (HTML comment moved to end of previous line)

<!DOCTYPE html>
<html dir="ltr" lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.2.0">
<meta charset="UTF-8"><!-- for mobile -->
<meta name="viewport" content=
"width=device-width, initial-scale=1">
<title>HTML Tidy</title>
<meta name="keywords" content="tidy, html tidy, html, htacg">

geoffmcl commented 5 years ago

@pdurbin thanks for the comment...

While it is not strictly correct to say the comment "<!-- for mobile -->" is moved by Tidy, I sort of understand what you mean... a newline in the input, is not in the output... but...

While you may have just noticed it - welcome to tidy - this output has been like that since the earliest release, Raggett's 4th August 2000, ie tidy-2000 - nearly 20 years ago...

So I would suggest, this is a - well established feature of tidy - TM like - continued in next, 5.7.22, so...

What is the use case for trying to bring back this meaningless, to html, newline? Is it important?

Is the case strong enough to create a new Pretty Print output option? Called what? To do what, exactly? Full specs...

At present this looks like Won't Fix... but look forward to further feedback, comments, etc, etc, - thanks

pdurbin commented 5 years ago

@geoffmcl hi! Thanks for your thoughtful reply.

Yes, comments are meaningless to browsers but I had people in mind. Over at https://github.com/IQSS/metrics.dataverse.org/pull/5 I added the first HTML page to a project and suggested in CONTRIBUTING.md that we could use tidy to format our code. (I provide a config file.) However, I can anticipate contributors asking me, "Why does tidy move my comments around?" So here I am, asking why. To be clear, I'm also fine with any sane workaround. My workaround for now has been to delete all my HTML comments but I think this is sub-optimal. :smile:

For what it's worth, while I was researching a solution, I found this question, which seems to be related: https://stackoverflow.com/questions/537112/html-tidy-dont-move-those-comments

I hope this helps. I'm not sure if I've answered all of your questions but I'm happy to ramble on. Please let me know what you think. Thanks!

geoffmcl commented 5 years ago

@pdurbin thank you for the compliment... I do try hard to understand issues presented...

But I do not yet understand why metrics.dataverse.org features in this...

Current libTidy does not move comments around... in general it will output them in the order they occurred in the input... for instance running the sample input in the stackoverflow post, using current tidy, will not yield the results shown...

In general I would reply to anyone that asks "Why does tidy move my comments around?", answer, oops, libTidy does not do this... FULL STOP!

A broader response would be that the purpose of tidy, from tidy, is to ... corrects and cleans up HTML and XML documents by fixing markup errors and upgrading legacy code to modern standards.... a simple promise, but very difficult, in an ever advancing html world... but at a very minimum, you can expect some relining of your input...

A sort of technical response would be libTidy is like a browser, it inputs the stream to a tree, only keeping important nodes, and text, attached to those nodes, discarding others, non-attached, spacey only, stuff... and outputs that tree to an output file... that means things like -

<meta charset="UTF-8">
<!-- for mobile -->
<meta name="viewport" content=
"width=device-width, initial-scale=1">

are stored in the libTidy tree, and sent to the output, as -

   StartTag meta   charset="UTF-8"
   Comment  
   StartTag meta   name="viewport" content="width=device-width, initial-scale=1"
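
As a rough illustration of the description above (this is not libTidy's code, only a toy model built on Python's standard `html.parser`), the sketch below turns the same snippet into a flat node list and discards whitespace-only text. The class name `NodeLister` and the helper `list_nodes` are made up for this example. Once the newline between the first `<meta>` and the comment has been dropped, nothing in the node list records that the comment started on its own line.

```python
from html.parser import HTMLParser


class NodeLister(HTMLParser):
    """Collect a flat list of nodes, dropping whitespace-only text."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append(("StartTag", tag, dict(attrs)))

    def handle_comment(self, data):
        self.nodes.append(("Comment", data.strip(), {}))

    def handle_data(self, data):
        # Whitespace-only text (such as the newlines between tags) is not
        # kept, so the input's line breaks are lost at this point.
        if data.strip():
            self.nodes.append(("Text", data.strip(), {}))


def list_nodes(html):
    """Parse html and return the collected (kind, value, attrs) tuples."""
    parser = NodeLister()
    parser.feed(html)
    parser.close()
    return parser.nodes


SAMPLE = (
    '<meta charset="UTF-8">\n'
    "<!-- for mobile -->\n"
    '<meta name="viewport" content="width=device-width, initial-scale=1">\n'
)

for kind, value, attrs in list_nodes(SAMPLE):
    print(kind, value, attrs)
```

The three nodes come out in input order, which matches the point that the comment is not reordered; only the surrounding whitespace is gone.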

A simple answer might be that tidy is not a tool that maintains your input lines... it might in general appear to do so, but that is far from the case...

It inputs your stream, tidies it, and outputs a completely relined product, which we hope you like... there are quite a number of options modifying that output...

Try an extreme example, --vertical-space auto, to see how input newlines can be totally ignored in the output...

I do not ask you to ramble on... sorry...

I go back to my first comment, is this an issue? Can this be closed?

Seek meaningful feedback, comments... thanks...

pdurbin commented 5 years ago

You're right. I tried --vertical-space auto and this is definitely not what I want. 😄 It removes newlines, making the HTML unreadable to people. For the project I'm working on (that "metrics" page, I mentioned), I am editing HTML directly, and I'm shopping around for a command line tool that can be used to maintain consistency of HTML formatting, especially as other developers jump in.

My thought was, "I'll suggest to other developers that they can use Tidy with a config file I provide them with rules about how many spaces to indent, etc."
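
A shared setup of that kind could be sketched as below. The option values are illustrative assumptions, since the thread does not show the actual config file, and the file names `tidy.conf` and `index.html` are placeholders. The option names (`indent`, `indent-spaces`, `wrap`, `vertical-space`, `tidy-mark`) and the `-quiet`, `-config` and `-modify` flags are standard Tidy ones. The script only builds the config text and the command; it does not run Tidy.

```python
import shlex

# Illustrative values only; the thread does not show the real config file.
OPTIONS = {
    "indent": "auto",
    "indent-spaces": "2",
    "wrap": "0",
    "vertical-space": "no",
    "tidy-mark": "no",
}


def render_config(options):
    """Render options in Tidy's 'name: value' config file format."""
    return "".join(f"{name}: {value}\n" for name, value in options.items())


def tidy_command(config_path, html_path):
    """Build (but do not run) a command that tidies html_path in place."""
    return ["tidy", "-quiet", "-config", config_path, "-modify", html_path]


print(render_config(OPTIONS), end="")
print(shlex.join(tidy_command("tidy.conf", "index.html")))
```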

But you are saying that Tidy is not a tool for maintaining input files. This makes me a little sad. 😞

Perhaps a better tool for my use case would be html-beautify, which I read about at https://github.com/beautify-web/js-beautify#css--html . I haven't tried it yet because I thought I'd give Tidy a try first. I've known about Tidy for years and this seemed like a good opportunity to use it.

It sounds like you object to me saying that Tidy moves my comments around. I'm just trying to imagine how other developers would react to me suggesting that they try Tidy for the project I mentioned where we are editing HTML directly. They would probably say something like, "Why does Tidy move my comment to the end of the previous line? My comment is about the following line, not the previous line." Can we agree that Tidy moves comments to the very end of the previous line, removing all whitespace? I'm not sure how to accurately describe what Tidy does, what you say it has been doing for 20 years. From my perspective it's as if Tidy is saying, "Comments will be associated with the previous line rather than the following line." To me this is backward. I was hoping Tidy would have a flag like --comments-for-following-line to override what is, to me, surprising behavior. That is to say, I write comments like this:

<!-- for mobile -->
<meta name="viewport" content="width=device-width, initial-scale=1">

I don't write comments like this:

<meta name="viewport" content="width=device-width, initial-scale=1"><!-- for mobile -->

Maybe some people do. 😄 It's a free country. 😄

I really appreciate you reading all this! Tidy seems great and again, my workaround is simply to not include any comments in HTML. To me, this is not a great solution, though, which is why I opened this issue.

geoffmcl commented 5 years ago

@pdurbin have you checked comments, other than in the <head>? Maybe you'll be in for another surprise...

Don't object to you saying that Tidy moves my comments around... as you say, it is a free country... ;=))

I took pains to try to explain that technically that is not what happens... but I seem to have failed... oh, well...

Not sure I understand the simple statement ... Tidy is not a tool for maintaining input files..., which you imply I suggested... This makes me sad...

I quoted from the aims of tidy... see http://www.html-tidy.org/ ...

When I was very active in web content creation, I used it on nearly every one of some 2,500 files... I hope others still do today... because they see some benefit...

But does it maintain input files? Well, sort of, no! Or yes... depends what you mean...

Generally tidy ignores most space in the input, except where such space is significant... like <pre>, <script>, etc, etc... It generates, hopefully, publishable, fixed, valid, ... output files, with a consistent relining, and spacing of the results... there are some options that influence this...

Certainly, if you are not happy with the final results, then maybe tidy is not what you need, want, ...

Or you can advocate for a new option, like --comments-for-following-line... need a full spec for this... <head>, <body>, cases, docs... etc... give a use case...

I do not think the one sample/output shown is sufficient -

<meta charset="UTF-8"><!-- for mobile -->
<meta name="viewport" content="width=device-width, initial-scale=1">

Question: What to do about the following? Same? Or different option(s)...

<body>
<!-- begin header -->
<h1>Header</h1>
<!-- begin content -->
<p>Content</p><!-- end content -->
<!-- begin tail -->
<p>tail</p><!-- end 
tail -->
<!-- other variations -->
</body>

And see how this changes, if say the -i option is added... lots to explore, understand, decide... tidy moves comments is too broad...
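
One way to start enumerating the cases a spec would have to cover is to classify how each comment is placed in the input. The sketch below is only an aid for that discussion, not anything Tidy does: the function name `classify_comments` is hypothetical, and the regular expression is a naive comment matcher that would also match comment-like text inside `<script>` or `<pre>`.

```python
import re

# Naive comment matcher; good enough for small hand-written samples.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def classify_comments(html):
    """Return (comment, placement) pairs for every comment in html.

    placement is 'own line' when only whitespace precedes the comment on
    its line, otherwise 'trailing'; ', multi-line' is appended when the
    comment itself spans a newline.
    """
    results = []
    for match in COMMENT.finditer(html):
        line_start = html.rfind("\n", 0, match.start()) + 1
        before = html[line_start:match.start()]
        placement = "trailing" if before.strip() else "own line"
        if "\n" in match.group():
            placement += ", multi-line"
        results.append((match.group(), placement))
    return results


SAMPLE = (
    "<body>\n"
    "<!-- begin header -->\n"
    "<h1>Header</h1>\n"
    "<!-- begin content -->\n"
    "<p>Content</p><!-- end content -->\n"
    "<!-- begin tail -->\n"
    "<p>tail</p><!-- end \n"
    "tail -->\n"
    "<!-- other variations -->\n"
    "</body>\n"
)

for comment, placement in classify_comments(SAMPLE):
    print(f"{placement:22} {comment!r}")
```

For the sample above this reports four comments on their own line, one trailing comment, and one trailing multi-line comment, so a new option would need a defined behaviour for at least those three placements.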

Seek feedback, comments, even patches, PRs, etc... thanks...

pdurbin commented 5 years ago

Hmm, I forgot that some people like to put comments like <!-- end content --> all over their HTML markup. I'm not one of those people (again, I put comments above the line I'm talking about) so I might need to think about this some more. Thanks for reminding me of this!

pdurbin commented 5 years ago

I looked around in https://github.com/htacg/tidy-html5-tests a bit and couldn't find any tests specific to comments and newlines. This, to me, seems like a logical place to start, to add or review a test that asserts the current behavior.
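
A test of that kind could be built around a check like the one sketched here. It is a standalone illustration, not part of tidy-html5-tests, and it does not invoke Tidy; it compares the "before" and "after" text from the first comment in this issue. The helper names `own_line_comments` and `displaced_comments` are hypothetical. Because it uses sets, two identical comments in one file are treated as one.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def own_line_comments(html):
    """Return the set of comments that sit alone on their line(s)."""
    found = set()
    for match in COMMENT.finditer(html):
        start = html.rfind("\n", 0, match.start()) + 1
        end = html.find("\n", match.end())
        end = len(html) if end == -1 else end
        text_before = html[start:match.start()].strip()
        text_after = html[match.end():end].strip()
        if not text_before and not text_after:
            found.add(match.group())
    return found


def displaced_comments(before, after):
    """Comments that were alone on a line in `before` but not in `after`."""
    return sorted(own_line_comments(before) - own_line_comments(after))


BEFORE = (
    '<meta charset="UTF-8">\n'
    "<!-- for mobile -->\n"
    '<meta name="viewport" content=\n'
    '"width=device-width, initial-scale=1">\n'
)
AFTER = (
    '<meta charset="UTF-8"><!-- for mobile -->\n'
    '<meta name="viewport" content=\n'
    '"width=device-width, initial-scale=1">\n'
)

print(displaced_comments(BEFORE, AFTER))  # ['<!-- for mobile -->']
```

A test asserting the current behaviour would expect this list to be non-empty for the sample; a test for a future option would expect it to be empty.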

geoffmcl commented 5 years ago

@pdurbin thank you for your continued feedback, and investigation, research... into the tidy-html5-tests...

While I too think there are no tests specific to comments and newlines, that I can see, there are some 64 testbase\*.html input files that have comments...

So potentially, any change in the current comment/newline output situation, would probably show up in 1, or more, of these... ie fail a regression test... phase 2 - compare expected... but not sure...

But can not see, understand, this is the logical place to start for this issue???

Somehow I agree, where we put a comment, does matter to the human reader, and libTidy has made choices...

Which may matter to the humans, but not to the valid html output... choices...

As I ask, is there a new option here? Give SPECS, etc, etc...

We have one case in the <head>, different in the <body>, and influenced by the indent option...

As previously indicated, tidy moves comments just does not cut it... can not fix that... directly...

As stated, seek further feedback, comments, even patches, PRs, etc... thanks...

fabiomosti commented 3 years ago

hello, +1 for Tidy leaving comments in place! please, please, please, is it so difficult to leave them where they are? just treat them as if they are a tag exactly like others...

davvalent commented 2 years ago

Hello,

In my case it can be a problem when I need to commit a single line with Git after Tidy processed the file, let's say for a hot fix. If I don't want the comment to be committed for now (for whatever reason) I get an unclosed comment on that line.

The workaround I found:

So you can play with your code and the wrap value in order to achieve that (if the wrap value is an option for you). Of course it won't be possible in all cases.

I'm using Tidy 5.6.0.

https://api.html-tidy.org/tidy/quickref_5.6.0.html#wrap

fabiomosti commented 7 months ago

wow IT'S BEEN SO LONG since I commented on this thread... and nothing happened... it's so sad a good project like this does not consider users' requests

davvalent commented 7 months ago

I guess you mean it's so sad free software projects don't have the necessary resources to do so...

By the way it's on the 5.9 milestone which has some pre-releases already. It's a slow pace, and it's okay like that.