haml / haml

HTML Abstraction Markup Language - A Markup Haiku
http://haml.info
MIT License

option to compress the generated HTML #406

Closed ghost closed 12 years ago

ghost commented 13 years ago

HAML should have an option to compress the generated HTML, like this:

!!!
%html
  %head
    %title Foo
  %body
    %p Foo

should not become

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
  <head>
    <title>Foo</title>
  </head>
  <body>
    <p>Foo</p>
  </body>
</html>

but

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html><head><title>Foo</title></head><body><p>Foo</p></body></html>

HamptonMakes commented 13 years ago

Ugly mode (the :ugly option) does something like this.

Check out: http://haml-lang.com/docs/yardoc/file.HAML_REFERENCE.html#options

Except, it does indeed include the "\n" character. I do know that we've had discussions about this before, but I don't know why it never got added.

Nathan, what is your take on including the ability to suppress the "\n" character after tags?

joliss commented 12 years ago

> Except, it does indeed include the "\n" character. I do know that we've had discussions about this before, but I don't know why it never got added.

There are places where dropping the newlines would change the DOM contents, so I wouldn't recommend doing this. You save a handful of bytes at best but invite a slew of subtle issues. I think with gzip on top of it, ugly Haml output is sufficiently compressed.

vdh commented 12 years ago

What about an option to strip newlines for those of us who understand the "risks" of removing newlines and still would prefer to remove them anyway? Gzip is more important to have set up first, but those newlines are still going to be unnecessary.

> but invite a slew of subtle issues

Unexpected newlines can also create subtle issues just as easily. If you design your CSS/JS expecting newlines and they're removed, you get issues. But if you design your CSS/JS expecting not to have excess whitespace, you'll get issues just the same. In my experience, layout issues are almost always due to failures in your own assumptions.

Realistically, any content blocks that need whitespace shouldn't be handled by HAML anyway.

> It is suggested that newlines be inserted after the DOCTYPE, after any comments that are before the root element, after the html element's start tag (if it is not omitted), and after any comments that are inside the html element but before the head element.

Newlines after the doctype, before the head, and inside pre tags are the only ones I would want HAML to automatically output in this theoretical "super ugly" mode. Anything else should be manually escaped or passed through by rails/etc for content blocks.

norman commented 12 years ago

HTML minification is hard. Go look at the HTML minifiers out there and you'll see that they all come with warnings about how minification can break your documents.

You may be tempted to think "well, at least you can safely remove whitespace between tags," but even that is tricky. Consider the following code:

%span foo
%span bar

Haml compiles this to:

<span>foo</span>
<span>bar</span>

Which browsers would render as foo bar.

Now, if we make Haml suppress the newline between tags, we end up with the following HTML:

<span>foo</span><span>bar</span>

which browsers would render as foobar. This is very likely not going to be what the author intends.
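A rough way to see the difference without opening a browser is sketched below. It is only a toy model: `visible_text` is a made-up helper that drops tags and collapses whitespace runs the way inline layout roughly does, not a real HTML renderer.

```ruby
# Toy model of inline text layout: remove the tags, then collapse every
# run of whitespace (including newlines) into a single space.
# `visible_text` is a hypothetical helper, not part of Haml or any library.
def visible_text(html)
  html.gsub(/<[^>]+>/, "").gsub(/\s+/, " ").strip
end

with_newline    = "<span>foo</span>\n<span>bar</span>"
without_newline = "<span>foo</span><span>bar</span>"

puts visible_text(with_newline)     # foo bar
puts visible_text(without_newline)  # foobar
```

The newline between the two tags is the only thing separating the words, so removing it changes what the reader sees.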

Now you may be thinking, "ok, well, let's leave newlines after inline elements, and don't add them after block elements."

That breaks down with any CSS that does things like:

div.my-class {
  display: inline;
}

But for the sake of argument, let's assume these problems are solvable. Would it even be worth it to remove the newlines? Let's look at a random, largish HTML document, for example the index page from http://espn.com. The values on the left are the sizes of the files in bytes:

132260 index.html
131689 index-no-newlines.html

(571 bytes saved)

30652 index.html.gz
30172 index-no-newlines.html.gz

(480 bytes saved)

On a 129k document, removing newlines saves us 571 bytes before gzipping, and 480 afterwards.

Given the technical challenges and the diminishing returns, it seems fair to say that it's not worth it.
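For anyone who wants to repeat this kind of measurement on their own pages, here is a minimal sketch using Ruby's standard `zlib`. The helper name `newline_savings` and the sample markup are assumptions made for illustration; this is not how the numbers above were produced. Note that it blindly deletes every newline, including ones inside pre tags, so it is a measuring tool only.

```ruby
require "zlib"

# Hypothetical helper: how many bytes does deleting every "\n" save,
# before and after gzip compression?
def newline_savings(html)
  stripped = html.delete("\n")
  {
    raw_saved:  html.bytesize - stripped.bytesize,
    gzip_saved: Zlib.gzip(html).bytesize - Zlib.gzip(stripped).bytesize
  }
end

# Made-up sample document; substitute File.read("index.html") for a real page.
sample = "<ul>\n" + ("  <li>item</li>\n" * 200) + "</ul>\n"

result = newline_savings(sample)
puts "raw bytes saved:  #{result[:raw_saved]}"
puts "gzip bytes saved: #{result[:gzip_saved]}"
```

The raw saving is simply the number of newline characters; the gzip saving depends on the document and has to be measured.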

vdh commented 12 years ago

You've completely ignored the issue with a strawman argument. It's insulting.

Firstly, I already knew about newlines acting as spacing between elements. I wouldn't be looking for a way to remove newlines if I didn't already know these ridiculous "risks" that people keep bringing up over and over like a broken record. I know about that already. I already know that if I'm choosing to strip newlines, that any inline/whitespace CSS styling problems created by this decision will be on my head. We're not talking about defaults that beginners will be tripping up on. We're talking about an optional advanced setting that is chosen only by people who know what they're delving into.

Secondly, I already mentioned that content blocks should be processed outside of HAML. HAML should be for document structure only, not processing inline content where whitespace should be taken into consideration. Please read what I said before copying and pasting an example from Google. Generating HTML externally (via Markdown, for example) is much more suited for important content blocks. The creator of Compass already wrote an article about this 2 years ago...

Thirdly, and most importantly, this is not about minification. "HTML minifiers" are filters applied to already generated content. Filtering out the newlines afterwards is not in any way a good idea, for obvious reasons. HAML is an HTML generator. HAML is the one generating these newlines in the first place; the discussion is about an option to not add these newlines on selected elements during generation.

Fourthly, who are you to decide what is or isn't "worth it" for me? How about mobile connections, huh? Every extra byte saved is "worth it" in that scenario. Not everyone is sitting at their home PC with their ADSL/Cable connection. Even ignoring people with slow or limited connections, why are we going to use an alternate HTML generator if it's generating things that aren't necessary? I always turn :ugly on for the simple reason that if I'm going to inspect the DOM I'll use the developer tools built into Chrome/Safari/Firefox/etc. Indentation in the HTML source is a completely pointless waste of time and bytes.

I chose HAML as a generator so I could avoid manually generating my own HTML, not pretty-print it with tabulation and newlines.

norman commented 12 years ago

@vdh 500 bytes transfer in 0.07 seconds on an EDGE connection. And it's still far less than a second even on GPRS.
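The arithmetic behind that estimate can be sketched as follows. The link rates are assumptions chosen for illustration (roughly 57 kbit/s for EDGE, which is the rate the 0.07 second figure implies, and 20 kbit/s for GPRS); they are not measurements, and latency and protocol overhead are ignored.

```ruby
# Seconds needed to move `bytes` over a link of `kbit_per_s` kilobits per
# second, ignoring latency and protocol overhead.
def transfer_seconds(bytes, kbit_per_s)
  (bytes * 8) / (kbit_per_s * 1000.0)
end

edge_kbit = 57.0 # assumed nominal EDGE rate
gprs_kbit = 20.0 # assumed nominal GPRS rate

puts transfer_seconds(500, edge_kbit).round(2) # 0.07
puts transfer_seconds(500, gprs_kbit).round(2) # 0.2
```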

You're asking us to implement another output option for Haml, maintain it, document it, and deal with ongoing bug reports - in order to achieve a very small optimization.

If you feel so passionately about this, I would suggest you fork Haml, implement it, put it into production on a website, and if it works well, send us a pull request and show how much the optimization improved things on your website. If you can demonstrate that there's a tangible benefit then I'd be happy to include the feature.

You're asking us to make a change to Haml, so the burden of proof is on you.

vdh commented 12 years ago

The least you could have done was be a sociable human being, tell the truth and say "this feature is too time-intensive to implement", instead of talking down to me like a child with an "intro to HTML" lecture. That would have been the polite thing to do.

Just this morning I was a huge fan of HAML, but your behaviour is insulting enough that I don't think I'll ever touch it again now.

joliss commented 12 years ago

@vdh Norman's response was not only polite, but also really nice, as he took the time to explain his reasoning. I didn't get the feeling he was talking down to you or anybody else.

@norman Thanks for all your hard work!

joliss commented 12 years ago

On the issue at hand, I tried monkey-patching Haml before because newlines were giving me issues with XPath assertions, but it turned out painful in so many ways that I stopped doing this.

vdh commented 12 years ago

@joliss Really? Did he honestly think that I would be researching how to strip newlines and somehow not already know about inline elements?

I explicitly stated upfront that content blocks should be handled completely outside of HAML altogether. Anyone using inline elements as part of their layout structure should already be aware of the pitfalls of removing newlines in this extreme edge case.

It's quite obvious that he didn't actually read what I said, and just wrote a boilerplate "newlines and inline elements" response. There are plenty of valid problems caused by unwanted newlines, but I won't bother listing them because it would just get ignored like everything else I said.

norman commented 12 years ago

Just for the record, I got some more feedback from other folks on this, and had a bit of a change of heart. @vdh sent a small, very clean patch which implements this feature and I applied it. Personally I'm still a little skeptical as to the value of the feature and probably won't use it myself, but enough people wanted it and were willing to do the work to make it happen, so democracy won out. So if you want to try this feature out, it's now in master and slated to go into 3.2 - use it and please give your feedback on it!

vdh commented 12 years ago

Apologies for all that ranting earlier, I wish I had noticed Haml's whitespace removal feature in the docs earlier and avoided that whole argument.

I'm a little embarrassed to admit I've already switched to the Haml-inspired Slim template engine, partly due to the more manual control over whitespace it has. But I followed through with making sure the patch worked to help out with #528 and anyone else who might be interested in activating whitespace removal globally.

norman commented 12 years ago

@vdh no worries at all - thanks for your contribution.

lypanov commented 12 years ago

Just so you know <span>blah</span><span>blah</span> does not seem to render "blah blah" in IE 7. I would very much like a whitespace removal mode rather than having to verify everything in IE.

norman commented 12 years ago

@lypanov I'm not entirely sure I understand what you're asking us to do.

lypanov commented 12 years ago

Nothing. @vdh solved my problem already. I'll switch to slim. Everything works fine there.

norman commented 12 years ago

@lypanov Cool - Slim is awesome, I've used it on some other projects and think it's a great library with a fantastic implementation, even if I personally prefer Haml.

My question however was really that: I didn't understand what you meant. I wouldn't expect any browser to render <span>blah</span><span>blah</span> as blah blah by default, but rather blahblah. Am I mistaken?

lypanov commented 12 years ago

Yeah. AFAICT only IE renders it as such (without the space). The rest render it with the space. It's feasible it's my reset, but in this release alone we have 7 whitespace issues with IE, and if I'd written the templates with slim in its default mode I would have been forced to write sane code in the first place. I'm a big fan of fail early and fail hard.

BTW my comment was originally intended as a reply to your comment https://github.com/haml/haml/issues/406#issuecomment-5532124. I didn't reopen the issue because the problem has already been solved for me (well, and because I can't :P). I just wanted to clarify what @vdh meant with the subtle issues that you run into when not doing newline removal.

norman commented 12 years ago

> Yeah. AFAICT only IE renders it as such (without the space).

If you're seeing that, it's definitely because of your CSS. WebKit, Mozilla, and Opera by default render it without a space, which is what you should expect because span is an inline element. You can verify this easily by putting the code in a file with no CSS and opening it in any browser (I just did).

lypanov commented 12 years ago

Now I understand your "change" of position. I meant %span %span in HAML, as in, with a newline in between. I messed up the code while trying to figure out GitHub's weird markdown rules. Sorry for the confusion!

norman commented 12 years ago

Ah, I see - thanks for the followup. Anyway - glad @vdh's patch will help people deal with this going forward.