kangax / html-minifier

Javascript-based HTML compressor/minifier (with Node.js support)
http://kangax.github.io/html-minifier/
MIT License
4.93k stars 571 forks source link

Support for custom syntax in ignored tags #83

Closed mhulse closed 8 years ago

mhulse commented 11 years ago

Based on the code introduced in this PR:

New feature: Ignoring tags — Specifically, ignoring strings that start with <? or <% and end with ?> or %>.

... @kangax suggests:

Now we just need to figure out how to support custom syntax in ignored tags.

mhulse commented 11 years ago

@kangax, to clarify this issue, does that mean that there should be the option for people to specify their own start/end tags for the ignored pattern match?

One situation I can think of might be opening/closing tags for something like Django templates:

{% block foo %}html  stuff goes here{% endblock foo %}
{# this is a single line server-side comment #}
{{ variable }}

... where the pattern to match/ignore would be based on {%, {# or {{ (for opening tags) and %}, #} or }} (for closing tags)?

In other words, should the developer be allowed to pass in a regex/pattern to the code for ignoring tags of their choice? If so, should this custom regex/pattern override the existing "ignore tags" regex or should it get appended-to (i.e. on top of matching <?, <%, ?>, and %>, it would also match these custom opening/closing tags: {%, {#, {{, %}, #} and }})?

kangax commented 11 years ago

In other words, should the developer be allowed to pass in a regex/pattern to the code for ignoring tags of their choice?

Yes, I think that would make sense.

If so, should this custom regex/pattern override the existing "ignore tags" regex or should it get appended-to

I think most definitely replaced. If I'm working with ruby, I don't care to do anything about "<?", "?>" strings. I would just want to say something like:

ignoreStartTag: /<\%+/,
ignoreEndTag: /\%>/

and only get those stripped.

The default ones we introduced are just a convenience for 2 most popular languages, and/or until we introduce custom syntax registering.

Does this sound reasonable?

mhulse commented 11 years ago

Does this sound reasonable?

Definitely! :smile:

I don't think it would be hard to introduce as a feature either; it should just be as simple as adding two new options and then using those options to search for start/end tags instead of the hard-coded regex variables.

Like you say, the defaults for the options would be regex pattern that searches for <?, <% and ?>, %>).

If you're open to the help, I could probably tackle this feature this weekend. :octocat:

kangax commented 11 years ago

This is not urgent by any means, I think, so definitely take your time. Thanks for all the help so far!

mhulse commented 11 years ago

Sounds good! Thanks Juriy! :+1:

Lalem001 commented 10 years ago

This feature would be very helpful in my latest project. :+1:

mhulse commented 10 years ago

I think I could do the PR this weekend. Fortunately, it should be pretty easy/fast to do. :octocat:

mhulse commented 10 years ago

@Luis-TP Based on the tags you need to ignore, could you provide an open/close tag example regex for me to test with?

I think I have a working patch, I just need write a few more tests at this point.

mhulse commented 10 years ago

@kangax, do you have any preference in terms of the implementation of registration of custom syntax regexes?

Here's a few ideas ... First, the simplest approach:

ignoreStartTag: typeof options.ignoreStartTag !== 'undefined' ? options.ignoreStartTag : /<(%|\?)/,
ignoreEndTag: typeof options.ignoreEndTag !== 'undefined' ? options.ignoreEndTag : /(%|\?)>/,
ignore: function(text, startTag, endTag) {
  buffer.push(options.removeIgnored ? '' : startTag + text + endTag);
},

...or, this approach might allow for better future growth:

ignore: function(text, startTag, endTag) {
  buffer.push(options.removeIgnored ? '' : startTag + text + endTag);
},
register: {
  ignore: {
    startTag: typeof options.ignoreStartTag !== 'undefined' ? options.ignoreStartTag : /<(%|\?)/,
    endTag: typeof options.ignoreEndTag !== 'undefined' ? options.ignoreEndTag : /(%|\?)>/
  }
}

... or, this feels better to me, but it would break backwards compatability:

ignore: {
  startTag: typeof options.ignoreStartTag !== 'undefined' ? options.ignoreStartTag : /<(%|\?)/,
  endTag: typeof options.ignoreEndTag !== 'undefined' ? options.ignoreEndTag : /(%|\?)>/,
  callback: function(text, startTag, endTag) {
    buffer.push(options.removeIgnored ? '' : startTag + text + endTag);
  }
},

It's late, so maybe I'm over-thinking this?

Any thoughts/preferences on implementation here?

Thanks!

Lalem001 commented 10 years ago

I am working with handlebars. Open and close tags are {{ ... }} and {{{ ... }}}.

Simplest regex pattern I can think of, that matches both cases, is:

ignoreStartTag: /\{{2,3}/,
ignoreEndTag: /\}{2,3}/

From what I can tell, a closing tag of 3 characters might present a problem.

mhulse commented 10 years ago

@Luis-TP Thanks for example! I'll build it into my tests. :+1:

kangax commented 10 years ago

@mhulse I'm thinking something like this as the minifier API:

minify(input, { ignoreStartTag: '...', ignoreEndTag: '...', ... });

and then we'd pass those "tags" to parser (for use instead of startIgnore & endIgnore). Right?

mhulse commented 10 years ago

@kangax that looks good to me. Thanks for feedback.

I'll send a PR today so we can discuss the details further (if need be). :+1:

kangax commented 10 years ago

@mhulse awesome, thanks Micky!

mhulse commented 10 years ago

Well, I thought this would be easy, but with anything programming related, I have been proven wrong. :laughing:

I was able to get a passing test for @kangax's suggestion of /<\%+/, but when I tried matching /\{{2,3}/, my code would never get a chance to "see" those spots in the HTML string.

Here's where I left off:

https://github.com/mhulse/html-minifier/commit/d470390e7f8892207738bf524670170bfed48154

The problem:

Because opening tags like {{ {{{ are not flagged as HTML (somewhere along the html-minifier's code path ... I have yet to figure this out due to complexity of code) the aforementioned characters get treated as normal text (not HTML) and therefore my code (i.e., else if (html.search(handler.ignoredStartTag) === 0) { ...) never has a chance to parse things that don't start with an angle brace < (in other words, non-< elements are treated as chars and the other parts of the script take over).

I think I need to take a break and come back at this with a fresh outlook (my Sunday is almost over anyway).

In the meantime, does anyone else have any tips on how best to handle this scenario? I think I could use a fresh pair of eyeballs to look at my logic. :+1:

Lalem001 commented 10 years ago

Apologies, I should have clarified that the issues I am having are occurring when using handlebars for dynamic tag attributes. For instance:

<button {{#if disabled}}disabled{{/if}}>...</button>

As you can see even github's syntax highlighting marks it as invalid html.

If the {{...}} occurs within the attributes quoted value, its not a problem:

<button class="{{#if primary}}primary{{/if}}">...</button>

Furthermore if the {{...}} occurs within plain text it is usually not a problem either:

<div>{{ example }}</div>

Perhaps I jumped the gun in thinking that custom syntax could resolve this issue? Would this need to be handled a different way?

mhulse commented 10 years ago

@Luis-TP Thanks for clarification. I need to take some time to think about your situation, so I don't have anything helpful to say at the moment.

Running your first example through the online html-minifier, I see this error:

Parse Error: <button {{#if disabled}}disabled{{/if}}>...</button>

So, your code doesn't even make it through the parser. I see now.

Perhaps I jumped the gun in thinking that custom syntax could resolve this issue? Would this need to be handled a different way?

Hrmm, that is a good question. I have not used handlebars (yet) myself, so a piece of the puzzle is missing for me.

As it stands currently, my code would need to be moved/refactored in order to account for custom syntax like you posted.

But now that I know more about what you're trying to do, I'll take another look at the html-minifier's code ... We might be able to handle this in another location, separate from "ignored" tag checking.

With that said, my new code does now allow for someone to pass custom regex into the custom tag checking bits. On one hand, it won't handle {{ or }} template tags, but it will handle tags starting/ending with < and >.

@kangax If we're thinking purely in terms of small, iterative, "improvements" to the code, then I could finish what I'm working on and do a PR. From there, we could open a new issue and continue our talk on how to best handle template tags, like what @Luis-TP posted. How does that sound?

Of course, I don't mind going back to drawing board if we want to try and account for any/all "ignored" type of tags (including template tags like handlebars) in one set of "ignoreXXXXXTag" options.

kangax commented 10 years ago

If we're thinking purely in terms of small, iterative, "improvements" to the code, then I could finish what I'm working on and do a PR. From there, we could open a new issue and continue our talk on how to best handle template tags, like what @Luis-TP posted. How does that sound?

I suppose that sounds good. I'm thinking that perhaps string-based tags could be a nice option for now. For example:

ignoreStartTag: '<%=',
ignoreEndTag: '%>'

or even something like:

ignoreTags: [[ '<%=', '%>' ], ['<%', '%>']]

or:

ignoreTags: ['<%=', '%>', '<%', '%>']
Thomasdezeeuw commented 10 years ago

Any progress on this? I could really use this.

mhulse commented 10 years ago

@Thomasdezeeuw I've been meaning to re-visit at some point. I've been wrapped up in another project since I last left off here. #98 is related, and I'm going to try to carve out a weekend to do a PR on both.

In the meantime, if you could provide example code (or describe your needs in detail) for your particular situation, that would probably be helpful. The more examples I can use to test against, the better. :+1:

broox commented 10 years ago

Has there been any progress on this? I was just trying to minify some mustache templates that may contain markup like...

<div class="name">{{name}}</div>
{{#address}}
    {{>addressTemplate}}
{{/address}}
{{^address}}
    {{>addressFormTemplate}}
{{/address}}

but minify just seems to just hang with that input.

mhulse commented 10 years ago

Last time I left off was here: https://github.com/kangax/html-minifier/issues/83#issuecomment-28165671

I haven't used the htmlminifier much since then, so my head is out of the code. I'd love to give back to this project, but my free time has been limited lately. :frowning:

Anyone currently available to tackle this issue?

Based on work I've done above, and @kangax's comments here https://github.com/kangax/html-minifier/issues/83#issuecomment-28647611, seems like we could be close to having a working patch.

kangax commented 10 years ago

@broox doesn't hang for me. I get output like this:

<div class=name>{{name}}</div>{{#address}} {{>addressTemplate}} {{/address}} {{^address}} {{>addressFormTemplate}} {{/address}}

Try in http://kangax.github.io/html-minifier/

broox commented 10 years ago

@kangax I forgot to follow up on this, but later realized that html was silently failing on one of the included templates.

KubaBest commented 9 years ago

Hi, what is the status of support for custom tags like "{{" and "{%"?