serg472 / htmlcompressor

HTML Compressor and Minifier, can be used standalone and as a Java library
Apache License 2.0

How to use block preservation rules in Ant build or from Command Line #21

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
How can we use block preservation rules when calling HTMLCompressor from an 
Ant build OR from a Windows batch file?

I need to add a block preservation rule but I cannot work out if it is 
possible.

Please help.

Thanks,
Mark.

By the way HTMLCompressor is a good piece of work. Well Done.

Original issue reported on code.google.com by MrMarkWe...@googlemail.com on 6 Dec 2010 at 7:53

GoogleCodeExporter commented 9 years ago
Currently not possible. I don't know a clean solution to this. Maybe put the 
regexps into a separate file (one per line) and then pass the path to this file 
through the command line. Passing a list of regexps through the command line is 
not an option, I think.

Original comment by serg472@gmail.com on 6 Dec 2010 at 9:46

GoogleCodeExporter commented 9 years ago
Hello Serg,

METHOD 1:

Would it be possible to have some optional pre-set block preservation rules 
built in to the Htmlcompressor script?

I personally am using Asp Dot Net and I want to compress some files with both 
HTML and in-line Asp Dot Net code.

The tags I would want to preserve are the in-line script tags that look like 
this:
<%  [some inline code]  %>
I have just checked and it looks like JSP uses the same tags for in-line code 
blocks.

So the optional built in rule would preserve tags starting with <% and ending 
with %>
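
As a rough illustration of the rule described above, here is a minimal Java sketch of a pattern that finds everything from `<%` to the nearest `%>`. The class name `ServerTagDemo`, the constant `SERVER_TAG` and the method `findBlocks` are made up for this example and are not part of HTMLCompressor. The sketch assumes that `%>` never appears inside the inline code itself (for example in a string literal).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ServerTagDemo {
    // Hypothetical pattern: "<%" up to the nearest "%>".
    // DOTALL lets "." cross line breaks, so multi-line blocks are matched too.
    static final Pattern SERVER_TAG = Pattern.compile("<%.*?%>", Pattern.DOTALL);

    // Returns every inline server block found in the page, in document order.
    static List<String> findBlocks(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = SERVER_TAG.matcher(html);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String page = "<p>Hello <%= user.Name %></p>\n<% if (x) {\n   Foo();\n} %>";
        for (String block : findBlocks(page)) {
            System.out.println("[" + block + "]");
        }
    }
}
```

A compressor could cut these blocks out before minifying and put them back afterwards, so that whitespace inside the server code is left alone.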

You could add an optional parameter called -preserveInlineServerTags that, when 
passed via the DOS command line, would tell the HTMLcompressor script to use the 
built-in regular expression.

You could also build in other optional parameters for the other popular tags 
like the <jsp: ... >,  <asp: ... > and <php ... > tags and give them different 
dos parameters like -preserveJSPTags and -preserveASPTags.

I think this addition would allow many more people to use the script in more 
situations. 
Even people who are using the script in a Java Project could use the built in 
block preservation rules making setting up the script a bit faster. 

METHOD 2:

The other option of just having a separate file and making the Htmlcompressor 
get a list of reg expressions would be very useful and allow the script to be 
used in even more situations and by more people.

A separate file would also be cool because you could host a master list that 
would get better and better over time as developers add to/refine it.

Conclusion:

METHOD 1 makes the script more usable (Easy to use Basic Rules). 
METHOD 2 makes the script more customizable (Harder to use Advanced Rules). 

Implement both solutions for maximum usability and maximum customization.

What do you think Serg?

Original comment by MrMarkWe...@googlemail.com on 7 Dec 2010 at 4:07

GoogleCodeExporter commented 9 years ago
Sounds good, but hardcoding all those rules might be challenging, who knows 
what people are compressing. For example php tags could be <?php ?> and could 
be <? ?>, jsp tags could be <% %> and could be <jsp: > (or any tag really, if 
you include taglibs), and I don't know anything about asp. 

What rules do you need personally?

Original comment by serg472@gmail.com on 7 Dec 2010 at 5:21

GoogleCodeExporter commented 9 years ago
Never mind, looks like I didn't read your post carefully - you need it for ASP :)

So what kind of tags does ASP have besides <% %>? Does it have <asp: >? Do you 
need to preserve those too?

Original comment by serg472@gmail.com on 7 Dec 2010 at 5:31

GoogleCodeExporter commented 9 years ago
Hi,

Yes, I do want to use it for Asp. Apart from the in-line code blocks that are 
surrounded by <%    %>, 
Asp dot net controls use tags like the following:

<ASP:DropDownList></ASP:DropDownList>

<asp:ListItem></asp:ListItem>

<asp:RequiredFieldValidator></asp:RequiredFieldValidator>

<ASP:Button />

All start with <asp:

Like in HTML, empty tags can end with />. E.g. the button tag above.

Regards,
Mark.

Original comment by MrMarkWe...@googlemail.com on 7 Dec 2010 at 5:37

GoogleCodeExporter commented 9 years ago
Cont.....

I think there should be a minimum of 3 reg expressions for Basic AspDotNet 
block preservation rules.

One that finds all the in-line code blocks like:  <%  %>
One that finds all the tags like:                 <asp:[tag name]>   </asp:[tag name]>
One that finds all the tags like:                 <ASP:[tag name] />

Regards,
Mark.

Original comment by MrMarkWe...@googlemail.com on 7 Dec 2010 at 5:45

GoogleCodeExporter commented 9 years ago
Well, that's where problems start. Regexps are not up to the task for such cases:

<asp:a>
    <asp:b>
        <asp:a>
        ...
        </asp:a>
    </asp:b>
</asp:a>

That's why I let users deal with such cases themselves, so at least I am not 
responsible for errors. The truth is that you can't use regexps for such things, 
and I added custom patterns only because many were asking for them. Regexps 
might work in some simple cases, but I don't feel comfortable hardcoding them 
(because they simply won't work in the general case).
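
To make the failure concrete, here is a small Java sketch. The pattern `ASP_BLOCK` and the class `NestedTagDemo` are hypothetical and only show what a typical hand-written rule might look like. It pairs an opening `<asp:NAME>` with the nearest `</asp:NAME>`, which is the wrong closing tag as soon as a tag is nested inside a tag of the same name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedTagDemo {
    // Naive pattern: an opening <asp:NAME ...> up to the nearest </asp:NAME>.
    // \1 is a back-reference to the tag name captured by the first group.
    static final Pattern ASP_BLOCK = Pattern.compile(
            "<asp:(\\w+)[^>]*>.*?</asp:\\1>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the first block the pattern finds, or null if there is none.
    static String firstMatch(String html) {
        Matcher m = ASP_BLOCK.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<asp:a><asp:b><asp:a>x</asp:a></asp:b></asp:a>";
        // Prints <asp:a><asp:b><asp:a>x</asp:a>
        // The match stops at the inner </asp:a>; "</asp:b></asp:a>" is left outside.
        System.out.println(firstMatch(html));
    }
}
```

The preserved block ends too early, so the trailing closing tags would be treated as ordinary markup. Making the quantifier greedy does not fix it either, because the match would then run to the last closing tag in the document and swallow unrelated content in between.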

Original comment by serg472@gmail.com on 7 Dec 2010 at 6:45

GoogleCodeExporter commented 9 years ago
Hi,

I think that it may actually be possible to achieve correct matching of nested 
tags like in the example you provided in your previous reply.

I remember I created a regular expression that would match nested tags a couple 
of months ago. I think the solution is to use Named Capture Groups. 

I do not code regular expressions very often so I need to find the code that I 
wrote and do some reading up, then I will try to help with a regular expression 
that will work. 

I have just checked on the web and Named Capture Groups are not supported in 
Java 6 unless you use an add-on like this one: 
http://code.google.com/p/named-regexp/
JDK7 b50 and above DO support Named Capture Groups according to this page: 
http://blogs.sun.com/xuemingshen/entry/named_capturing_group_in_jdk7

I have to go and play five-a-side footie soon and it is evening here so I will 
sign off till tomorrow now.

Regards,
Mark. 

Original comment by MrMarkWe...@googlemail.com on 7 Dec 2010 at 7:18

GoogleCodeExporter commented 9 years ago
As far as I understand, named groups won't help much. They won't bring anything 
new to the table, as Java already has capturing groups; they are just referred 
to by numbers (\1, \2, etc) instead of names.
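
A quick way to check this is to write the same pattern twice, once with a numbered group and once with a named one. The sketch below assumes Java 7 or later, where `(?<name>...)` and `\k<name>` are available. The class `GroupDemo` and both patterns are invented for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Same expression twice: only the way the group is referenced differs.
    static final Pattern NUMBERED = Pattern.compile("<(\\w+)>.*?</\\1>");
    static final Pattern NAMED = Pattern.compile("<(?<tag>\\w+)>.*?</\\k<tag>>");

    // Returns the first match of the given pattern, or null if there is none.
    static String first(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<a><b><a>x</a></b></a>";
        // Both lines print <a><b><a>x</a>
        System.out.println(first(NUMBERED, html));
        System.out.println(first(NAMED, html));
    }
}
```

Both versions stop at the inner closing tag, so naming the group changes readability but not what the expression is able to match.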

What is needed is recursive regular expressions, which are a pretty exotic beast 
(very few implementations exist, none in Java afaik). 

I think the optimal solution for now would be to hardcode the <? ?> and <% %> 
patterns, and then have an option to include user custom patterns in a separate 
file for the command line version of the compressor.
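
For illustration, a patterns file of that kind could be parsed roughly as follows. This is only a sketch under assumptions: one regular expression per line (as suggested earlier in the thread), blank lines skipped, and every pattern compiled with `DOTALL`. The class `PatternFileDemo` and the method `parsePatterns` are hypothetical and do not show how the compressor actually reads its file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PatternFileDemo {
    // Turns the text of a patterns file into compiled patterns.
    // Assumed format: one regular expression per line, blank lines ignored.
    static List<Pattern> parsePatterns(String fileContents) {
        List<Pattern> patterns = new ArrayList<>();
        for (String line : fileContents.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip empty lines between rules
            }
            // The line is compiled as-is; an invalid expression throws
            // PatternSyntaxException, which the caller would have to report.
            patterns.add(Pattern.compile(line, Pattern.DOTALL));
        }
        return patterns;
    }

    public static void main(String[] args) {
        // Stand-in for the file's text; a real tool would read it from disk.
        String fileContents = "<%.*?%>\n\n<\\?php.*?\\?>\n";
        for (Pattern p : parsePatterns(fileContents)) {
            System.out.println(p.pattern());
        }
    }
}
```

Keeping the rules in a file avoids the shell-escaping trouble that comes with passing characters such as `<`, `>`, `%` and `?` as command line arguments.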

Original comment by serg472@gmail.com on 7 Dec 2010 at 7:43

GoogleCodeExporter commented 9 years ago
And I just realized that <? ?> could match <?xml ?> tag, so this means I would 
need to go with <?php ?>.
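
The overlap is easy to reproduce. In the Java sketch below, `SHORT_TAG` and `PHP_TAG` are illustrative patterns written for this example, not the ones shipped with the compressor.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhpTagDemo {
    // Matches any "<? ... ?>" block, including XML declarations.
    static final Pattern SHORT_TAG = Pattern.compile("<\\?.*?\\?>", Pattern.DOTALL);
    // Matches only blocks that open with "<?php".
    static final Pattern PHP_TAG = Pattern.compile("<\\?php.*?\\?>", Pattern.DOTALL);

    // Counts how many non-overlapping matches the pattern finds in the text.
    static int count(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\"?><p><?php echo 1; ?></p>";
        System.out.println(count(SHORT_TAG, doc)); // prints 2: the XML declaration is caught too
        System.out.println(count(PHP_TAG, doc));   // prints 1: only the PHP block
    }
}
```

The stricter pattern leaves `<?xml ?>` alone, at the cost of not covering PHP code written with the short `<? ?>` form.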

Anyway, if there are no objections then I will go with this solution (will try 
to do it on the weekend).

Original comment by serg472@gmail.com on 7 Dec 2010 at 7:48

GoogleCodeExporter commented 9 years ago
Hello Serg,

Yes, recursive nested tags look too difficult to deal with.

So, in my case just the block preservation rule to preserve the Asp Dot Net 
in-line code would be good.
That is the rule to preserve the tags starting with <% and ending with %>.

Regards,
Mark. 

Original comment by MrMarkWe...@googlemail.com on 8 Dec 2010 at 3:10

GoogleCodeExporter commented 9 years ago
Fixed in 0.9.6 release.

The command line compressor takes 3 new options: `--preserve-php`, 
`--preserve-server-script` and `-p <regexp patterns file>`. 

Original comment by serg472@gmail.com on 11 Dec 2010 at 7:57