mankyd / htmlmin

A configurable HTML Minifier with safety features
https://htmlmin.readthedocs.org/en/latest/

Control handling of escaped ampersands in URLs #37

Closed (s-m-e closed this issue 7 years ago)

s-m-e commented 8 years ago

Simple example:

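import htmlmin


def web_compress_html(content):
    # Minify conservatively: strip comments and redundant whitespace, but
    # keep attribute quotes and boolean attributes as written. Content inside
    # <pre>, <textarea> and the custom <nomin> tag is meant to stay untouched.
    return htmlmin.minify(
        content,
        remove_comments=True,
        remove_empty_space=True,
        remove_all_empty_space=False,
        reduce_empty_attributes=True,
        reduce_boolean_attributes=False,
        remove_optional_attribute_quotes=False,
        keep_pre=True,
        pre_tags=('pre', 'textarea', 'nomin'),
        pre_attr='pre',
    )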

in[0]: web_compress_html('<!DOCTYPE html>\n<html><head><title></title></head><body><iframe src="foo.com/bar?a=1&amp;b=2"></iframe></body></html>')
out[0]: '<!DOCTYPE html>\n<html><head><title></title></head><body><iframe src="foo.com/bar?a=1&b=2"></iframe></body></html>'
in[1]: web_compress_html('<!DOCTYPE html>\n<html><head><title></title></head><body><nomin><iframe src="foo.com/bar?a=1&amp;b=2"></iframe></nomin></body></html>')
out[1]: '<!DOCTYPE html>\n<html><head><title></title></head><body><nomin><iframe src="foo.com/bar?a=1&b=2"></iframe></nomin></body></html>'

Is there a way to control whether ampersands in URLs, as in the example above, stay escaped? The W3C validator is throwing tons of errors at me (error: “&” did not start a character reference), so I'd like to have a switch for this behaviour. However, no matter what I do, htmlmin turns &amp; into &, even within (custom) pre tags.

(I just googled through the standards and a number of related articles. Apparently, there is no consensus on how to use ampersands in URLs, not even in the HTML5 standardization community, and I really do not want to discuss this nonsense. This is about best practices and reducing errors while debugging a website. Nevertheless, thanks a lot for this excellent tool.)
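For readers trying to reproduce the behaviour without htmlmin, the following is a minimal sketch of one likely cause, assuming the minifier builds on Python's standard html.parser (an assumption, not something stated in this thread). The standard parser decodes character references in attribute values before handing them to the caller, so a tool that writes those values back verbatim would emit a bare &. The names `AttrCollector` and `parsed_attrs` are made up for this illustration.

```python
from html import escape
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record (tag, attrs) pairs exactly as html.parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; character references
        # in each value have already been decoded by the parser.
        self.seen.append((tag, attrs))


def parsed_attrs(markup):
    """Hypothetical helper: return every start tag with its attributes."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


result = parsed_attrs('<iframe src="foo.com/bar?a=1&amp;b=2"></iframe>')
print(result)
# [('iframe', [('src', 'foo.com/bar?a=1&b=2')])]

# Re-escaping the decoded value restores the form the validator accepts.
print(escape(result[0][1][0][1], quote=True))
# foo.com/bar?a=1&amp;b=2
```

If that assumption holds, the unescaping happens during parsing rather than during minification, which would explain why wrapping the element in a pre tag makes no difference.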

mankyd commented 7 years ago

You can now prefix attributes with pre- in order to avoid changing them.
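
As an illustration of how that prefix might be applied to existing markup, here is a rough sketch that renames any attribute whose value contains &amp; to its pre- form before the document is passed to the minifier. The helper `prefix_escaped_attrs` is hypothetical and not part of htmlmin. It is regex-based, so it can also match attribute-like text outside of tags, and how the installed htmlmin version treats the prefix in its output should be checked against its documentation.

```python
import re

# Matches name="value" or name='value' where the value contains "&amp;".
_ESCAPED_ATTR = re.compile(
    r"""(?<=\s)([A-Za-z][\w:-]*)(\s*=\s*)("[^"]*&amp;[^"]*"|'[^']*&amp;[^']*')"""
)


def prefix_escaped_attrs(markup, prefix="pre"):
    """Hypothetical helper: rename attr="...&amp;..." to pre-attr="...&amp;..."."""

    def rename(match):
        name, equals, value = match.groups()
        if name.startswith(prefix + "-"):
            # Already prefixed; leave it alone so the helper is idempotent.
            return match.group(0)
        return f"{prefix}-{name}{equals}{value}"

    return _ESCAPED_ATTR.sub(rename, markup)


protected = prefix_escaped_attrs('<iframe src="foo.com/bar?a=1&amp;b=2"></iframe>')
print(protected)
# <iframe pre-src="foo.com/bar?a=1&amp;b=2"></iframe>

# Assumed usage, requires htmlmin to be installed:
#   minified = htmlmin.minify(protected)
```

Writing pre-src directly in the templates is simpler when the markup is under your control; a rewrite step like this is only useful for HTML that arrives already rendered.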