mankyd / htmlmin

A configurable HTML Minifier with safety features
https://htmlmin.readthedocs.org/en/latest/
Other
129 stars 40 forks source link

wierd issue with Minifier.minify #46

Closed runar-rkmedia closed 6 years ago

runar-rkmedia commented 7 years ago

I ran into an odd issue here. Earlier, I used just the minify(html)-function, and it worked great. Lately, I've changed to Minifier.minify, and ran into problems. On pages involving OAuth from Google, I get OpenTagNotFoundError-errors from the parser. However, I tried removing almost all of the code that I sent to minification, but it still outputted an error. Switching back to only using minify(html) solves it.

I have tried to debug this, but I am unable to figure it out. I tried sending only very basic html, of which I am certain is correct, and all tags are opened and closed in the correct order.

I see in the code that minify does not call anything in the parser, which Minifier.minify does. Is there a reason why there is a difference here?

mankyd commented 7 years ago

Odd. minify simply creates a Minifier and feeds it a single bunch of input. Minifier.minify does essentially the same thing, but first resets the internal state of the parser before running.

Can you get a copy of the HTML that causes the problem? Preferably trim it down to the minimum amount of content that still causes the problem to occur.

Markus00000 commented 7 years ago

What I noticed so far:

1) Checking the HTML did not reveal missing tags.

2) One site is using htmlmin directly, another site is using it via Flask-HTMLmin. Both create a Minifier() and make use of its minify() function but only the Flask site produces this issue.

3) I tried feeding the identical HTML into minify() and it did not reliably reproduce the error. It seems to be related to signing in and out. I am not using OAuth.

4) Signing in and out produces redirects, therefore I introduced a redirect into a simple route of the Flask app and it reliably reproduces the error.

I am trying to build a minimal example, but so far I have not been able to reproduce it.

Looking at the affected Flask app, I thought this would produce the error, but it does not:

from flask import Flask
from flask import redirect
from flask import url_for
from flask_htmlmin import HTMLMIN

app = Flask(__name__)
app.config['MINIFY_PAGE'] = True
HTMLMIN(app)

@app.route('/')
def hello_world():
    return '<p>    <a href="{}">Hello, World!</a>    </p>'.format(
        url_for('failure'))

@app.route('/failure/')
def failure():
    return redirect(url_for('hello_world'))
runar-rkmedia commented 7 years ago

I also got the problem when using Flask-HTMLmin, have not tried other libraries, or using this library directly. I will have a go at it when I get some time.

I tried printing the html to console, and add the tag as part of the message while raising OpenTagNotFoundError. Very strange result. The bad tag seems to be a p-tag, but I have no p-tags in the html at all. I tried with an almost empty html-file, only <html><body></body></html> and got no errors. I added a single div, like this <html><body><div></div></body></html> and the error was consistently back.

mankyd commented 7 years ago

I am unable to repro this problem with htmlmin directly:

import htmlmin
mini = htmlmin.Minifier()
mini.minify('<html><body><div></div></body></html>')
# u'<html><body><div></div></body></html>'
mini.minify('<html><body><div></div></body></html>')
# u'<html><body><div></div></body></html>'
mini.minify('<html><body><div></div></body></html>')
# u'<html><body><div></div></body></html>'

I am happy to work on this if you can get the problem repro'ed, but perhaps the problem is with Flask-HTMLMin?

citijk commented 6 years ago

The problem has nothing do with Flask-HTMLMin test.py:

from htmlmin import Minifier

default_options = {
            'remove_comments': True,
            'reduce_empty_attributes': True,
            'remove_optional_attribute_quotes': False
        }
html_minify = Minifier(
            **default_options)

html1="""
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to target URL: <a href="http://localhost:8000/page/">http://localhost:8000/page/</a>. If not click the link.
"""

html2="""
<!DOCTYPE html>
<html>
    <head>
        <title>test...</title>
    </head>
    <body>
        <p>test</p>
    </body>
</html>
"""

html_minify.minify(html1)
html_minify.minify(html2) # take htmlmin.parser.OpenTagNotFoundError

html1 is returned when redirected: https://github.com/pallets/werkzeug/blob/4397e61daf66ad43bd5668741c5d876de116a71f/werkzeug/utils.py#L373

html2 landing page code

mankyd commented 6 years ago

Thanks for a repro case! I will look at this soon.

Markus00000 commented 6 years ago

@mankyd Thanks for fixing this issue! How about releasing a new version on PyPI?

mankyd commented 6 years ago

Done