gorilla / css

Package gorilla/css is a CSS3 tokenizer.
https://gorilla.github.io
BSD 3-Clause "New" or "Revised" License

Tokenizer works incorrectly when parsing CSS without line breaks #2

Open myearwood opened 10 years ago

myearwood commented 10 years ago

The tokenizer seems to work for pretty-printed CSS, but it has trouble picking up closing brackets in CSS without line breaks. Without line breaks, it produces a string token with a lot of CSS inside it, and all of that CSS is inaccessible.

Implications of this bug

Much of the CSS found in the wild is minified, with line breaks removed to save space. This bug prevents that CSS from being tokenized properly.

Steps to reproduce

The following valid CSS is tokenized correctly when it is pretty-printed and incorrectly when it is not. Inspect the tokens produced for the pretty-printed version and for the version without line breaks.

#sw_tfbb,#id_d{display:none}.sw_pref{border-style:solid;border-width:7px 0 7px 10px;vertical-align:bottom}#b_tween{margin-top:-28px}#b_tween>span{line-height:30px}#b_tween .ftrH{line-height:30px;height:30px}input{font:inherit;font-size:100%}.b_searchboxForm{font:18px/normal 'Segoe UI',Arial,Helvetica,Sans-Serif}.b_beta{font:11px/normal Arial,Helvetica,Sans-Serif}.b_scopebar,.id_button{line-height:30px}.sa_ec{font:13px Arial,Helvetica,Sans-Serif}#sa_ul .sa_hd{font-size:11px;line-height:16px}#sw_as strong{font-family:'Segoe UI Semibold',Arial,Helvetica,Sans-Serif}#id_h{background-color:transparent!important;position:relativ e!important;float:right;height:35px!important;width:280px!important}.sw_pref{margin:0 15px 3px 0}#id_d{left:auto;right:26px;top:35px!important}.id_avatar{vertical-align:middle;margin:10px 0 10px 10px}
myearwood commented 10 years ago

Workaround for this bug

To avoid this bug, add a line break after every semicolon and every closing brace before passing the CSS into the parser. This can be done with the standard strings package.

import "strings"

// Insert a line break after every semicolon and closing brace.
myCSS = strings.ReplaceAll(myCSS, ";", ";\n")
myCSS = strings.ReplaceAll(myCSS, "}", "}\n")

// Pass myCSS into the parser.
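
The workaround above could be wrapped in a small helper, sketched below. The names addLineBreaks and breakAfter are hypothetical and not part of gorilla/css. The sketch uses strings.NewReplacer so that both replacements happen in one pass. It is a purely textual substitution, so a semicolon or closing brace inside a quoted string would also get a line break after it.

```go
package main

import (
	"fmt"
	"strings"
)

// breakAfter is a hypothetical replacer, not part of gorilla/css.
// It appends a newline to every ";" and "}" in one pass over the input.
var breakAfter = strings.NewReplacer(";", ";\n", "}", "}\n")

// addLineBreaks is a hypothetical helper that applies the workaround.
// It does not understand CSS syntax: a ";" or "}" inside a quoted
// string is also followed by a newline.
func addLineBreaks(css string) string {
	return breakAfter.Replace(css)
}

func main() {
	// A short excerpt of the minified CSS from the report.
	minified := "#sw_tfbb,#id_d{display:none}#b_tween{margin-top:-28px}"

	// Print the preprocessed CSS; this is the string that would be
	// passed into the parser instead of the original input.
	fmt.Print(addLineBreaks(minified))
}
```

The preprocessed string is passed to the tokenizer in place of the original input. Whether this fully avoids the problem for all inputs has not been verified here.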