gorilla / css

Package gorilla/css is a CSS3 tokenizer.
https://gorilla.github.io
BSD 3-Clause "New" or "Revised" License

Update urlchar to handle character escaping. #9

Closed. daohoangson closed this 7 years ago

daohoangson commented 7 years ago

Basic idea: urlchar should accept [(ASCII characters minus those that need escaping)(non-ASCII characters)(escaped sequences)]. The latter two parts are already taken care of by the {nonascii} and {escape} macros. Below is a breakdown of the first part:

Printable ASCII characters range = [\u0020-\u007e]
Skip space \u0020 = [\u0021-\u007e]
Skip quotation mark \u0022 = [\u0021\u0023-\u007e]
Skip apostrophe \u0027 = [\u0021\u0023-\u0026\u0028-\u007e]
Skip reverse solidus \u005c = [\u0021\u0023-\u0026\u0028-\u005b\u005d-\u007e]

Also, the left square bracket (\u005b) and the right square bracket (\u005d) need escaping themselves inside the character class, hence the final regex.
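
To make the derivation easier to check, here is a minimal Go sketch of the ASCII part only. It assumes the final range runs from \u005d through \u007e and that the two square brackets are backslash-escaped inside the class. The names urlcharASCII and IsPlainURLChar are hypothetical and are not part of gorilla/css; the {nonascii} and {escape} alternatives are deliberately left out.

```go
package main

import "regexp"

// urlcharASCII is a hypothetical constant for illustration only.
// The Go compiler expands the \uXXXX escapes, so the regexp engine
// sees the class [!#-&(-\[\]-~]: printable ASCII without space,
// quotation mark, apostrophe, and reverse solidus.
const urlcharASCII = "[\u0021\u0023-\u0026\u0028-\\\u005b\\\u005d-\u007e]"

// urlcharRe anchors the class so that exactly one character must match.
var urlcharRe = regexp.MustCompile("^" + urlcharASCII + "$")

// IsPlainURLChar (hypothetical helper) reports whether s is a single
// ASCII character that the class accepts without escaping.
func IsPlainURLChar(s string) bool {
	return urlcharRe.MatchString(s)
}
```

With this sketch, IsPlainURLChar accepts "!", "[", "]" and "~", and rejects the four skipped characters: space, quotation mark, apostrophe, and reverse solidus.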

kisielk commented 7 years ago

That seems reasonable to me. Can you include the character breakdown from this PR description as comments in the code? That will give people looking at it in the future a point of reference.

daohoangson commented 7 years ago

@kisielk sure thing.

kisielk commented 7 years ago

Cool. Thanks a lot! 👍