daohoangson closed 7 years ago
That seems reasonable to me. Can you include a description of the characters as you have done in this PR in the comments in the code? That will give people looking at it in the future some point of reference.
@kisielk sure thing.
Cool. Thanks a lot! 👍
Basic idea: urlchar should accept [(ASCII characters minus those that need escaping)(non-ASCII characters)(escaped sequences)]. The latter two parts are already taken care of by the {nonascii} and {escape} macros. Below is a step-by-step breakdown of the first part:
ASCII characters range =[\u0020-\u007e]
Skip space \u0020 =[\u0021-\u007e]
Skip quotation mark \u0022 =[\u0021\u0023-\u007e]
Skip apostrophe \u0027 =[\u0021\u0023-\u0026\u0028-\u007e]
Skip reverse solidus \u005c =[\u0021\u0023-\u0026\u0028-\u005b\u005d-\u007e]
Also, the left square bracket (\u005b) and the right square bracket (\u005d) need escaping themselves, hence the final regex.
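For anyone who wants to sanity-check the derivation, here is a rough sketch in Python (not the project's scanner code). It assumes the last step is meant to end with the range \u005d-\u007e, and it writes every character as a \uXXXX escape so the square brackets need no extra escaping inside the class. The names `printable`, `excluded`, and `ascii_part` are made up for this illustration.

```python
import re

# Printable ASCII: U+0020 through U+007E inclusive.
printable = [chr(cp) for cp in range(0x20, 0x7F)]

# Characters removed step by step in the derivation:
# space, quotation mark, apostrophe, reverse solidus.
excluded = {"\u0020", "\u0022", "\u0027", "\u005c"}

# ASCII part of the character class after the last "skip" step.
# Assumption: the final range is \u005d-\u007e.
ascii_part = re.compile(r"[\u0021\u0023-\u0026\u0028-\u005b\u005d-\u007e]")

# Collect every printable ASCII character the class accepts.
accepted = {ch for ch in printable if ascii_part.fullmatch(ch)}
expected = set(printable) - excluded

# The class should accept exactly the printable characters that were not skipped.
assert accepted == expected
```

This only covers the ASCII part; the {nonascii} and {escape} alternatives are left out because their definitions are not shown in this thread.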