ECMA-262 specifies the allowed whitespace characters in Table 32. slimit complains that these are invalid characters. The spec says:
ECMAScript implementations must recognize as WhiteSpace code points listed in the “Separator, space” (Zs) category by Unicode 5.1. ECMAScript implementations may also recognize as WhiteSpace additional category Zs code points from subsequent editions of the Unicode Standard.
Here's a small test that exhibits some of the problems. There may be other characters in the Zs Unicode category that must also be included; I haven't looked for those here.
from __future__ import print_function

from itertools import product
import unicodedata

from slimit.parser import Parser as sParser


def replace_spaces(s, wschar):
    """Yield s unchanged, then one variant per space with that space replaced by wschar."""
    yield "WITHOUT REPLACEMENT", s
    offsets = [i for i, c in enumerate(s) if c == ' ']
    try:
        name = unicodedata.name(wschar[0])
    except ValueError:
        # Control characters such as TAB have no Unicode name.
        name = repr(wschar)
    for i in offsets:
        yield "WITH REPLACEMENT OF " + name, s[:i] + wschar + s[i+1:]


jsparser = sParser()
# TAB, VT, FF, SP, NBSP and ZWNBSP (BOM): the code points listed explicitly as WhiteSpace.
for src, wschar in product(
        [u" function_name( 'arg' ) "],
        [u"\x09", u"\x0b", u"\x0c",
         u"\x20", u"\xa0",
         u"\uFEFF"]):
    for prefix, js in replace_spaces(src, wschar):
        print(prefix, "=>", js)
        try:
            tree = jsparser.parse(js)
        except SyntaxError as e:
            print("Syntax error", e)
        print()
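
For the remaining Zs characters that I didn't test, something like the following could be used to enumerate candidates. This is only a sketch: it relies on whatever Unicode version the local Python's unicodedata module ships with (which is not necessarily Unicode 5.1, so the list can differ slightly from what the spec requires), and the names EXPLICIT_WHITESPACE, zs_code_points and ecmascript_whitespace are made up for illustration; they are not part of slimit.

```python
import unicodedata

try:
    unichr  # Python 2
except NameError:
    unichr = chr  # Python 3

# Code points listed explicitly as WhiteSpace: TAB, VT, FF, SP, NBSP, ZWNBSP (BOM).
EXPLICIT_WHITESPACE = [u"\u0009", u"\u000b", u"\u000c", u"\u0020", u"\u00a0", u"\ufeff"]


def zs_code_points():
    """Return every BMP character the local Unicode database places in category Zs."""
    return [unichr(cp) for cp in range(0x10000)
            if unicodedata.category(unichr(cp)) == "Zs"]


def ecmascript_whitespace():
    """Explicit WhiteSpace code points plus category Zs, without duplicates."""
    result = []
    for ch in EXPLICIT_WHITESPACE + zs_code_points():
        if ch not in result:
            result.append(ch)
    return result


if __name__ == "__main__":
    for ch in ecmascript_whitespace():
        # Control characters have no name, so fall back to a placeholder.
        print("U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>")))
```

Each character returned by ecmascript_whitespace() could then be passed as wschar to replace_spaces() in the test above to see which ones the parser rejects.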