scrapinghub / js2xml

Convert Javascript code to an XML document
MIT License

Exception parsing object with `class` as key #31

Closed andrewbaxter closed 4 years ago

andrewbaxter commented 6 years ago
import js2xml
js2xml.parse('var x = {class: 4};')

outputs:

Traceback (most recent call last):
  File "a.py", line 2, in <module>
    js2xml.parse('var x = {class: 4};')
  File "/.../lib/python3.6/site-packages/js2xml/__init__.py", line 17, in parse
    tree = _parser.parse(text, debug=debug)
  File "/.../lib/python3.6/site-packages/js2xml/parser.py", line 36, in parse
    result = super(CustomParser, self).parse(text, debug=debug)
  File "/.../lib/python3.6/site-packages/slimit/parser.py", line 93, in parse
    return self.parser.parse(text, lexer=self.lexer, debug=debug)
  File "/.../lib/python3.6/site-packages/ply/yacc.py", line 331, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
  File "/.../lib/python3.6/site-packages/ply/yacc.py", line 1199, in parseopt_notrack
    tok = call_errorfunc(self.errorfunc, errtoken, self)
  File "/.../lib/python3.6/site-packages/ply/yacc.py", line 193, in call_errorfunc
    r = errorfunc(token)
  File "/.../lib/python3.6/site-packages/slimit/parser.py", line 116, in p_error
    self._raise_syntax_error(token)
  File "/.../lib/python3.6/site-packages/slimit/parser.py", line 89, in _raise_syntax_error
    self.lexer.prev_token, self.lexer.token())
SyntaxError: Unexpected token (CLASS, 'class') at 1:9 between LexToken(LBRACE,'{',1,8) and LexToken(COLON,':',1,14)

metatoaster commented 6 years ago

This is related to the widely reported issue in slimit (first in rspivak/slimit#52, then rspivak/slimit#59, rspivak/slimit#81, rspivak/slimit#90) where slimit fails to parse reserved keywords used as bare keys inside objects and as attributes. This was fixed in calmjs.parse:

>>> from calmjs.parse import es5
>>> es5('var x = {class: 4};')
<ES5Program @1:1 ?children=[
  <VarStatement @1:1 ?children=[
    <VarDecl @1:5 identifier=<Identifier ...>, initializer=<Object ...>>
  ]>
]>

Gallaecio commented 5 years ago

@redapple Would moving to calmjs.parse be the right move here?

redapple commented 5 years ago

@Gallaecio, I believe so, if someone wants to give it a shot. I don't know how easy the XmlVisitor is to rewrite when using calmjs.parse, though.

metatoaster commented 5 years ago

The "visitor" pattern can still be used; however, certain class names for the nodes have changed (e.g. Program -> ES5Program) and new nodes have been introduced. The best way to catch these is to have the generic_visit method in the xmlvisitor class raise Exception(repr(node)) instead of returning a string, which will show the node that couldn't be handled.
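
A minimal sketch of that debugging approach, for illustration only. The node classes below are hypothetical stand-ins (only the names ES5Program, VarStatement and VarDecl are taken from the output shown earlier in this thread), and this XmlVisitor is a toy, not the actual js2xml class or the calmjs.parse API. It only shows how a generic_visit that raises will name the first node type without a dedicated visit_* method:

```python
from xml.etree import ElementTree as ET


# Hypothetical stand-ins for parser node classes; the real calmjs.parse
# nodes have many more classes and different attributes.
class Node:
    def __init__(self, children=()):
        self.children = list(children)

    def __repr__(self):
        return "<%s ?children=%r>" % (type(self).__name__, self.children)


class ES5Program(Node):
    pass


class VarStatement(Node):
    pass


class VarDecl(Node):
    pass


class XmlVisitor:
    """Toy visitor that dispatches on the node's class name."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        # Fail loudly so an unhandled node type is reported, instead of
        # silently returning a placeholder string.
        raise Exception(repr(node))

    def _container(self, tag, node):
        # Build an element and append the XML of every child node.
        element = ET.Element(tag)
        for child in node.children:
            element.append(self.visit(child))
        return element

    def visit_ES5Program(self, node):
        return self._container("program", node)

    def visit_VarStatement(self, node):
        return self._container("var", node)


# VarDecl has no visit_VarDecl method, so generic_visit raises for it.
tree = ES5Program([VarStatement([VarDecl()])])
try:
    XmlVisitor().visit(tree)
except Exception as exc:
    unhandled = str(exc)
    print("unhandled node:", unhandled)

# A tree made only of handled node types converts without error.
handled = XmlVisitor().visit(ES5Program([VarStatement()]))
```

Running the port against real scripts and adding one visit_* method per reported node type would, under these assumptions, gradually cover the renamed and newly introduced nodes.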