thriftrw / thriftrw-python

A Thrift encoding library for Python
MIT License

IDL parser fails on grave accent in comments #142

Open srussell-uber opened 7 years ago

srussell-uber commented 7 years ago

We recently observed a failure after updating our IDL...

    module = thriftrw.load(path=path, name=module_name)
  File "/usr/local/lib/python2.7/dist-packages/thriftrw/loader.py", line 89, in load
    return self.compiler.compile(name, document, path).link().surface
  File "/usr/local/lib/python2.7/dist-packages/thriftrw/compile/compiler.py", line 268, in compile
    program = self.parser.parse(contents)
  File "/usr/local/lib/python2.7/dist-packages/thriftrw/idl/parser.py", line 468, in parse
    return self._parser.parse(input, lexer=self._lexer, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ply/yacc.py", line 331, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
  File "/usr/local/lib/python2.7/dist-packages/ply/yacc.py", line 1061, in parseopt_notrack
    lookahead = get_token()     # Get the next token
  File "/usr/local/lib/python2.7/dist-packages/thriftrw/idl/lexer.py", line 184, in token
    return self._lexer.token()
  File "/usr/local/lib/python2.7/dist-packages/ply/lex.py", line 350, in token
    newtok = func(tok)
  File "/usr/local/lib/python2.7/dist-packages/thriftrw/idl/lexer.py", line 100, in t_ignore_DOCTEXT
    t.lexer.lineno += t.value.count('\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 35: ordinal not in range(128)

The issue was caused by a backtick (grave accent) inside a comment block of our IDL.

This version caused the exception:

/** status describes query response`s status */
2: required BacktestStatus status

This version, with the accent replaced by a plain apostrophe, did not:

/** status describes query response's status */
2: required BacktestStatus status
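
For anyone trying to reproduce this outside thriftrw: the error reports byte 0xe2 at position 35, and offset 35 is exactly where the accent sits in the failing comment. An ASCII backtick is byte 0x60 and would decode fine, so the character in the real file was probably a multi-byte UTF-8 character that only looks like one. The sketch below assumes it was U+2018 (a guess; anything whose UTF-8 encoding starts with 0xE2 gives the same message). It only performs a plain ASCII decode and does not call thriftrw. Whether an implicit Python 2 ASCII decode is what happens inside `t_ignore_DOCTEXT` is likewise an assumption.

```python
# -*- coding: utf-8 -*-
# Minimal sketch, not thriftrw code. The accent is ASSUMED to be U+2018;
# the issue only tells us its first UTF-8 byte is 0xE2.
DOC_COMMENT = u"/** status describes query response\u2018s status */"


def ascii_decode_error(raw):
    """Return the UnicodeDecodeError raised by an ASCII decode, or None."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


# Encode the way a UTF-8 IDL file would be stored on disk.
raw_bytes = DOC_COMMENT.encode("utf-8")
err = ascii_decode_error(raw_bytes)

# err.start is the byte offset of the first undecodable byte.
print(err.start)
print(err)
```

The reported offset matches the number of ASCII characters before the accent in the doc comment, which is consistent with the lexer receiving the comment as undecoded bytes.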

This seems like a bug. The Go version of the thriftrw loader did not crash on the same IDL.
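
Until this is fixed, one possible stopgap is to scan the IDL for non-ASCII characters before loading it. This is a rough sketch using only the standard library; `find_non_ascii` is a hypothetical helper name, not part of thriftrw, and the sample text again assumes the accent was U+2018.

```python
# -*- coding: utf-8 -*-
# Hypothetical pre-check helper, not part of thriftrw.


def find_non_ascii(text):
    """Yield (line_number, column, character) for each non-ASCII character.

    Line numbers and columns are 1-based.
    """
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                yield line_number, column, char


# Sample IDL fragment from this issue; the accent is ASSUMED to be U+2018.
idl = (
    u"/** status describes query response\u2018s status */\n"
    u"2: required BacktestStatus status\n"
)

hits = list(find_non_ascii(idl))
for line_number, column, char in hits:
    print("line %d, column %d: %r" % (line_number, column, char))
```

For a real file, the text could be read with `io.open(path, encoding="utf-8")` and passed to the same function; an empty result means the file is pure ASCII.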

blampe commented 6 years ago

Just saw another case of this :)