Open Erotemic opened 6 months ago
Can you include the exception?
Also, why not always use parser.parse(str(text))?
It's a TypeError from the pyx file:
File ~/.pyenv/versions/3.11.9/envs/pyenv3.11.9/lib/python3.11/site-packages/lark/parser_frontends.py:100, in ParsingFrontend.parse(self, text, start, on_error)
98 chosen_start = self._verify_start(start)
99 kw = {} if on_error is None else {'on_error': on_error}
--> 100 stream = self._make_lexer_thread(text)
101 return self.parser.parse(stream, chosen_start, **kw)
File ~/.pyenv/versions/3.11.9/envs/pyenv3.11.9/lib/python3.11/site-packages/lark/parser_frontends.py:95, in ParsingFrontend._make_lexer_thread(self, text)
93 def _make_lexer_thread(self, text: str):
94 cls = (self.options and self.options._plugins.get('LexerThread')) or LexerThread
---> 95 return text if self.skip_lexer else cls.from_text(self.lexer, text)
File lark_cython/lark_cython.pyx:384, in lark_cython.lark_cython.LexerThread.from_text()
TypeError: Argument 'text' has incorrect type (expected str, got SingleQuotedScalarString)
Even though the type I'm passing through is an instance of a class that inherits from str, it is not a base str, and I'm thinking Cython doesn't like that. It could just be that this is a Cython issue; I didn't see anything in the pyx file that looked like the implementation was checking for an exact str.
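To make the distinction concrete, here is a minimal pure-Python sketch. It assumes the Cython-typed argument behaves like an exact type check, which is consistent with the error message above but is not confirmed from the pyx source. `QuotedStr` and both `accepts_*` helpers are hypothetical names used only for illustration.

```python
class QuotedStr(str):
    """Hypothetical stand-in for a str subclass such as SingleQuotedScalarString."""


def accepts_subclass(text):
    # What ordinary Python code usually checks: subclasses of str pass.
    return isinstance(text, str)


def accepts_exact_only(text):
    # Assumed behaviour of the typed argument: only the exact str type passes.
    return type(text) is str


text = QuotedStr("hello")
print(accepts_subclass(text))         # True
print(accepts_exact_only(text))       # False
print(accepts_exact_only(str(text)))  # True, str() returns a plain str here
```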
The reason not to blindly cast the input to str is that it should raise a TypeError if I give it a bad type. Chances are a bad object would result in a parse error, but there is also a chance it wouldn't, so doing that cast is undesirable.
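As a small illustration of that concern: str() accepts essentially any object, so a blanket cast turns a wrong-typed input into text rather than an error. The sample values below are arbitrary and chosen only for the demonstration.

```python
# str() never complains about the type of its argument; it just renders it.
bad_inputs = [None, 42, ["a"]]
casted = [str(obj) for obj in bad_inputs]

for obj, text in zip(bad_inputs, casted):
    # Each of these would be handed to the parser as if it were real input.
    print(type(obj).__name__, "->", repr(text))
```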
Even my workaround is undesirable because a class that inherits from str could still define __str__ and mess things up:
class MyStr(str):
    def __str__(self):
        return 'foo'

text = MyStr('hello')
assert text == 'hello'
# Undesirable, but pathological
assert str(text) == 'foo'
Although that second case is fairly pathological.
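For what it's worth, one way to sidestep the overridden __str__ is to call str's own method directly. This is only a sketch, not the workaround used in this issue: `to_exact_str` is a hypothetical helper, and it relies on CPython's `str.__str__` returning a plain str copy for subclass instances, which is worth verifying on your interpreter.

```python
def to_exact_str(text):
    """Hypothetical helper: return a plain str for any str (sub)class instance."""
    if not isinstance(text, str):
        # Keep the strictness: non-string inputs still raise.
        raise TypeError(f"expected str, got {type(text).__name__}")
    # Call str's own __str__ so an override on the subclass is bypassed.
    return str.__str__(text)


class MyStr(str):
    def __str__(self):
        return 'foo'


text = MyStr('hello')
plain = to_exact_str(text)
print(type(plain).__name__, plain)
```

The isinstance check keeps the TypeError for genuinely bad inputs, while the direct call avoids picking up 'foo' from the subclass.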
Yes, the text is defined as str (for performance reasons), and I suppose Cython doesn't accept objects that inherit from str, since it wants the actual type to be a simple wchar[] (or whatever).
That's my conclusion as well, but I haven't been able to find docs that explicitly state that is the case.
I'm not sure if this needs to be fixed in lark-cython or if it is better left to consumers. The workaround I'm using could be inserted either in lark.parser_frontends or by adding an additional pure-python layer in lark-cython. It's a fairly niche gotcha, and maybe just having this issue exist is good enough so the workaround is googleable.
Describe the bug
When parsing text from a YAML file in ruamel.YAML, it returns a SingleQuotedScalarString object, which does inherit from the str type. Sending this string to the pure-python lark parser seems to work fine, but when sending it to the cython variant it throws a TypeError.
To Reproduce
The following is a MWE that reproduces the issue:
The type information it prints before it fails is:
I've tested with versions:
And
My thought is that cython would handle a class that inherits from a str, but perhaps it doesn't. I'm not sure if this can be fixed on the lark-cython side, but I figured it was worth reporting.
My current workaround is to do something like this: