gpulost / lepl

Automatically exported from code.google.com/p/lepl

cannot mix strings and tokens #22

Closed (GoogleCodeExporter closed this issue 9 years ago)

GoogleCodeExporter commented 9 years ago
The code below

-----
from lepl import *

v = Token('[a-z]+') & Token(' +') & String()
v.parse('aaa "aaa"')
-----

gives the error

------
lepl.lexer.support.LexerError: The grammar contains a mix of Tokens and 
non-Token matchers at the top level.  If Tokens are used then non-token 
matchers that consume input must only appear "inside" Tokens.  The non-Token 
matchers include: Any(None); Literal('"'); Lookahead(Literal, True); 
Literal('"'); Literal('"'); Literal('\\').
------

Trying to tokenize the string fails as well

-------
from lepl import *

v = Token('[a-z]+') & Token(' +') & Token(String())
v.parse('aaa "aaa"')
-------

as the code above gives

-------
lepl.lexer.support.LexerError: A Token was specified with a matcher, but the 
matcher could not be converted to a regular expression: And(NfaRegexp, 
Transform, NfaRegexp)
--------

Original issue reported on code.google.com by wrob...@gmail.com on 26 Dec 2011 at 6:10

GoogleCodeExporter commented 9 years ago
Hi. The main issue here isn't a bug; it's how lepl works. You can either
work in tokens, or not, but not both. That's what the error message says.
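
For example, the non-token route would look something like this (just a sketch,
not tested against this exact input):

-----
from lepl import *

# no Token() anywhere, so nothing is mixed with the lexer
v = Regexp('[a-z]+') & Regexp(' +') & String()
v.parse('aaa "aaa"')
-----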

Trying to tokenize String fails because String is too complex for lepl to
tokenize automatically. It might be possible to write String so that it can be
converted automatically, and I will add that to the list of things to do, but in
the meantime you can simply define your own regular expression:

  myString = Regexp("'[^']*'")

or similar.
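
To fit that into the token grammar, the pattern can be wrapped in a Token;
something like this (again just a sketch, with a double-quoted pattern to match
the example input):

-----
from lepl import *

# the string is just another token, so everything stays at the token level
string = Token('"[^"]*"')
v = Token('[a-z]+') & Token(' +') & string
v.parse('aaa "aaa"')
-----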

andrew

Original comment by acooke....@gmail.com on 1 Jan 2012 at 10:55

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Well, I have indeed defined my own string:

    string = Token('"[^"]+"') | Token("'[^']+'")

However, I have no idea how to allow escaping of apostrophe or quotation-mark
characters with a backslash, i.e. "test\"string" or 'test\'string'.

Using Python regular expressions, that would be

     r""""(([^"]|\")+)"|'(([^']|\')+)'"""

w

Original comment by wrob...@gmail.com on 1 Jan 2012 at 11:48

GoogleCodeExporter commented 9 years ago
The syntax for regexps should be the same as Python's, except that capturing
groups are not supported. So you need to replace each (...) with (?:...).

andrew

Original comment by acooke....@gmail.com on 2 Jan 2012 at 12:11

GoogleCodeExporter commented 9 years ago
Thanks for the tip.

The definition is as follows:

    string = Token(r'"(?:[^"]|\\")+"') | Token(r"'(?:[^']|\\')+'")
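
Used in the original grammar, that gives roughly the following (a sketch; I
assume the parse succeeds with these tokens):

"""
from lepl import *

string = Token(r'"(?:[^"]|\\")+"') | Token(r"'(?:[^']|\\')+'")

# same structure as the original example, with the hand-written string token
v = Token('[a-z]+') & Token(' +') & string
v.parse('aaa "aaa"')
"""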

But I wonder how the above is different from SingleLineString?

If it is not different, then I would like to report that the following code
fails:

"""
from lepl import *

v = Token('[a-z]+') & Token(' +') & Token(SingleLineString())
v.parse('aaa "aaa"')
"""

w

Original comment by wrob...@gmail.com on 5 Jan 2012 at 6:53