Closed servoz closed 5 years ago
Actually, Earley never supported weights on terminals, it just didn't throw an error. It's possible that just removing the weights will solve your issue without affecting the parse result.
I'm gonna sound stupid, but what's weights on terminals ?
It basically just changes the priority on which terminal to choose in case of collision. That's only relevant for the standard and contextual lexers, and not the default Earley lexer which tries all combinations.
You can still have priority in rules, so having them on terminals isn't necessary and complicates the implementation (which is already non-trivial).
To make sure I understood correctly. In this part of grammar: ... quoted_field_name.1 : QUOTED_FIELD_NAME QUOTED_FIELD_NAME.1 : "{" /[^}]/ "}" ... To remove the priority, I change to: ... quoted_field_name : QUOTED_FIELD_NAME QUOTED_FIELD_NAME : "{" /[^}]/ "}" ...
In fact, I remove all .n from TERM.n, everywhere in the grammar. Is that correct?
I would keep quoted_field_name.1
, as it might be meaningful, and only change to QUOTED_FIELD_NAME
Ok, I start to be less confused, thanks to you !
May be a last question, please:
I must remove all n=2, because I see in the lark/parser_frontends.py module:
if t.priority != 1:
raise ValueError("Dynamic Earley doesn't support weights on terminals", t, t.priority)
So, if I am not wrong, I think I can keep (or remove) all n = 1 but I MUSTremove all n=2 (with Dynamic Earley) ?. For example in the following part of a grammar:
...
?literal.2 : KEYWORD_LITERAL -> keyword_literal
| ESCAPED_STRING -> string
| SIGNED_NUMBER -> number
| DATE -> date
| TIME -> time
| DATETIME -> datetime
| list
KEYWORD_LITERAL.2 : "NULL"i
| "TRUE"i
| "FALSE"i
...
Must become:
...
?literal : KEYWORD_LITERAL -> keyword_literal
| ESCAPED_STRING -> string
| SIGNED_NUMBER -> number
| DATE -> date
| TIME -> time
| DATETIME -> datetime
| list
KEYWORD_LITERAL: "NULL"i
| "TRUE"i
| "FALSE"i
...
I am right ?
Yes, it's a quirk really, but that should solve it. Again, literal.2
is fine.
Great! Thank you very much for your support! I will report here the result (old grammar, new grammar), in case it may help someone else like me who is a little confused with these notions! Right now I'm so busy, but ASAP ....
Great. Let me know if you need more help
I removed the weights on the terminals and, as expected, there is no longer any exception raised. However ... the execution time is now very long! Execution times for exactly the same operation:
filter_grammar = '''
?start : filter
?filter : "ALL"i -> all
| conditions
| negation
| "(" filter ")"
| filter BOOLEAN_OPERATOR filter -> conditions
?conditions : condition (BOOLEAN_OPERATOR condition)*
negation : "NOT"i condition
| "NOT"i "(" filter ")"
BOOLEAN_OPERATOR : "AND"i
| "OR"i
CONDITION_OPERATOR : "=="i
| "!="i
| "<="i
| ">="i
| ">"i
| "<"i
| "IN"i
| "ILIKE"i
| "LIKE"i
condition : operand CONDITION_OPERATOR operand
?operand : literal
| field_name
field_name.1 : FIELD_NAME
| quoted_field_name
quoted_field_name.1 : QUOTED_FIELD_NAME
?literal.2 : KEYWORD_LITERAL -> keyword_literal
| ESCAPED_STRING -> string
| SIGNED_NUMBER -> number
| DATE -> date
| TIME -> time
| DATETIME -> datetime
| list
DATE : INT "-" INT "-" INT
TIME : INT ":" INT (":" INT ("." INT)?)?
DATETIME : DATE "T" TIME
KEYWORD_LITERAL.2 : "NULL"i
| "TRUE"i
| "FALSE"i
FIELD_NAME.1 : ("_"|LETTER) ("_"|LETTER|DIGIT)*
QUOTED_FIELD_NAME.1 : "{" /[^}]/* "}"
list : "[" [literal ("," literal)*] "]"
%import common.INT
%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.LETTER
%import common.DIGIT
%import common.WS
%ignore WS
'''
filter_grammar = '''
?start : filter
?filter : "ALL"i -> all
| conditions
| negation
| "(" filter ")"
| filter BOOLEAN_OPERATOR filter -> conditions
?conditions : condition (BOOLEAN_OPERATOR condition)*
negation : "NOT"i condition
| "NOT"i "(" filter ")"
BOOLEAN_OPERATOR : "AND"i
| "OR"i
CONDITION_OPERATOR : "=="i
| "!="i
| "<="i
| ">="i
| ">"i
| "<"i
| "IN"i
| "ILIKE"i
| "LIKE"i
condition : operand CONDITION_OPERATOR operand
?operand : literal
| field_name
field_name.1 : FIELD_NAME
| quoted_field_name
quoted_field_name.1 : QUOTED_FIELD_NAME
?literal.2 : KEYWORD_LITERAL -> keyword_literal
| ESCAPED_STRING -> string
| SIGNED_NUMBER -> number
| DATE -> date
| TIME -> time
| DATETIME -> datetime
| list
DATE : INT "-" INT "-" INT
TIME : INT ":" INT (":" INT ("." INT)?)?
DATETIME : DATE "T" TIME
KEYWORD_LITERAL : "NULL"i
| "TRUE"i
| "FALSE"i
FIELD_NAME : ("_"|LETTER) ("_"|LETTER|DIGIT)*
QUOTED_FIELD_NAME : "{" /[^}]/* "}"
list : "[" [literal ("," literal)*] "]"
%import common.INT
%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.LETTER
%import common.DIGIT
%import common.WS
%ignore WS
'''
37 times longer...
That is very strange! The new version is supposed to be much faster...
If you could get your grammar working with LALR, speed isn't going to be a concern.
If you can get it into a smaller grammar (a minimal example) that is still much slower with the new version, I could try and look into it.
First tests with the LALR parser and the our current grammar seems to give good results. So, for now, we will stay with LALR ! Thanks again for your help !
We use lark in the populse_db project. Since version 0.7.0, our project has been crashing with the exception ValueError "Dynamic Earley does not support weights on terminals". I confess that I am not an expert in this field and I would like to know what I can change in the use of lark to avoid this exception. We currently remain below 0.7.0 until we can bypass this exception ...