Open ArsenShnurkov opened 7 years ago
The Early algorithm will probably parse that grammar just fine without using presedence parsing rules. If you look at the parse forest that comes out of the GetParseForestRoot method call, it contains And nodes that represent alternative parses.
Try to use the grammar as is, but treat - as concatinations and see if the parse forest contains your parse.
You will need to look at the parse forest, if you try to turn the parse forest directly into a parse tree, the disambiguation algorithm will only select the first alternative at each forest And node.
concatenation will work like positive lookahead. negative lookahead for regex can be represented as positive lookahead. Positive lookahead for regex can be implemented with NFA/DFA. CFG + DFA = CFG. But both lookaheads are different from the operation I am asking. After some 2 days discussion in #marpa i now think that CYK algorithm is required for this instead of Earley.
Ok, read the forum and have a better idea of what you are trying to do.
I'll have to think about this a bit and see how eto.parse implements it. Also need to read up on cyk.
https://en.wikipedia.org/wiki/Boolean_grammar https://en.wikipedia.org/wiki/Conjunctive_grammar 2001-04, Okhotin, Conjunctive grammars http://dl.acm.org/citation.cfm?id=543323
Boolean grammars extend the context-free grammars, with conjunction and negation operations. http://users.utu.fi/aleokh/papers/conj_bool_programming.pdf http://users.utu.fi/aleokh/papers/boolean_survey.pdf He has also implementation of parser in C++ - http://users.utu.fi/aleokh/whalecalf/
https://stackoverflow.com/questions/38544032/negation-in-thompsons-algorithm
https://en.wikipedia.org/wiki/Brzozowski_derivative http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.4378&rep=rep1&type=pdf
https://cs164fa09.pbworks.com/f/midterm1_solutions.pdf
a generalized regular expression denotes a possibly infinite set of finite-length strings of symbols from A. It may be built of:
∅ (denoting the empty set of strings), ε (denoting the singleton set containing just the empty string), a symbol a from A (denoting the singleton set containing the single-symbol string a), R∨S (where R and S are, in turn, generalized regular expressions; denoting their set's union), R∧S (denoting the intersection of R 's and S 's set), ¬R (denoting the complement of R 's set with respect to the set of all strings of symbols from A), RS (denoting the set of all possible concatenations of strings from R 's and S 's set), R* (denoting the set of n-fold repetitions of strings from R 's set, for any n≥0, including the empty > > string).
In an ordinary regular expression, neither ∧ nor ¬ is allowed.
So, I want 2 things: 1) support of '-' operation "R - S" in regular expressions part (as "R ∧ ¬S"); 2) support of except-symbol as in part 4 of ISO EBNF syntax standard in context-free part:
2 The normal character representing each operator of Extended BNF and its implied precedence is (highest precedence at the top): * repetition-symbol - except-symbol , concatenate-symbol | definition-separator-symbol = defining-symbol ; terminator-symbol
2017, Mirzakhmet Syzdykov, Deterministic automata for extended regular expressions
algorithms to produce deterministic finite automaton (DFA) for extended operators in regular expressions like intersection, subtraction and complement
two libraries for combining automatas: https://github.com/moodmosaic/Fare https://github.com/AutomataDotNet/Automata
Dear Arsen Shnurkov
Repository: https://github.com/mirzakhmets/regexplus
CodeProject article: https://www.codeproject.com/Tips/5280511/Regexplus-Extended-Regular-Expression-Engine
Your subtraction operator is present, along with intersection and complement
Regards, Mirzakhmet Syzdykov
Here is the grammar which I want to use:
I don't know how to rewrite this grammar for Pliant (with cosmetic-only changes), because "-" operation in Eto.Parse is not the same as in ISO EBNF. In Eto.Parse the "-" operation have some different semantic. Rule matches left operand greedely, but resulting match should not contain the right operand.
It is also not the same as negative or positive lookahead of Perl regexp's.
I propose to define and describe the own syntax of Pliant and add there an "^" operation which will have this semantic (this way it will be possible to use "-" as in ISO EBNF).