Closed SerafimArts closed 5 years ago
Hi :-),
Thanks for the relevant questions.
1) What is the difference between $nodeId and $defaultId in grammar rules?
When you write a rule, you can define a node ID like this:
rule:
    call() ::token:: #nodeId
But most of the time, the node ID is the name of the rule:
foo:
    call() ::token:: #foo
And so it can be reduced to:
#foo:
    call() ::token::
In this case, #foo is the default ID. Why default? Because it can be overwritten:
#foo:
    ::tokenA:: | ::tokenB:: #bar
When tokenA is matched, the node ID is the default node ID (namely #foo), but when tokenB is matched, the node ID is #bar.
In this case, #foo is the default ID, and #bar is the node ID for the second branch of the condition in the rule foo.
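The override behaviour described above can be sketched in a few lines. This is only a minimal Python illustration, not hoa/compiler code: resolve_node_id, FOO_BRANCHES and the (token, override) tuples are hypothetical names invented for the example.

```python
from typing import Optional, Sequence, Tuple

# Each branch is (token_name, node_id_override).
# None means no "#id" was written on that branch (hypothetical encoding).
Branch = Tuple[str, Optional[str]]


def resolve_node_id(default_id: Optional[str],
                    branches: Sequence[Branch],
                    matched_token: str) -> Optional[str]:
    """Return the node ID of the matching branch, falling back to the default ID."""
    for token, override in branches:
        if token == matched_token:
            return override if override is not None else default_id
    raise ValueError(f"no branch matches token {matched_token!r}")


# Models the rule:
#   #foo:
#       ::tokenA:: | ::tokenB:: #bar
FOO_BRANCHES = [("tokenA", None), ("tokenB", "#bar")]
```

Under these assumptions, resolve_node_id("#foo", FOO_BRANCHES, "tokenA") gives "#foo" (the default), while the same call with "tokenB" gives "#bar" (the per-branch override).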
2) Why do I need $nodeId in tokens?
That's a good question. I read the code, and right now, I found no useful case where the $nodeId is used in the constructor. It is always set to null. I guess we can remove it!
3) Why determine that the rule is transitional, if the identifier can uniquely point to it?
I don't fully understand the question. The data structure is a flat representation of the grammar. New rules are created, for instance:
rule:
    <a> ( <b> | <c> )
is transformed into (pseudo code):
rule: concat(0, 1)
0: token(a)
1: choice(2, 3)
2: token(b)
3: token(c)
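To make the flattening concrete, here is a small Python sketch that reproduces the numbering of the pseudo code above. It is an illustration under assumptions, not the hoa/compiler algorithm: the nested tuple encoding and the flatten function are hypothetical, and the real compiler may assign indexes in a different order.

```python
from typing import Dict, List, Tuple, Union

# Nested expression (hypothetical encoding):
#   ("token", name) | ("concat", [exprs]) | ("choice", [exprs])
Expr = Tuple[str, Union[str, list]]
Key = Union[int, str]
# Flat rule: ("token", name) or (operator, [child keys])
FlatRule = Tuple[str, Union[str, List[Key]]]


def flatten(name: str, expr: Expr) -> Dict[Key, FlatRule]:
    """Flatten a nested expression into a table keyed by the rule name plus integer ids."""
    rules: Dict[Key, FlatRule] = {}
    next_id = 0

    def visit(node: Expr, key: Key) -> None:
        nonlocal next_id
        kind, payload = node
        if kind == "token":
            rules[key] = ("token", payload)
            return
        # Reserve ids for all children first so siblings get consecutive numbers.
        child_keys = list(range(next_id, next_id + len(payload)))
        next_id += len(payload)
        rules[key] = (kind, child_keys)
        for child, child_key in zip(payload, child_keys):
            visit(child, child_key)

    visit(expr, name)
    return rules


# rule:
#     <a> ( <b> | <c> )
RULE = ("concat", [("token", "a"),
                   ("choice", [("token", "b"), ("token", "c")])])
```

Calling flatten("rule", RULE) produces one string key for the declared rule and the integer keys 0 to 3 for the generated ones, matching the table above.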
There are only 4 operators:
Choice(...children),
Concatenation(...children),
Repetition(min, max, ...child),
Token.
Your suggestion is to rename indexes for the flat grammar representation to be only numerical, so that PHP 7 could optimise it? Is that right?
When tokenA is matched, the node ID is the default node ID (namely #foo), but when tokenB is matched, the node ID is #bar.
Hmm, I did not think about this situation, thank you!
That's a good question. I read the code, and right now, I found no useful case where the $nodeId is used in the constructor. It is always set to null. I guess we can remove it!
Should I create a separate issue for this task?
I don't fully understand the question. The data structure is a flat representation of the grammar. New rules are created, for instance:
I meant that in the end it is not very important whether the rule is a group or a separate rule:
example:
    ::token:: (::token:: rule())*
    //        ^ transitional repetition
// OR
example:
    ::token:: relation()*
    //        ^ non-transitional repetition
relation:
    ::token:: rule()
Therefore, information about the fact that the rule isTransitional is also not needed.
Your suggestion is to rename indexes for the flat grammar representation to be only numerical, so that PHP 7 could optimise it? Is that right?
It does not matter whether rule names are strings or numbers, so all these places can be made numeric, which should noticeably speed up this case and reduce memory consumption. Right.
However, there is one problem: trace items (Entry and Ekzit) rely precisely on the names of the rules: $this->transtional = \is_int($name);
Given this, I do not think that the implementation:
1) Integer indexes + integer names + an added boolean $isTransitional
will be faster or use less memory than:
2) String and int indexes + string and int names
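The two options can be put side by side in a short Python sketch. Both classes are hypothetical stand-ins written for this comparison (FlaggedRule and NamedRule do not exist in hoa/compiler), and the sketch says nothing about which one is faster or smaller in PHP; it only shows that both carry the same information.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FlaggedRule:
    """Option 1: integer key everywhere, original name and flag stored explicitly."""
    key: int
    name: str              # original rule name ("" for generated rules)
    is_transitional: bool  # extra field that option 2 does not need


@dataclass
class NamedRule:
    """Option 2: declared rules keep a string name, generated rules get an int."""
    name: Union[int, str]

    @property
    def is_transitional(self) -> bool:
        # Same idea as the is_int($name) check quoted above:
        # a rule is transitional when its name is an integer.
        return isinstance(self.name, int)
```

For example, NamedRule(3).is_transitional is True and NamedRule("relation").is_transitional is False, which is the same information FlaggedRule(3, "", True) and FlaggedRule(4, "relation", False) store in a separate field.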
Should I create a separate issue for this task?
Please yes! I'm cleaning up the hoa/compiler so it's a good opportunity for that.
Therefore, information about the fact that the rule isTransitional is also not needed.
It is needed for tools that manipulate the grammar, that is, when analyzing a language. It is not useful at runtime though.
It does not matter whether rule names are strings or numbers, so all these places can be made numeric, which should noticeably speed up this case and reduce memory consumption. Right.
It feels true. The original name can be kept in an attribute (and I think it is already the case). So +1 for this!
Are you willing to try a PR for that too?
Please yes! I'm cleaning up the hoa/compiler so it's a good opportunity for that.
https://github.com/hoaproject/Compiler/issues/86
It is needed for tools that manipulate the grammar, that is, when analyzing a language. It is not useful at runtime though.
Judging by the construction algorithm, it is still required to build a correct AST. I figured it out, so I withdraw the question.
Are you willing to try a PR for that too?
I cannot implement this part. My attempts to get rid of identifiers (replacing them with numeric indexes) led to multiple bugs in my fork. For now I'm still thinking about it =\
@SerafimArts do you have any remaining open questions, or has everything been answered now? Please close the ticket.
1) What is the difference between $nodeId and $defaultId in grammar rules?
2) Why do I need $nodeId in tokens?
3) Why determine that the rule is transitional, if the identifier can uniquely point to it?
That is, we can have completely numeric indexes in the rules array, and whether or not the rule appears in the AST would be determined by the presence of the name in the "Entry" element of the trace. Such an arrangement should speed up initialization and fetching (PHP 7+ optimizes arrays with numeric indexes).