Some questions about the structure of rules

SerafimArts commented 6 years ago

1) What is the difference between $nodeId and $defaultId in grammar rules?

(new Concatenation($id, $children, $nodeId))->setDefaultId($defaultId);
//                                 ^^^^^^^ - There ------- ^^^^^^^^^^

2) Why do I need $nodeId in tokens?

new Token($id, $name, $nodeId, $unificationId, $kept);
//                    ^^^^^^^ - excess?

3) Why determine that the rule is transition, if the identifier can uniquely point to it?

Those. we can have completely numeric indexes in the rules array, and whether or not the rule in the AST will be determined by the presence of the name in the "Entry" element of the trace. That some order should speed up the initialization and fetching (optimizing arrays for numeric indexes php7+)

Hywan commented 6 years ago

Hi :-),

Thanks for the relevant questions.

1. What is the difference between `$nodeId` and `$defaultId` in grammar rules?

When you write a rule, you can define a node ID like this:

rule:
    call() ::token:: #nodeId

But most of the time, the node ID is the name of the rule:

foo:
    call() ::token:: #foo

And so it can be reduced to:

#foo:
    call() ::token::

In this case, #foo is the default ID. Why default? Because it can be overwritten:

#foo:
    ::tokenA:: | ::tokenB:: #bar

When tokenA is matched, the node ID is the default node ID (namely #foo), but when tokenB is matched, the node ID is #bar.

In this case, #foo is the default ID, and #bar is the node ID for the second branch of the condition in the rule foo.

2. Why do I need `$nodeId` in tokens?

That's a good question. I read the code, and right now, I found no useful case where the $nodeId is used in the constructor. It is always set to null. I guess we can remove it!

3. Why determine that the rule is transition, if the identifier can uniquely point to it?

I don't fully understand the question. The data structure is a flat representation of the grammar. New rules are created, for instance:

rule:
    <a> ( <b> | <c> )

is transformed into (pseudo code):

rule: concat(0, 1)
0: token(a)
1: choice(2, 3)
2: token(b)
3: token(c)

There is only 4 operators:

Choice(...children),
Concatenation(...children)
Repetition(min, max, ...child),
Token.

Your suggestion is to rename indexes for the flat grammar representation to be only numerical, so that PHP 7 could optimise it? Is that right?

SerafimArts commented 6 years ago

1.

When tokenA is matched, the node ID is the default node ID (namely #foo), but when tokenB is matched, the node ID is #bar.

Hmm, I did not think about this situation, thank you!

2.

That's a good question. I read the code, and right now, I found no useful case where the $nodeId is used in the constructor. It is always set to null. I guess we can remove it!

Should I create a separate issue for this task?

3.

I don't fully understand the question. The data structure is a flat representation of the grammar. New rules are created, for instance:

I meant that in the end it is not very important whether the rule is a group or a separate rule:

example:
    ::token:: (::token:: rule())*
//                ^ transitional repetition

// OR

example:
    ::token:: relation()*
//                ^ non-transitional repetition

relation:
  ::token:: rule()

Therefore, information about the fact that the rule isTransitional is also not needed.

Your suggestion is to rename indexes for the flat grammar representation to be only numerical, so that PHP 7 could optimise it? Is that right?

Rule names do not matter what they are, strings or numbers, then all these places can be made numeric, which should quite speed up this case and reduce memory consumption. Right.

However, there is one problem, trace items (Entry and Ekzit) are directed precisely by the names of the rules: $this->transtional = \is_int($name);.

To this I do not think that the implementation:

1) Integer indexes + Integer names + add boolean $isTransitional

Will be faster and less in memory consumption, rather than:

2) String and Int indexes + String and Int names

Hywan commented 6 years ago

Should I create a separate issue for this task?

Please yes! I'm cleaning up the hoa/compiler so it's a good opportunity for that.

Therefore, information about the fact that the rule isTransitional is also not needed.

It is needed for tools that manipulate the grammar, so when analyzing language. This is not useful at runtime though.

Rule names do not matter what they are, strings or numbers, then all these places can be made numeric, which should quite speed up this case and reduce memory consumption. Right.

It feels true. The original name can be kept in an attribute (and I think it is already the case). So +1 for this!

Are you willing to try a PR for that too?

SerafimArts commented 6 years ago

Please yes! I'm cleaning up the hoa/compiler so it's a good opportunity for that.

https://github.com/hoaproject/Compiler/issues/86

It is needed for tools that manipulate the grammar, so when analyzing language. This is not useful at runtime though.

Judging by the construction algorithm - it is still required to build a correct AST. I figured it out, so I'm taking off the question.

Are you willing to try a PR for that too?

I can not implement this part. Attempts to get rid of identifiers (replacing them with numeric indexes) resulted in my fork to multiple bugs. While I think about it =\

flip111 commented 5 years ago

@SerafimArts do you have any remaining open questions, or has everything been answered now? Please close the ticket.

hoaproject / Compiler