Closed ericmorand closed 1 year ago
Investigating further, I noticed that the issue appears when there are more than two nodes in the iteration:
op(1),op(2)
=> op(1),op(2)
op(1),op(2),op(3)
=> op(1),,op(2)op(3)
It looks more and more like a bug. :)
I've added some debug info to my code and rewrote the grammar to be the simplest possible:
import {grammar, Node} from "ohm-js";

const myGrammar = grammar(`MyGrammar {
  Operations = nonemptyListOf<digit, ",">
}`);

const semantics = myGrammar.createSemantics();

semantics.addOperation<any>('passthrough', {
  _nonterminal(this: Node, ...nodes) {
    console.log('NON TERMINAL', this.sourceString, this.ctorName);
    return this.children.map((node) => {
      console.log(' CHILD', node.sourceString, node.ctorName);
      return node.passthrough();
    });
  },
  _terminal(this: Node) {
    console.log(' TERMINAL', this.sourceString, this.ctorName);
    return this.sourceString;
  },
  _iter(this: Node, ...nodes) {
    return nodes.map((node) => node.passthrough());
  }
});

const val = semantics(myGrammar.match('1,2,3,4')).passthrough();
console.log(JSON.stringify(val));
Here is the formatted debug output:
NON TERMINAL 1,2,3,4 Operations
CHILD 1,2,3,4 nonemptyListOf
NON TERMINAL 1,2,3,4 nonemptyListOf
CHILD 1 digit
NON TERMINAL 1 digit
CHILD 1 _terminal
TERMINAL 1 _terminal
CHILD ,2,3,4 _iter
TERMINAL , _terminal
TERMINAL , _terminal
TERMINAL , _terminal
CHILD ,2,3,4 _iter
NON TERMINAL 2 digit
CHILD 2 _terminal
TERMINAL 2 _terminal
NON TERMINAL 3 digit
CHILD 3 _terminal
TERMINAL 3 _terminal
NON TERMINAL 4 digit
CHILD 4 _terminal
TERMINAL 4 _terminal
What we see here is that the nonemptyListOf node has 3 children: the digit 1, an _iter that contains all the comma terminal nodes, and an _iter that contains all the other digits.
I'm puzzled. Why does nonemptyListOf have three children (1, then ",,,", then 234) instead of seven (1, ",", 2, ",", 3, ",", 4)?
Is it expected? If so, how can we reconstruct the input by visiting the nodes?
That is expected, yes. We should definitely improve the documentation about this.
It's due to the way repetition operators (e.g. * and +) are dealt with in semantic actions. If you have a rule like line = one ("," two)+, its semantic action takes three arguments:
line(one, commas, twos) {
  ...
}
Generally this makes writing semantic actions easier (we think) but it's a bit unintuitive to understand at first.
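To make the grouping concrete, here is a small standalone sketch that does not use Ohm at all. It only models the idea described above: each pass through the repetition is a row, and the semantic action receives one column per sub-expression. The names Row and toColumns are made up for this illustration and are not part of the ohm-js API.

```typescript
// Hypothetical model: one row per pass through ("," two).
type Row = string[];

// Transpose the rows into columns, one column per sub-expression
// inside the repetition. This mimics the grouping, not Ohm's internals.
function toColumns(rows: Row[], width: number): string[][] {
  const columns: string[][] = Array.from({ length: width }, () => []);
  for (const row of rows) {
    row.forEach((cell, i) => columns[i].push(cell));
  }
  return columns;
}

// Matching "1,2,3,4" against: line = one ("," two)+
const one = "1";
const rows: Row[] = [[",", "2"], [",", "3"], [",", "4"]];
const [commas, twos] = toColumns(rows, 2);

// Concatenating the three arguments in order loses the interleaving,
// which is the symptom reported in this issue.
const naive = [one, commas.join(""), twos.join("")].join("");
console.log(commas, twos, naive);
```

Under this model, visiting the children in order and joining the results puts all the separators before all the remaining items, so the original order has to be restored by pairing separators and items by index.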
It's difficult to write an operation that will reconstruct the input using only the special actions (_terminal, _nonterminal, _iter). Probably the easiest thing to do would be to add a nonemptyListOf action to your operation.
It is not very elegant, there may be a better way, but it works:
semantics.addOperation<any>('passthrough', {
  nonemptyListOf(first, separators, rest) {
    return [
      first.passthrough(),
      rest.children.map((node, index) => {
        const separatorNode = separators.children[index];
        return [
          separatorNode.passthrough(),
          node.passthrough()
        ].join('');
      }).join('')
    ].join('');
  },
  _nonterminal(this: Node, ...nodes) {
    return this.children.map((node) => {
      return node.passthrough();
    }).join('');
  },
  _terminal(this: Node) {
    return this.sourceString;
  }
});
1,2,3,4
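The pairing-by-index step in the action above can also be pulled out into a small helper. The following is only a sketch on plain values (strings here), not on Ohm nodes; interleave is a hypothetical name, and in a real action you would pass the already-evaluated children instead.

```typescript
// Hypothetical helper: rebuild source order from the three arguments
// that a list-like rule's action receives (first, separators, rest).
function interleave<T>(first: T, separators: T[], rest: T[]): T[] {
  if (separators.length !== rest.length) {
    throw new Error("expected one separator per remaining item");
  }
  const out: T[] = [first];
  rest.forEach((item, i) => {
    // Each remaining item is preceded by the separator at the same index.
    out.push(separators[i], item);
  });
  return out;
}

const parts = interleave("1", [",", ",", ","], ["2", "3", "4"]);
const rebuilt = parts.join("");
console.log(parts.length, rebuilt);
```

Returning the flat array rather than a joined string keeps the seven-element shape (1, ",", 2, ",", 3, ",", 4) available to callers that need the individual pieces.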
Thanks a lot for your help. :)
I'm trying to do the most basic thing with Ohm: visiting the match result and outputting the input unchanged.
Here is the code that I use:
I simply define a passthrough operation that outputs the source string of each node untouched. But still, the output is different from the input:
input: op(1),op(3),op(4)
output: op(1),,op(3)op(4)
I have no idea where these two consecutive commas come from, or why one is missing between the last two op calls.
Note that this is not due to the .join('') calls, because removing them outputs an array that is also not a representation of the input:
Is it a bug or am I missing something obvious here?