ohmjs / ohm

A library and language for building parsers, interpreters, compilers, etc.
MIT License

Visiting without changing the input actually outputs a different value #409

Status: closed by ericmorand 1 year ago

ericmorand commented 1 year ago

I'm trying to do the most basic thing with Ohm: visit a match result and output the input unchanged.

Here is the code that I use:

import {grammar, Node} from "ohm-js";

const myGrammar = grammar(`MyGrammar {
  Operations = nonemptyListOf<op, ",">

  op = "op(" digit ")"
}`);

const semantics = myGrammar.createSemantics();

semantics.addOperation<string>('passthrough', {
    _nonterminal(this: Node, ...nodes) {
        return nodes.map((node) => node.passthrough()).join('');
    },
    _terminal(this: Node) {
        return this.sourceString;
    },
    _iter(this: Node, ...nodes) {
        return nodes.map((node) => node.passthrough()).join('');
    }
})

const val = semantics(myGrammar.match('op(1),op(3),op(4)')).passthrough();

console.log(val); // op(1),,op(3)op(4)

I simply define a passthrough operation that outputs the source string of each node untouched. Still, the output differs from the input:

input: op(1),op(3),op(4)
output: op(1),,op(3)op(4)

I have no idea where the two consecutive commas come from, or why the comma between the last two ops is missing.

Note that this is not caused by the .join('') calls: removing them yields a nested array that does not mirror the input either:

[[["op(",["1"],")"],[",",","],[["op(",["3"],")"],["op(",["4"],")"]]]]

Is it a bug or am I missing something obvious here?

ericmorand commented 1 year ago

Investigating further, I noticed that the issue appears when there are more than two nodes in the iteration:

op(1),op(2) => op(1),op(2)
op(1),op(2),op(3) => op(1),,op(2)op(3)

It looks more and more like a bug. :)

ericmorand commented 1 year ago

I've added some debug output to my code and reduced the grammar to the simplest possible case:

import {grammar, Node} from "ohm-js";

const myGrammar = grammar(`MyGrammar {
  Operations = nonemptyListOf<digit, ",">
}`);

const semantics = myGrammar.createSemantics();

semantics.addOperation<any>('passthrough', {
    _nonterminal(this: Node, ...nodes) {
        console.log('NON TERMINAL', this.sourceString, this.ctorName);

        return this.children.map((node) => {
            console.log('  CHILD', node.sourceString, node.ctorName);

            return node.passthrough();
        });
    },
    _terminal(this: Node) {
        console.log('    TERMINAL', this.sourceString, this.ctorName);

        return this.sourceString;
    },
    _iter(this: Node, ...nodes) {
        return nodes.map((node) => node.passthrough());
    }
})

const val = semantics(myGrammar.match('1,2,3,4')).passthrough();

console.log(JSON.stringify(val));

Here is the formatted debug output:

NON TERMINAL 1,2,3,4 Operations
  CHILD 1,2,3,4 nonemptyListOf
    NON TERMINAL 1,2,3,4 nonemptyListOf
      CHILD 1 digit
        NON TERMINAL 1 digit
          CHILD 1 _terminal
            TERMINAL 1 _terminal
      CHILD ,2,3,4 _iter
        TERMINAL , _terminal
        TERMINAL , _terminal
        TERMINAL , _terminal
      CHILD ,2,3,4 _iter
        NON TERMINAL 2 digit
          CHILD 2 _terminal
            TERMINAL 2 _terminal
        NON TERMINAL 3 digit
          CHILD 3 _terminal
            TERMINAL 3 _terminal
        NON TERMINAL 4 digit
          CHILD 4 _terminal
            TERMINAL 4 _terminal

What we see here is that the nonemptyListOf node has 3 children: a digit node (1), an _iter node holding the three "," terminals, and an _iter node holding the digits 2, 3 and 4.

I'm puzzled. Why does nonemptyListOf have three children (1, ",,,", 234) instead of seven (1, ",", 2, ",", 3, ",", 4)?

Is this expected? If so, how can we reconstruct the input by visiting the nodes?

pdubroy commented 1 year ago

That is expected, yes. We should definitely improve the documentation about this.

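It's due to the way repetition operators (e.g. *, +) are dealt with in semantic actions. If you have a rule like line = one ("," two)+, its semantic action takes three arguments, not two: one for one, an iteration node collecting every "," match, and an iteration node collecting every two match: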

line(one, commas, twos) {
    ...
}

Generally this makes writing semantic actions easier (we think), but it's a bit unintuitive at first.
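To make the grouping concrete, here is a minimal library-free sketch of what happens to the matches of a repeated group. It does not use Ohm's API: the Iteration type and the toColumns function are hypothetical names invented for this illustration. It only models the idea that the per-iteration matches are regrouped into one column per sub-expression, and shows why joining the children in argument order scrambles the input.

```typescript
// Hypothetical model, not Ohm's API: each inner array is one pass through
// the repeated group ("," two), in source order.
type Iteration = string[];

// Regroup per-iteration matches into one column per sub-expression.
// This mirrors the shape of the arguments a semantic action receives.
function toColumns(iterations: Iteration[], arity: number): string[][] {
  const columns: string[][] = Array.from({ length: arity }, () => []);
  for (const iteration of iterations) {
    iteration.forEach((text, i) => columns[i].push(text));
  }
  return columns;
}

// Input "1,2,3,4" matched against a rule shaped like: one ("," two)+
const first = "1";
const iterations: Iteration[] = [[",", "2"], [",", "3"], [",", "4"]];
const [separators, rest] = toColumns(iterations, 2);

// Joining the children in argument order loses the interleaving.
const naive = [first, ...separators, ...rest].join("");
console.log(naive); // 1,,,234
```

The same regrouping applied to op(1),op(3),op(4) gives op(1), then both commas, then op(3)op(4), which is the output reported at the top of this issue.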

It's difficult to write an operation that will reconstruct the input using only the special actions (_terminal, _nonterminal, _iter). Probably the easiest thing to do would be to add a nonemptyListOf action to your operation.

ericmorand commented 1 year ago

It's not very elegant, and there may be a better way, but it works:

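semantics.addOperation<any>('passthrough', {
    // Handles the list explicitly: the action receives the first element,
    // one iteration node with all separators, and one with all other elements.
    nonemptyListOf(first, separators, rest) {
        return [
            first.passthrough(),
            rest.children.map((node, index) => {
                // The separator at the same index precedes this element in the source.
                const separatorNode = separators.children[index];

                return [
                    separatorNode.passthrough(),
                    node.passthrough()
                ].join('');
            }).join('')
        ].join('');
    },
    // Any other rule: concatenate the children in order.
    _nonterminal(this: Node, ...nodes) {
        return this.children.map((node) => {
            return node.passthrough();
        }).join('');
    },
    // Leaves: emit the matched text as is.
    _terminal(this: Node) {
        return this.sourceString;
    }
});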

Output: 1,2,3,4
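For readers who want the interleaving step on its own, it can be pulled out into a small pure function. This is only a sketch: interleave is a hypothetical helper, not part of ohm-js, and it assumes the first element, the separators and the remaining elements have already been evaluated to strings.

```typescript
// Hypothetical helper, not part of ohm-js: rebuild a list's text from the
// three values a list action receives, once each has been evaluated.
function interleave(first: string, separators: string[], rest: string[]): string {
  if (separators.length !== rest.length) {
    throw new Error("expected one separator per remaining element");
  }
  // Pair separator i with element i, which restores source order.
  return rest.reduce((out, item, i) => out + separators[i] + item, first);
}

const rebuilt = interleave("1", [",", ",", ","], ["2", "3", "4"]);
console.log(rebuilt); // 1,2,3,4
```

Inside the nonemptyListOf action, the three arguments would presumably be first.passthrough() and the passthrough() results of separators.children and rest.children, as in the code above.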

Thanks a lot for your help. :)