lark-parser / Lark.js

Live port of Lark's standalone parser to Javascript
MIT License
71 stars 12 forks source link

__copy__ implementation missing #46

Open thekevinscott opened 6 months ago

thekevinscott commented 6 months ago

Hi, me again.

It looks like the dunder __copy__ methods are not carried over to Javascript:

Presumably, other dunder methods like __eq__ are not available, though I haven't checked.

In practice, this results in different outcomes when running accepts. Specifically, parser_state.state_stack grows with the previous token IDs instead of maintaining independent copies.

A fix for my specific issue here looks like this:

        const copiedParserState = new ParserState(
          this.parser_state.parse_conf,
          this.parser_state.lexer,
          [...this.parser_state.state_stack],
          [...this.parser_state.value_stack],
        )
        new_cursor = new InteractiveParser(this.parser, copiedParserState, copy(this.lexer_thread));

Which successfully maintains independent state_stack copies.

However, this wouldn't solve other outstanding references to dunder methods.

thekevinscott commented 6 months ago

I thought of a way to achieve this.

One, modify the copy method like so:

function copy(obj) {
  if (typeof obj == "object") {
    if ('__copy__' in obj.constructor) {
      return obj.constructor.__copy__(obj);
    }
    let empty_clone = Object.create(Object.getPrototypeOf(obj));
    return Object.assign(empty_clone, obj);
  }
  return obj;
}

Then, for ParserState and InteractiveParser (and I guess anyone else implementing a dunder method):

class ParserState {
  static __copy__(parser_state) {
    return new ParserState(
      parser_state.parse_conf,
      parser_state.lexer,
      [...parser_state.state_stack],
      [...parser_state.value_stack],
    );
  }
...
}

class InteractiveParser {
  static __copy__(parser) {
    return new InteractiveParser(parser, copy(parser.parser_state), copy(this.lexer_thread));
  }
...
}
erezsh commented 6 months ago

Why make __copy__ static, and not a regular method?

thekevinscott commented 6 months ago

That's a fair question.

I was trying to align with the original Python implementations, which seem to have a copy instance method (which simply wraps the call to the native Python copy) and the dunder __copy__ method (which implements the deeper copy logic). The copy method (I'm assuming) is exposed publicly on the instance, whereas the __copy__ method I'd assume should remain hidden as much as possible.

Just a matter of preference, though. The two methods could be combined, or __copy__ could be a non-static instance method, and the end result should be identical.

erezsh commented 6 months ago

I think it's a good idea to stick to the Python structure whenever it makes sense. But in Python __copy__ is also a method, and not static.

thekevinscott commented 6 months ago

Roger that - so something like:

function copy(obj) {
  if (typeof obj == "object") {
    if ('__copy__' in obj) {
      return obj.__copy__();
    }
    let empty_clone = Object.create(Object.getPrototypeOf(obj));
    return Object.assign(empty_clone, obj);
  }
  return obj;
}

class ParserState {
  __copy__() {
    return new ParserState(
      this.parse_conf,
      this.lexer,
      [...this.state_stack],
      [...this.value_stack],
    );
  }
...
}

class InteractiveParser {
  __copy__() {
    return new InteractiveParser(this, copy(this.parser_state), copy(this.lexer_thread));
  }
...
}
erezsh commented 6 months ago

Yes, that looks better!