no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

Use case for rewind #12

Closed: nathan closed this issue 7 years ago

nathan commented 7 years ago

Why do rewind / rewindLines exist? They have no effect on the token stream other than truncating it, and it seems like the following is semantically identical to the current implementation:

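Lexer.prototype.rewind = function(index) {
  // Seeking past the end of what has been fed is still an error
  if (index > this.buffer.length) {
    throw new Error("Can't seek forwards")
  }
  // Discard the buffered input and restart matching from the beginning
  this.re.lastIndex = 0
  this.buffer = ''
  return this
}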

That passes all the tests other than checks against lexer.buffer, which seems like it should not be part of the public API anyway.

tjvr commented 7 years ago

Interesting point! I guess that is currently true.

The idea is that moo can be used as part of a syntax highlighting framework (e.g. a custom CodeMirror mode). You can feed() one line at a time; when the input changes, CodeMirror restarts syntax highlighting from the first changed line.

I suppose if we had a way to save / get a handle to the current internal state (line number, state stack), rewind() wouldn't be needed? I'm not sure.
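
To make the line-at-a-time use case concrete, here is a rough, self-contained sketch of how an editor integration might cache the lexer state at the end of each line and restart from the first changed line. It does not use moo or CodeMirror: tokenizeLine, LineHighlighter and the { inString } state shape are all made up for illustration, and the "grammar" is a toy in which a line containing only """ toggles a multi-line string.

```javascript
// Hypothetical per-line tokenizer (not moo): a line that is exactly `"""`
// toggles a multi-line string; any other line becomes a single token whose
// type depends on the state carried over from the previous line.
function tokenizeLine(line, state) {
  if (line === '"""') {
    return {
      tokens: [{ type: 'delimiter', value: line }],
      state: { inString: !state.inString },
    }
  }
  const type = state.inString ? 'string' : 'code'
  return { tokens: [{ type, value: line }], state }
}

// Hypothetical cache: remembers the tokens and the end-of-line state for
// every line, so an edit only forces re-lexing from the first changed line.
class LineHighlighter {
  constructor() {
    this.tokens = []    // tokens[i]    = tokens of line i
    this.endStates = [] // endStates[i] = state after line i
  }

  // Re-tokenize `lines` starting at index `firstChanged`; earlier lines
  // are reused. Returns the number of lines that were re-lexed.
  update(lines, firstChanged) {
    const start = Math.min(firstChanged, this.endStates.length, lines.length)
    this.tokens.length = start
    this.endStates.length = start
    let state = start === 0 ? { inString: false } : this.endStates[start - 1]
    let relexed = 0
    for (let i = start; i < lines.length; i++) {
      const result = tokenizeLine(lines[i], state)
      this.tokens[i] = result.tokens
      this.endStates[i] = state = result.state
      relexed++
    }
    return relexed
  }
}

const highlighter = new LineHighlighter()
highlighter.update(['a', '"""', 'b', '"""', 'c'], 0)

// Line 3 is edited, so the closing delimiter disappears: only lines 3 and 4
// are re-lexed, starting from the state saved at the end of line 2.
const relexed = highlighter.update(['a', '"""', 'b', 'x', 'c'], 3)
```

The point of the sketch is that the caller only ever needs "the state at the end of line N", which is what a save/restore handle would provide without rewind().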

tjvr commented 7 years ago

Perhaps we could allow passing a previous Token to reset()? The lexer could copy the line number off of the Token. This shifts the responsibility for keeping track of past tokens / line numbers onto the user, and avoids the need for rewind().

It would, however, mean keeping a reference to the state stack on the Token.

tjvr commented 7 years ago

From #14:

a better API would be explicit saveState() and reset([input][, state]); the return value of saveState() would keep track of the state stack, line/column number, etc. That way the operations are reset(), which (re)sets all state and abandons the current stream, and feed(), which adds data to the end of the stream, and the token stream only includes information about the tokens themselves.
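
For illustration, a minimal sketch of what that proposed shape could look like on a toy lexer. This is not moo's implementation: ToyLexer, its token types, and the { line, col, stack } state object are assumptions made up for the example; only the method names saveState(), reset([input][, state]) and feed() come from the proposal quoted above. The toy assumes input is fed in whole lines, since it makes no attempt to join a word split across two chunks.

```javascript
// Hypothetical toy lexer illustrating the proposed API shape (not moo itself).
class ToyLexer {
  constructor() {
    // Sticky regex: newline | other whitespace | word | any other character.
    // Every character falls into one of the four groups, so exec() always
    // matches while there is unread input.
    this.re = /(\n)|([^\S\n]+)|(\w+)|([^\s\w])/y
    this.reset()
  }

  // (Re)set all state and abandon the current stream.
  reset(input = '', state = { line: 1, col: 1, stack: [] }) {
    this.buffer = input
    this.re.lastIndex = 0
    this.line = state.line
    this.col = state.col
    this.stack = state.stack.slice() // copy, so the saved state stays untouched
    return this
  }

  // Add data to the end of the stream.
  feed(data) {
    this.buffer += data
    return this
  }

  // Snapshot of everything that is not part of the tokens themselves.
  saveState() {
    return { line: this.line, col: this.col, stack: this.stack.slice() }
  }

  next() {
    if (this.re.lastIndex >= this.buffer.length) return undefined
    const match = this.re.exec(this.buffer)
    const value = match[0]
    const type = match[1] ? 'nl' : match[2] ? 'ws' : match[3] ? 'word' : 'punct'
    const token = { type, value, line: this.line, col: this.col }
    if (type === 'nl') {
      this.line++
      this.col = 1
    } else {
      this.col += value.length
    }
    // Toy state stack: braces open and close a "block" state.
    if (value === '{') this.stack.push('block')
    if (value === '}') this.stack.pop()
    return token
  }
}

// Lex one line, save the state, then resume on the next line from that state.
const lexer = new ToyLexer().reset('a {\n')
while (lexer.next()) {}
const saved = lexer.saveState()

lexer.reset('b }\n', saved)
const first = lexer.next()
```

Here saved carries the line number, column and state stack across the reset(), so first is reported on line 2 even though the lexer's buffer was replaced.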