Closed kestereverts closed 7 years ago
const itt = require('itt')
const moo = require('moo')
const lexer = moo.compile({
  id: /\w+/,
  sp: {match: /\s+/, lineBreaks: true},
})
lexer.reset('foo bar')
const tokens = itt.push({ type: 'eof', value: '<eof>' }, lexer)
for (const tok of tokens) {
  console.log(tok)
}
// { type: 'id', … }
// { type: 'sp', … }
// { type: 'id', … }
// { type: 'eof', … }
If you need line/col information it's still pretty easy to roll your own:
lexer.reset('foo bar')
const tokens = withEof(lexer, { type: 'eof', value: '<eof>' })
for (const tok of tokens) {
  console.log(tok)
}

// Yields every token from the lexer, then one EOF token carrying the
// lexer's final position.
function* withEof(lexer, eof) {
  yield* lexer
  yield Object.assign(eof, {
    toString() { return this.value },
    offset: lexer.index,
    size: 0,
    lineBreaks: 0,
    line: lexer.line,
    col: lexer.col,
  })
}
Thanks for the suggestion! I think as @nathan points out this is pretty easy to add yourself in a stage on top of Moo, so we don't want to include it in Moo core. Sorry! :-)
Thanks, @tjvr and @nathan. Appending EOF to the token stream is a possible solution, but you would no longer have access to moo's API, which is an important aspect of this proposal.
nearley can use moo as lexer with just one statement in its grammar:
@lexer your_moo_instance
This instance has to comply with nearley's Custom lexers interface, which moo does. You would have to replicate moo's API with the suggested solution above. This is why I thought emitting EOFs is an elegant solution. Thank you for considering, though!
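For reference, replicating the interface does not take much code. Below is a minimal sketch, assuming the lexer exposes next(), save(), reset(), formatError() and has() plus the index, line and col properties used earlier in this thread. The withEofToken name and the 'eof' default are made up for illustration and are not part of moo or nearley. Note that it emits EOF at the end of every chunk passed to reset(), so it only makes sense when the whole input is fed in one go.

```javascript
// Hypothetical helper (not part of moo or nearley): wraps a lexer and
// appends a single zero-width EOF token once the wrapped lexer runs dry.
function withEofToken(lexer, eofType = 'eof') {
  let emitted = false // have we already produced EOF for this chunk?
  return {
    next() {
      const tok = lexer.next()
      if (tok !== undefined || emitted) return tok
      emitted = true
      // Positioned at the end of the input, using the lexer's own counters.
      return {
        type: eofType,
        value: '',
        text: '',
        offset: lexer.index,
        lineBreaks: 0,
        line: lexer.line,
        col: lexer.col,
        toString() { return this.value },
      }
    },
    save() { return lexer.save() },
    reset(chunk, state) {
      lexer.reset(chunk, state)
      emitted = false // a new chunk gets its own EOF
      return this
    },
    formatError(token, message) { return lexer.formatError(token, message) },
    // The grammar may refer to the EOF token type as well as the lexer's own.
    has(name) { return name === eofType || lexer.has(name) },
  }
}
```

The wrapped object could then be handed to nearley in place of the bare moo instance; whether that is worth it depends on the grammar actually wanting an EOF token.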
@tjvr this one might be useful/trivial enough to include for nearley, especially since there's no runtime cost when tokenizing (just a check inside the EOF if).
> just a check inside the EOF if
You're right, this would be cheap (although you'd also need to keep track of whether you'd already emitted the EOF token).
But I don't see how this benefits Nearley specifically. Using EOF tokens inside a CFG is fairly unusual IMHO; it's usually not what you want (unlike in a PEG where it might make more sense).
> Using EOF tokens inside a CFG is fairly unusual IMHO
I've never used nearley and assumed it was reasonable, but it might not be. What does nearley do with next returning undefined / a parse that doesn't consume all of the input?
In nearley, next() returning undefined indicates EOF, which is really just the end of the chunk passed to feed(). When you then call finish(), you'll get zero results.
Ah, then this would probably be superfluous.
I'd like to propose emitting an EOF token when the end of the input is reached. This is useful when using moo with nearley (or any other parser generator that does not natively support EOF tokens). This way, the parser can generate an error when the EOF token is unexpected. This will also add information about where the EOF happened, providing a more useful error message.
It could be implemented with the following API:
EOF would be emitted once and only once when the end of the input is reached. Thereafter, calling next will return undefined as before. EOF will not be emitted when there is no moo.eof present. It is invalid to have multiple moo.eof's per group.
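The "at most one per group" rule can be illustrated without touching moo. The following is only a sketch under assumptions: the EOF symbol stands in for the proposed moo.eof (which moo does not export, since the proposal was declined in this thread), a "group" is taken to be a plain rules object like the one passed to moo.compile, and findEofRule is a hypothetical name.

```javascript
// Stand-in for the proposed moo.eof sentinel; moo itself does not export this.
const EOF = Symbol('eof')

// Returns the name of the rule mapped to the EOF sentinel, or null if the
// group has none (in which case no EOF token would be emitted).
// Throws when more than one rule in the group uses the sentinel.
function findEofRule(rules) {
  const names = Object.keys(rules).filter(name => rules[name] === EOF)
  if (names.length > 1) {
    throw new Error('multiple EOF rules in one group: ' + names.join(', '))
  }
  return names.length === 1 ? names[0] : null
}
```

With this, findEofRule({ id: /\w+/, sp: /\s+/, end: EOF }) returns 'end', a group without the sentinel returns null, and a group that maps two names to EOF throws.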