Skipping whitespace tokens

no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.

BSD 3-Clause "New" or "Revised" License


Skipping whitespace tokens #156

Open jarble opened 3 years ago

jarble commented 3 years ago

Is it possible to skip tokens when defining a lexer? I want to split a string into a list of tokens without whitespace, but I don't know if Moo can do this:

Input string:

"while ( a < 3 ) { a += 1; }"

List of tokens:

["while","(","a","<","3",")","{","a","+=","1",";","}"]

nathan commented 3 years ago

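const moo = require('moo')

// Name the tokens you care about; moo.fallback catches everything in between.
const lex = moo.compile({
  ws: {match: /\p{White_Space}+/u, lineBreaks: true},
  word: /\p{XID_Start}\p{XID_Continue}*/u,
  op: moo.fallback,
})

// Spread the lexer into an array, drop the whitespace tokens, keep the text.
// Note: a fallback token spans all text up to the next match, so "1;" stays one token.
;[...lex.reset('while ( a < 3 ) { a += 1; }')]
  .filter(t => t.type !== 'ws')
  .map(t => t.value)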

jarble commented 3 years ago

@nathan The documentation doesn't describe this feature: does it need to be updated?

tjvr commented 3 years ago

The documentation needs to be updated to document moo.fallback (see #112).

As for the rest, I think Nathan's just demonstrating that, since a moo lexer object is iterable, you can spread it into an array and use filter() and map(), which are built into JavaScript.
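
If you would rather not build the whole token array first, the same idea can be written as a small generator that drops unwanted token types lazily. This is only a sketch: `skipTokens` is a hypothetical helper, not part of moo, and the `sample` array is a stand-in for a real lexer so the snippet runs without moo installed. It assumes only that each token has `type` and `value` properties, as in the snippet earlier in the thread.

```javascript
// Hypothetical helper (not part of moo): lazily skip tokens whose type is listed.
function* skipTokens(tokens, skipTypes) {
  const skip = new Set(skipTypes)
  for (const token of tokens) {
    if (!skip.has(token.type)) yield token
  }
}

// Stand-in token stream with {type, value} objects.
// With a real lexer you would pass lex.reset(input) here instead (assumption).
const sample = [
  {type: 'word', value: 'while'},
  {type: 'ws', value: ' '},
  {type: 'op', value: '('},
]

const values = [...skipTokens(sample, ['ws'])].map(t => t.value)
console.log(values) // [ 'while', '(' ]
```

Because the helper accepts any iterable, it should work the same way whether it is handed an array of tokens or the lexer itself.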