on EOF, can I get a list of tokens that would be matched given more input?

no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.

BSD 3-Clause "New" or "Revised" License


on EOF, can I get a list of tokens that would be matched given more input? #101

Closed akonsu closed 6 years ago

akonsu commented 6 years ago

When the lexer encounters EOF and ends in a valid (finite state machine) state, is it possible to get a list of tokens that are "incomplete", meaning that the lexer matched only a prefix of the token and stopped when it reached EOF?

I mean, say, I define these tokens:

"abc" "abd" "cde"

and the input is "ab". When the lexer sees EOF, I need to get a list with two tokens "abc" and "abd". In the case where input has invalid characters the list would be empty. For example, for input "abe", there are no tokens that are incomplete. I need this for content assist in my editor.

nathan commented 6 years ago

The list isn't necessarily finite (e.g., /ab.+/), and this would require parsing regexes. You're probably best off matching /\w+/ (or some other appropriate catch-all) at the end of your list of tokens and generating the completion list on your own.
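A minimal sketch of the "generate the completion list on your own" step, in plain JavaScript and independent of Moo. It assumes the catch-all rule has already produced the trailing word as the text of the last token before EOF. `KEYWORDS` and `completionsFor` are hypothetical names used only for illustration, and this only works for literal tokens, not arbitrary regexes.

```javascript
// Hypothetical literal tokens taken from the question; not part of Moo.
const KEYWORDS = ['abc', 'abd', 'cde']

// Given the text of the last (catch-all) token before EOF, return the
// keywords it could still grow into. An exact match is not "incomplete",
// so only strictly longer keywords are returned.
function completionsFor(partial, keywords = KEYWORDS) {
  if (partial === '') return []
  return keywords.filter(kw => kw.length > partial.length && kw.startsWith(partial))
}

console.log(completionsFor('ab'))  // [ 'abc', 'abd' ]
console.log(completionsFor('abe')) // []
```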

akonsu commented 6 years ago

@nathan I do not understand what you are saying about the list not being finite... The number of tokens in a lexer spec is finite, right? I am looking for a way to list tokens, by name (by type, in moo's terms), that are incomplete at the point where EOF has been seen. I am not looking for a list of all possible continuations of the given input.

tjvr commented 6 years ago

As Nathan says, I'm afraid this isn't something that we can do in Moo. Moo is based on JavaScript RegExps, and they don't allow us to test for partial matches.

You're better off matching identifiers in general, and then generating completions yourself. 🙂


akonsu commented 6 years ago

@tjvr I understand. Thanks for your response. This is not for completions, BTW. It is for syntax highlighting. For example, I want to highlight strings, as the user types them, and even if they are still incomplete. Or comments, etc.

nathan commented 6 years ago

You may want to write your lexer such that every token type matches its prefixes too (e.g., strings match unmatched "… to EOL/EOF). Then you can just use the token stream as-is, which is much easier than trying to re-lex the end token after you recover it.
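A rough sketch of that idea, using a tiny hand-rolled sticky-regex tokenizer rather than Moo itself so that it runs on its own. The rule names, the `RULES` table and the `tokenize` helper are all assumptions made for illustration. The point is only that each pattern makes its closing delimiter optional, so an unterminated string or comment still comes out as a token of the right type. Similar patterns could presumably be used as rules in a Moo lexer, but that is not tested here.

```javascript
// Each pattern also accepts its own unterminated prefix:
// the closing delimiter is optional.
const RULES = [
  // A double-quoted string; stops at EOL/EOF if the closing quote is missing.
  ['string',  /"(?:\\.|[^"\\\n])*"?/y],
  // A block comment; runs to EOF if the closing */ is missing.
  ['comment', /\/\*(?:[^*]|\*(?!\/))*(?:\*\/)?/y],
  ['word',    /\w+/y],
  ['ws',      /\s+/y],
]

function tokenize(input) {
  const tokens = []
  let pos = 0
  while (pos < input.length) {
    let matched = false
    for (const [type, re] of RULES) {
      re.lastIndex = pos            // sticky: match must start exactly here
      const m = re.exec(input)
      if (m && m[0].length > 0) {
        tokens.push({ type, text: m[0] })
        pos += m[0].length
        matched = true
        break
      }
    }
    if (!matched) {
      // No rule applies: emit a one-character error token and move on.
      tokens.push({ type: 'error', text: input[pos] })
      pos += 1
    }
  }
  return tokens
}

console.log(tokenize('say "hel').map(t => t.type)) // [ 'word', 'ws', 'string' ]
```

With rules written this way, a highlighter can colour the token stream directly while the user is still typing, without any special handling at EOF.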