loloicci / nimly

Lexer Generator and Parser Generator as a Library in Nim.
MIT License

[Suggestion] Add EOF to lexer #70

Closed wyattjsmith1 closed 3 years ago

wyattjsmith1 commented 3 years ago

It is useful to be able to match the end of the string when lexing. Consider the following grammar in ebnf:

text_line ::= [a-zA-Z]* line_end
line_end ::= "\u000D\u000A" | "\u000A" | EOF

The grammar above matches any run of alphabetic characters up to either a newline or the end of the file. As a result, a trailing newline at the end of the file is optional. This grammar cannot be represented accurately in nimly because of the EOF alternative. nimly could be extended by adding a $ symbol that means the end of input, similar to regex:

niml fluentLexer[MyToken]:
  "[a..zA..Z]":
    MyAlphaToken(token.token)
  "[\u000D\u000A|\u000A|$]":
    MyLineEndToken()

loloicci commented 3 years ago

Thank you for your suggestion, @wyattjsmith1.

Having the lexer produce a token that represents EOF sounds like a good idea. However, it is not good for a lexer to treat EOF the same way as ordinary characters.

I suggest wrapping lexIter (https://github.com/loloicci/nimly/blob/fa3a01ed8d51d0381a447a20c84459b664a13d1c/src/nimly/lexer.nim#L116) so that it produces the token for EOF (in this example, MyLineEndToken()) after the original lexIter stops iterating.

Does that solve your problem?

wyattjsmith1 commented 3 years ago

Ah, ok. That should work. Thanks for the advice!
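
The wrapping approach suggested above might look roughly like the sketch below. It is not taken from nimly: `MyToken`, `TokenKind`, `baseTokens`, and `tokensWithEof` are hypothetical names, and `baseTokens` is only a hand-written stand-in for nimly's `lexIter` so that the example runs on its own. For brevity the stand-in treats only `\n` as a line end, not `\r\n`.

```nim
# Hypothetical token type; a real project would use its own MyToken.
type
  TokenKind = enum
    tkAlpha, tkLineEnd
  MyToken = object
    kind: TokenKind
    text: string

# Stand-in for nimly's lexIter (assumption): yields runs of non-newline
# characters and newline tokens, then simply stops at the end of input.
iterator baseTokens(input: string): MyToken =
  var i = 0
  while i < input.len:
    if input[i] == '\n':
      yield MyToken(kind: tkLineEnd, text: "\n")
      inc i
    else:
      let start = i
      while i < input.len and input[i] != '\n':
        inc i
      yield MyToken(kind: tkAlpha, text: input[start ..< i])

# Wrapper: forwards every token unchanged, then emits one extra
# line-end token once the underlying iterator is exhausted (EOF).
iterator tokensWithEof(input: string): MyToken =
  for tok in baseTokens(input):
    yield tok
  yield MyToken(kind: tkLineEnd, text: "")

when isMainModule:
  for tok in tokensWithEof("abc\ndef"):
    echo tok.kind, " ", repr(tok.text)
```

With this shape the parser always sees a line-end token after the last line, whether or not the input ends in a newline. In a real project the inner loop would iterate over the lexer's own token iterator instead of `baseTokens`.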